Internet DRAFT - draft-charny-ef-definition
draft-charny-ef-definition
Network Working Group A. Charny
Internet Draft Cisco
Document: draft-charny-ef-definition-01.txt Nov 2000
EF PHB Redefined
Internet Draft Anna Charny, ed.
Cisco Systems
Fred Baker
Cisco Systems
Jon Bennett
Riverdelta Networks
Kent Benson
Tellabs
Jean-Yves Le Boudec
EPFL
Angela Chiu
AT&T Labs
William Courtney
TRW
Bruce Davie
Cisco Systems
Shahram Davari
PMC-Sierra
Victor Firoiu
Nortel Networks
Charles Kalmanek
AT&T Research
K.K. Ramakrishnan
AT&T Research
Dimitrios Stiliadis
Lucent Technologies
Expires May 2001
draft-charny-ef-definition-01.txt November 2000
EF PHB Redefined
Charny May 2000 1
EF PHB Redefined Nov 2000
Status of this Memo
This document is an Internet Draft and is in full conformance with
all provisions of Section 10 of RFC2026. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a
"working draft" or "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.ietf.org (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).
This document is a product of the Diffserv working group of the
Internet Engineering Task Force. Please address comments to the
group's mailing list at diffserv@ietf.org, with a copy to the
authors.
Copyright (C) The Internet Society (1999). All Rights
Reserved.
Abstract
This document proposes text that clarifies RFC 2598. The primary
motivation for this draft is to clarify the definition of the EF PHB
given in RFC 2598.
This draft gives a rigorous definition of EF PHB which in our
opinion preserves the spirit of the EF PHB as intended by RFC 2598
while allowing a number of reasonable compliant implementations.
1 Introduction
The Expedited Forwarding (EF) Per-Hop Behavior (PHB) of RFC 2598 was
designed to be used to build a low-loss, low-latency, low-jitter,
assured bandwidth service. The potential benefits of this service,
and therefore the EF PHB, are enormous. Because of the great value
of this PHB, it is critical that the forwarding behavior required of
and delivered by an EF-compliant node be specific, quantifiable, and
unambiguous.
The underlying intuition behind the EF PHB, as defined in RFC 2598,
stems from the fact that delay and jitter are typically small in a
lightly loaded network. The EF PHB, as defined in RFC 2598,
effectively defines a building block for creating a "virtual
unloaded network" for EF traffic. It achieves this goal by
requiring that the service rate of the EF aggregate at any link be
equal to or greater than the input rate of EF traffic at that link (under
the assumption that the network is appropriately provisioned and
that EF traffic is shaped/policed at the network ingress).
Conceptually, the configured rate of the EF aggregate can be viewed
as the "link speed" in this "virtual network". While specifying
this "link speed" is not by itself sufficient to provide strict
delay or jitter guarantees in a general network, nevertheless
knowing this "link speed", or the minimal guaranteed drain rate of
EF traffic, is essential for the ability to construct quantifiable
end-to-end behavior across a Diffserv domain.
Thus, the definition of EF PHB in RFC 2598 is indeed a necessary
building block for constructing quantifiable PDBs. Unfortunately,
we believe that the actual definition contained in section
2 of RFC 2598 is not sufficiently precise. As a result, many of the
forwarding behaviors which are intuitively reasonable do not
actually comply with the formal definition of RFC 2598. Furthermore,
many of the schedulers believed to deliver EF-compliant behavior
cannot be used to implement the formal definition of EF since they
result in forwarding treatment which does not comply with the
definition of RFC 2598.
A detailed discussion of the issues we find with the definition of
RFC 2598 is given in Appendix A.
The goal of this draft is to give a precise mathematical definition
that describes the notion of ensuring a guaranteed service rate for
an EF aggregate at a small timescale, thus presenting a formal
framework for constructing an "unloaded virtual network" for EF
traffic. Different PDBs may be constructed from this basic building
block by imposing various restrictions on the network topology,
configuration parameters, scheduling disciplines, etc. These
mechanisms are outside the scope of this draft.
2 Definition of EF PHB
2.1 The Formal Definition
2.1.1 Intuition behind the definition
The intent of EF PHB is to provide the EF aggregate with its
configured service rate (or better) over as small a timescale as
possible. We formalize this notion by introducing what we call a
"packet scale rate guarantee".
The intuitive meaning of the packet scale rate guarantee is that as
long as there are EF packets in the node, we would like the j-th EF
packet of length L(j) to depart no later than L(j)/R seconds after
the (j-1)st departed (here R is the configured rate of the
aggregate). (L(j)/R is simply the time that it would take to forward
the j-th packet at the EF-configured rate R.) Were this always to
occur, the EF packets would be forwarded perfectly at the configured
rate.
However, real world schedulers and router architectures introduce
various degrees of distortion in the perfect forwarding sequence.
Furthermore, packets clearly cannot be forwarded at the configured
rate if they arrive more slowly than that rate. The
formal definition must account for these issues.
In essence, the packet scale rate guarantee is defined in terms of
an upper bound on the deviation of the actual departure time of the
j-th packet of EF aggregate from the "ideal" departure time at
configured rate R. The "ideal" departure time is computed
iteratively. Essentially, when there are multiple EF packets in the
device, the ideal time of the j-th departure is simply the ideal
time of previous departure plus L(j)/R, where L(j) is the length of
the j-th packet to depart. When an EF packet arrives at a device
after all the previous packets have already departed, the
computation of the ideal departure time is somewhat more
complicated. There are two cases to consider. If the previous,
(j-1)-th, departure occurred after its own ideal departure time,
then the new ideal departure time is L(j)/R plus the larger of the
(j-1)-th ideal departure time and the j-th arrival time. This is the
case when the EF aggregate is behind its ideal service rate at the
time of the (j-1)-th departure. However, if the previous departure
occurred before its ideal departure time, which corresponds to the
case when the EF aggregate has been served faster than its
configured rate by the time of the (j-1)-th departure, then the new
ideal departure time is L(j)/R plus the larger of the j-th arrival
time and the time of the actual (rather than the ideal) (j-1)-th
departure. This is needed to avoid "punishing" the newly arrived EF
packet by delaying it longer because some other packets received
service faster than the configured rate R in the past. More
discussion of this issue can be found in appendices A and E.
2.1.2 The Formal Definition
Formally, we say that a node provides EF service if it forwards
packets in compliance with the following definition:
Definition of Packet Scale Rate Guarantee (DEF_1)
-----------------------------------------
A node offers the EF aggregate a "packet scale rate guarantee R with
latency E" at some output interface I if for all j > 0, d(j), the
time of departure of the j-th EF packet to depart from the interface
I, satisfies the following condition:
d(j) <= F(j) + E (eq_1)
where F(j) is defined iteratively by
F(0)=0, d(0) = 0
F(j)=max(a(j), min(d(j-1), F(j-1)))+ L(j)/R, for all j>0 (eq_2)
and E is a constant tolerance (or error) term for the node (given in
seconds).
In this definition,
d(j) is the time that the last bit of the j-th EF packet to depart
actually leaves the node from the interface I.
F(j) is the target departure (finishing) time for the j-th EF packet
to depart from I, the "ideal" time that the last bit of that packet
should leave the node.
a(j) is the time that the last bit of the j-th EF packet destined to
the output I to arrive actually arrives at the node.
L(j) is the size (bits) of the j-th EF packet to depart from I.
R is the EF configured rate at I (in bits/second)
Note that the sequences a(j), d(j) and F(j) relate to packets that
leave a given output interface, in this case interface I, but may
arrive from any input interface. Every output interface (I, J, K,
etc.) has its own sequences of a(j)'s, d(j)'s and F(j)'s, i.e.
a_I(j), a_J(j), a_K(j), etc. For clarity we omit the subscript,
since it can be inferred.
The choice of indexes does not restrict when in the actual packet
stream we start the observation of the arrival and departure of EF
packets, except that the observation must start when there are no EF
packets in the node for this output interface. (Otherwise, we would
not have the values of a(j) for the EF packets already in the node.)
Note also that while index j=1 corresponds to the first packet in
the observation, index j=0 does not correspond to any packet at all
and is used solely to start the recursion.
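The recursion in (eq_2) and the test in (eq_1) can be evaluated
mechanically from externally observed times. The following Python
sketch is illustrative only: the function names ideal_departures and
conforms, and the use of exact fractions to avoid rounding at the
boundary d(j) = F(j) + E, are assumptions of this sketch and not part
of the definition.

```python
from fractions import Fraction


def ideal_departures(a, d, lengths, rate):
    """Return [F(1), ..., F(n)] computed from (eq_2).

    a[j-1], d[j-1] and lengths[j-1] hold a(j), d(j) and L(j) for j = 1..n.
    Times are in seconds, lengths in bits, rate in bits/second.
    """
    f_prev = Fraction(0)  # F(0) = 0
    d_prev = Fraction(0)  # d(0) = 0
    targets = []
    for a_j, d_j, l_j in zip(a, d, lengths):
        f_j = max(Fraction(a_j), min(d_prev, f_prev)) + Fraction(l_j) / Fraction(rate)
        targets.append(f_j)
        f_prev, d_prev = f_j, Fraction(d_j)
    return targets


def conforms(a, d, lengths, rate, e):
    """True if d(j) <= F(j) + E for every observed packet (eq_1)."""
    targets = ideal_departures(a, d, lengths, rate)
    return all(Fraction(d_j) <= f_j + Fraction(e) for d_j, f_j in zip(d, targets))
```

For instance, two 1000-bit packets that both arrive at time 0 and
depart at times 1 and 3 from an interface with R = 1000 bits/second
have F(1) = 1 and F(2) = 2, so conforms(...) holds with E = 1 but not
with E = 0.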
The latency term E in (eq_1) quantifies the maximum distortion from
the ideal service at the configured rate R that a particular device
can introduce. As a result, the term E in (eq_1) can be viewed as a
"figure of merit" and can be used to compare different
implementations of EF PHB.
NOTE: The latency term E may be declared on a per output link basis.
NOTE: Since the declaration of a fixed value of E may for some
schedulers restrict the range of the configured rate R, the value of
E may be declared as a function of the configured rate R.
Note that nothing in the definition implies that a(j) and d(j)
necessarily refer to the same packet. This lack of direct
correspondence between a(j) and d(j) is deliberate, and relates to
the goal of accommodating a wide range of schedulers and router
architectures. Even in the case of a priority FIFO implementation at
the output interface, the presence of variable internal delay may
result in reordering of the EF packets arriving from different input
interfaces, so that the j-th EF packet to arrive at a router is not
the same packet as the j-th EF packet to depart from the
router. Likewise, the j-th arriving packet may not necessarily be
the j-th departing packet in "flow-aware schedulers" which have the
ability to differentiate between different sub-flows within the EF
aggregate. An example of such a scheduler might be a hierarchical
scheduler which serves the EF aggregate as a whole at the highest
priority, but uses some WFQ implementation to choose a packet of a
particular sub-stream of EF (e.g. a given "virtual wire" circuit)
within the EF aggregate. Further discussion of interpretation of
this definition can be found in Appendix A.
2.1.3. Example usage of the definition.
We now show an example of how the definition can be applied to an
abstract router.
The figures below describe a sequence of packets arriving to a
router, and their departure times. Nothing is known about the
internals of the router, and the arrival and departure times
represent the only externally observable information. All packets
shown in the examples are destined to a single output interface. For
the sake of an example, we assume that the output interface in
question has a configured rate R=C/2, where C is the output line
rate, and that the router declares the error term E=4*(MTU/C) at
this interface.
In each figure, time increases as we move to the right. Units of
time are MTU/C, the time it takes to forward an MTU-sized packet at
the output line rate C. For simplicity, all packets are MTU-sized.
The first figure below shows a sequence of arriving EF packets
(labeled A, B, etc., using upper-case letters). The placement of the
letter corresponds to the time when the last bit of the packet
arrives at the router. Note that there is some degree of burstiness
in the input pattern: packets A and B arrive back-to-back, and
packets D and E arrive back-to-back as well. Packets E and F arrive
simultaneously on different input interfaces.
t --->
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
 A
    B
          C
                D
                   E
                   F
                            G
The next figure below shows a forwarding behavior that conforms to
the definition proposed in this draft. On each line, the packet
letter (A, B, etc.) shows the time that the last bit of the packet
is forwarded. I.e., the upper-case letters give the values of d()
for the sequence of packets. The terms 'f' and 'f+e' in a row give
the ideal departure time, F(), and the latest permissible departure
time, F() + E, respectively, for the packet on that row. Thus, F(A)
= 2 and F(A) + E = 6, as given on the first row of the body of the
table. Similarly, F(B) = 3 and F(B) + E = 7, as given on the second
row of the body of the table. Hence, any uppercase letter which is
placed to the left of the time corresponding to f+e on the line
corresponds to a conformant departure. Calculations using equations
(eq_1) and (eq_2) are given after the figure to show how the values
of 'f' and 'f+e' were obtained. Some comments follow the
calculations.
t --->
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
    A  f          f+e
          Bf         f+e
                f  C       f+e
                      f        D f+e
                            f     E    f+e
                                  f  F       f+e
                                        f  G       f+e
F(0) = d(0) = 0.
a(A) = 0
F(A) = max(0, min(0, 0)) + 2 = 2, d(A) must be <= 2 + 4 = 6
d(A) = 1 <= 6
a(B) = 1
F(B) = max(1, min(1, 2)) + 2 = 3, d(B) must be <= 3 + 4 = 7
d(B) = 3 <= 7
a(C) = 3
F(C) = max(3, min(3, 3)) + 2 = 5, d(C) must be <= 5 + 4 = 9
d(C) = 6 <= 9
a(D) = 5
F(D) = max(5, min(6, 5)) + 2 = 7, d(D) must be <= 7 + 4 = 11
d(D) = 10 <= 11
a(E) = 6
F(E) = max(6, min(10, 7)) + 2 = 9, d(E) must be <= 9 + 4 = 13
d(E) = 11 <= 13
a(F) = 6
F(F) = max(6, min(11, 9)) + 2 = 11, d(F) must be <= 11 + 4 = 15
d(F) = 12 <= 15
a(G) = 9
F(G) = max(9, min(12, 11)) + 2 = 13, d(G) must be <= 13 + 4 = 17
d(G) = 14 <= 17
The key to understanding the calculations is to notice that whenever
a packet P is forwarded earlier than its ideal departure time, F(P),
the calculation of the next packet's ideal departure time uses P's
actual departure time. Whenever a packet P is forwarded later than
its ideal departure time, the calculation of the next packet's ideal
departure time uses P's ideal departure time. Thus, slippage is not
allowed to accumulate when packets are forwarded late, and credit is
not built up when packets are forwarded early.
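The calculations above can be reproduced with a few lines of Python.
This is a sketch for checking the arithmetic only; the variable names
are illustrative, and the arrival and departure times are the ones
listed in the calculations above.

```python
# Times are in units of MTU/C; every packet is MTU-sized and R = C/2,
# so L(j)/R = 2 units. E = 4 units, as declared in the example.
SERVICE = 2
E = 4

arrivals = {"A": 0, "B": 1, "C": 3, "D": 5, "E": 6, "F": 6, "G": 9}
departures = {"A": 1, "B": 3, "C": 6, "D": 10, "E": 11, "F": 12, "G": 14}

f_prev, d_prev = 0, 0  # F(0) = d(0) = 0
rows = []
for name in "ABCDEFG":
    # (eq_2): an early previous departure contributes d(j-1), a late one F(j-1).
    f = max(arrivals[name], min(d_prev, f_prev)) + SERVICE
    rows.append((name, f, f + E, departures[name] <= f + E))
    f_prev, d_prev = f, departures[name]

for name, f, deadline, ok in rows:
    print(f"{name}: F={f:2d}  F+E={deadline:2d}  d={departures[name]:2d}  conformant={ok}")
```

Each printed row should match the corresponding F() and F() + E values
in the calculations, with every packet reported as conformant.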
The next figure below shows another forwarding behavior for the same
arrival pattern. This behavior does not conform to this draft's
proposed definition.
t --->
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
A f f+e
Bf f+e
C f f+e
Df f+e
f E f+e
f f+e F
f Gf+e
Here, A and C are forwarded early, while B and D are forwarded at
their ideal departure times. Note, however, that these ideal
departure times are earlier than they would have been if A and C had
not been forwarded early. (Note also, that there is no accumulation
of credit for the early departures of A and C. If such an
accumulation of credit were permitted, jitter could be increased if
a subsequent packet is delayed a long time while the credit is
spent.) Thus, the ideal forwarding time for the fourth packet, D, is
at time 6, even though the EF-configured rate is one packet every
two time intervals. Packet E is forwarded late, although still
within tolerance. Packet F's ideal forwarding time is 10, but it is
not forwarded until time 15, one time unit later than its latest
permissible forwarding time. This very late departure makes the
behavior non-conformant. Packet G is forwarded in conformance with
the definition, but just barely.
These examples illustrate how conformance to the proposed definition
can be verified without any knowledge of the internal router
architecture or scheduling implementation. Of course, while this
knowledge is not necessary to determine the conformance with a given
declared E, the designer of the box must use this knowledge to be
able to declare E for the device.
2.2 Per Packet Delay.
It is important to note that just as with the definition of EF PHB
in RFC 2598, the packet scale rate guarantee is defined only in the
context of an entire EF aggregate as a whole. In particular, the
packet-scale rate guarantee definition is intentionally silent about
exactly how various sub-streams of the EF aggregate are scheduled
within the EF aggregate. A consequence of this is that the packet
scale rate guarantee provided to the EF aggregate does not by itself
imply a per-packet delay bound.
This is analogous to the fact that the mere knowledge of link rates
in a real network serving just a single class of traffic does not in
itself provide a per-packet delay guarantee. The per-packet delay
guarantee at a hop with a FIFO service in such a network will differ
drastically from the per-packet delay guarantee at a hop with a WFQ
server.
Aside from the knowledge of the properties of scheduling
implementations, ensuring per-packet delay at a hop involves the
ability to bound burstiness at the ingress of a hop. This is a
complex task involving a fair amount of global knowledge such as the
network topology, hop count, link utilization, upstream scheduling
implementations, etc. As a result, addressing these issues appears
appropriate not in the context of a local PHB definition, but rather
in the context of a PDB, which is inherently a global concept.
3. Implementation considerations.
The packet scale rate guarantee definition does not mandate a
particular underlying queuing or scheduling mechanism. However, for
the definition to be meaningful, it is important to make sure that
there exist at least some schedulers that strictly satisfy the
definition with reasonable latency terms.
It can be shown that the strict priority scheduler in which all EF
packets share a single FIFO queue (which is served at strict non-
preemptive priority over other queues) satisfies the definition with
the latency term E = MTU/C where MTU is the maximum packet size and
C is the speed of output link.
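The priority-queue claim can be spot-checked numerically. The sketch
below is not a substitute for the proof in Appendix C. It assumes that
all packets (EF and non-EF) are MTU-sized, that lower-priority traffic
is always waiting, and that R = C/2; the function names
simulate_priority and max_lateness are hypothetical helpers introduced
only for this illustration.

```python
from fractions import Fraction
import random


def simulate_priority(ef_arrivals, mtu_time):
    """Departure times of MTU-sized EF packets under non-preemptive strict priority.

    Assumes lower-priority traffic is always waiting, so the link starts an
    MTU-sized non-EF packet whenever it is free and no EF packet is queued.
    mtu_time is MTU/C, the time to transmit one MTU-sized packet.
    """
    t = Fraction(0)
    departures = []
    for a in sorted(ef_arrivals):
        while t < a:          # link is free before the EF packet has arrived
            t += mtu_time     # ...so a non-EF packet is sent without preemption
        t += mtu_time         # head of the EF FIFO is sent at line rate C
        departures.append(t)
    return departures


def max_lateness(arrivals, departures, service_time):
    """Largest d(j) - F(j), with F(j) from (eq_2) and L(j)/R = service_time."""
    f_prev = d_prev = Fraction(0)
    worst = None
    for a, d in zip(sorted(arrivals), departures):
        f = max(a, min(d_prev, f_prev)) + service_time
        worst = d - f if worst is None else max(worst, d - f)
        f_prev, d_prev = f, d
    return worst


random.seed(1)
MTU_TIME = Fraction(1)   # MTU/C
SERVICE = Fraction(2)    # MTU/R with R = C/2
arrivals = [Fraction(random.randint(0, 400), 10) for _ in range(30)]
departures = simulate_priority(arrivals, MTU_TIME)
print(max_lateness(arrivals, departures, SERVICE) <= MTU_TIME)
```

Under these assumptions the observed lateness d(j) - F(j) stays within
MTU/C for the generated arrival pattern, which is consistent with the
latency term stated above.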
Another scheduler that satisfies the definition with a small latency
term is WF2Q described in [BZ96a]. A class-based WF2Q scheduler, in
which all EF traffic shares a single queue with the weight
corresponding to the configured rate of the EF aggregate can be
shown to satisfy the definition with the latency term E =
MTU/C+MTU/R.
The proofs that PQ and WF2Q satisfy the packet-scale rate guarantee
definition with the above latency terms are given in Appendix C.
The definition also allows a wide range of scheduling algorithms,
but different algorithms result in different degrees of deviation
from the "ideal" service rate. The degree of the accuracy with which
a scheduler can ensure that the EF aggregate receives its configured
rate at a small (packet) timescale is expressed by the E term of the
scheduler. A list of several well-known schedulers and their
corresponding error terms can be found in Appendix D.
5. Security Considerations
This draft makes the PHB definition in [RFC2598] more rigorous, but
adds no new functions to it. As a result, it adds no security issues
to those described in that specification.
6. Appendices
Appendix A: Issues with the RFC 2598 PHB Definition
There are several potentially serious problems with having a formal
EF definition that does not match people's intuitive understanding.
First, the understanding of what it means for a node to be EF-
compliant may vary among people. This discrepancy may arise due to
the fact that two people's intuitive understanding of the definition
may actually differ somewhat; also, someone learning about EF from
the formal definition may develop an understanding of EF at odds
with the understanding that most people currently familiar with EF
have. These discrepancies in people's understanding of EF may have
serious consequences. The resulting confusion may increase the time
and cost needed to develop equipment, cause interoperability
problems, and create mismatches between expected node and network
performance and actual performance. Second, the lack of a clear
conformance definition makes it impossible to test a piece of
equipment and declare it "conforming" or "non-conforming." Third,
the lack of a mathematically precise description of a node's
behavior makes it impossible to analytically design or evaluate
services constructed using the EF PHB or other PHBs that must
contend for resources with EF traffic. Fourth, an incorrect formal
definition of EF may lead to erroneous reasoning about the
properties of networks implementing EF.
A.1 The RFC 2598 Definition of EF PHB and Its Intuitive Meaning
The definition of the EF PHB as given in [RFC 2598] states:
It [the EF PHB departure rate] SHOULD average at least the
configured rate when measured over any time interval equal to or
longer than the time it takes to send an output link MTU sized
packet at the configured rate.
The intuitive content of this definition is fairly clear. On all
time scales ranging down to very small time scales, the EF aggregate
should be given at least its configured share of the output link
bandwidth. Among other things, this allows EF to support
applications that are delay- and jitter-sensitive.
However, intuition alone will not allow vendors to design compliant
schedulers capable of advertising their EF configuration to other
routers. As we show in the next section, the simplicity of the
definition is misleading in the sense that it does not actually
capture the intuition correctly under a number of circumstances.
A note is due here on the precise interpretation of the wording of
the definition. A potential cause of ambiguity is the fact that the
definition contains the word SHOULD which according to [Bra97] means
that in principle an implementation of EF PHB may under some
circumstances choose not to be strictly compliant with the specified
requirement, in which case any issues with the strict definition may
be viewed as irrelevant. However, it seems that in order for the
SHOULD to be meaningful, there should exist at least some
implementations which are strictly compliant, even if non-compliant
implementations may be chosen under some circumstances. Furthermore,
the Virtual Wire behavior aggregate [JNP2000] is defined by
replacing SHOULD by MUST in the definition of EF PHB in RFC 2598.
Therefore, in all cases the exact mathematical properties of the EF
definition and the existence of strictly compliant implementations
are of substantial interest. The remainder of this section
concentrates on the discussion of these issues in detail.
A.2 Particular Difficulties with the RFC 2598 EF PHB Definition
A literal interpretation of the definition would consider the
behaviors given in the next two subsections as non-compliant. The
definition also unnecessarily constrains the maximum configurable
rate of an EF aggregate.
A.2.1 Perfectly-Clocked Forwarding
Consider the following stream forwarded from a router with EF-
configured rate R=C/2, where C is the output line rate. In the
illustration, E is an MTU-sized EF packet while x is a non-EF packet
or unused capacity, also of size MTU.
... E x E x E x E x E x E x...
     |-----|
The interval between the vertical bars is 3*MTU/C, which is greater
than MTU/(C/2), and so is subject to the EF PHB definition. During
this interval, 3*MTU/2 bits of the EF aggregate should be forwarded,
but only MTU bits are forwarded. Therefore, while this forwarding
pattern should be considered compliant under any reasonable
interpretation of the EF PHB, it actually does not formally comply
with the definition of RFC 2598.
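The shortfall can be checked by scanning windows of the forwarded
stream. The following sketch assumes slot-aligned windows only and
measures time in slots of MTU/C and data in MTUs; the helper name
min_ef_slots is hypothetical.

```python
def min_ef_slots(pattern, window):
    """Fewest EF slots in any run of `window` consecutive slots.

    `pattern` is one period of a repeating slot sequence, e.g. "Ex";
    "E" marks an MTU-sized EF packet, anything else a non-EF or idle slot.
    """
    n = len(pattern)
    return min(
        sum(pattern[(start + k) % n] == "E" for k in range(window))
        for start in range(n)
    )


# Units: one slot = MTU/C seconds, one packet = 1 MTU.
rate = 0.5                            # R = C/2, in MTU per slot
window = 3                            # 3*MTU/C, longer than MTU/R = 2 slots
served = min_ef_slots("Ex", window)   # EF MTUs forwarded in the worst window
required = rate * window              # MTUs demanded by the RFC 2598 wording
print(served, required, served >= required)   # 1 1.5 False
```

The worst window is the one that starts on a non-EF slot, exactly as
marked in the illustration.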
Note that this forwarding pattern can occur in any work-conserving
scheduler in an ideal output-buffered architecture where EF packets
arrive in a perfectly clocked manner according to the above pattern
and are forwarded according to exactly the same pattern in the
absence of any non-EF traffic.
Trivial as this example may be, it reveals the lack of mathematical
precision in the formal definition. The fact that no work-conserving
scheduler can formally comply with the definition is unfortunate,
and appears to warrant some changes to the definition that would
correct this problem.
The underlying reason for the problem described here is quite simple
- one can only expect that the EF aggregate is served at configured
rate in some interval where there is enough backlog of EF packets to
sustain that rate. In the example above the packets come in exactly
at the rate at which they are served, and so there is no persistent
backlog. Certainly, if the input rate is even smaller than the
configured rate of the EF aggregate, there will be no backlog as
well, and a similar formal difficulty will occur.
A seemingly simple solution to this difficulty might be to require
that the EF aggregate is served at its configured rate only when the
queue is backlogged. However, as we show in the remainder of this
section, this solution does not suffice.
A.2.2 Router Internal Delay
We now argue that the example considered in the previous section is
not as trivial as it may seem at first glance.
Consider a router with EF configured rate R = C/2 as in the previous
example, but with an internal delay of 3T (where T = MTU/C) between
the time that a packet arrives at the router and the time that it is
first eligible for forwarding at the output link. Such things as
header processing, route look-up, and delay in switching through a
multi-layer fabric could cause this delay. Now suppose that EF
traffic arrives regularly at a rate of (2/3)R = C/3. The router will
perform as shown below.
EF Packet Number          1    2    3    4    5    6  ...
Arrival (at router)       0   3T   6T   9T  12T  15T  ...
Arrival (at scheduler)   3T   6T   9T  12T  15T  18T  ...
Departure                4T   7T  10T  13T  16T  19T  ...
Again, the output does not satisfy the RFC 2598 definition of EF
PHB. As in the previous example, the underlying reason for this
problem is that the scheduler cannot forward EF traffic faster than
it arrives. However, it can be easily seen that the existence of
internal delay causes one packet to be inside the router at all
times. An external observer will rightfully conclude that the number
of EF packets that arrived to the router is always at least one
greater than the number of EF packets that left the router, and
therefore the EF aggregate is constantly backlogged. However, while
the EF aggregate is continuously backlogged, the observed output
rate is nevertheless strictly less than the configured rate.
This example indicates that the simple addition of the condition
that EF aggregate must receive its configured rate only when the EF
aggregate is backlogged does not suffice in this case.
Yet, the problem described here is of fundamental importance in
practice. Most routers have a certain amount of internal delay. A
vendor declaring EF compliance is not expected to simultaneously
declare the details of the internals of the router. Therefore, the
existence of internal delay may cause a perfectly reasonable EF
implementation to display seemingly non-conformant behavior, which
is clearly undesirable.
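For comparison, the same externally observable times can be fed into
the recursion (eq_2) of section 2.1.2. The sketch below is only an
illustration: the variable names are assumptions, and the resulting
latency value is an outcome of this calculation rather than a figure
stated elsewhere in this draft.

```python
# Times in units of T = MTU/C; R = C/2, so L(j)/R = 2T for MTU-sized packets.
SERVICE = 2
INTERNAL_DELAY = 3   # router arrival -> eligible at the output scheduler
TRANSMIT = 1         # MTU/C on the output link

arrivals = [3 * k for k in range(6)]                            # 0, 3T, ..., 15T
departures = [a + INTERNAL_DELAY + TRANSMIT for a in arrivals]  # 4T, 7T, ..., 19T

f_prev = d_prev = 0
lateness = []
for a, d in zip(arrivals, departures):
    f = max(a, min(d_prev, f_prev)) + SERVICE   # (eq_2)
    lateness.append(d - f)
    f_prev, d_prev = f, d

print(departures)     # [4, 7, 10, 13, 16, 19]
print(max(lateness))  # 2
```

In this calculation every packet departs exactly 2T after its ideal
time F(j), which suggests that the pattern would satisfy (eq_1) with a
latency term of E = 2T even though it fails the RFC 2598 wording.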
A.2.3 Maximum Configurable Rate and Provisioning Efficiency
It is well understood that with any non-preemptive scheduler, the
compliant configurable rate for an EF aggregate cannot exceed C/2
[JNP2000]. This is because an MTU-sized EF packet may arrive to an
empty queue at time t just as an MTU-sized non-EF packet begins
service. The maximum number of EF bits that could be forwarded
during the interval [t, t + 2*MTU/C] is MTU. But if configured rate
R > C/2, then this interval would be of length greater than MTU/R,
and more than MTU EF bits would have to be served during this
interval for the router to be compliant. Thus, R must be no greater
than C/2.
It can be shown that for schedulers other than PQ, such as various
implementations of WFQ, the maximum compliant configured rate may be
much smaller than 50%. For example, for SCFQ [Gol94] the maximum
configured rate cannot exceed C/N, where N is the number of queues
in the scheduler. For WRR, mentioned as compliant in section 2.2 of
RFC 2598, this limitation is even more severe. This is because in
these schedulers a packet arriving to an empty EF queue may be
forced to wait until one packet from each other queue (in the case
of SCFQ) or until several packets from each other queue (in the case
of WRR) are served before it will finally be forwarded.
While it is frequently assumed that the configured rate of EF
traffic will be substantially smaller than the link bandwidth,
limiting it to at most 50% of the link bandwidth appears
unnecessarily restrictive. For example, in a fully
connected mesh network, where any flow traverses a single link on
its way from source to its destination there seems no compelling
reason to limit the amount of EF traffic to 50% (or an even smaller
percentage for some schedulers) of the link bandwidth.
Another, perhaps even more striking example is the fact that even a
TDM circuit with dedicated slots cannot be configured to forward EF
packets at more than 50% of the link speed without violating RFC
2598 (unless the entire link is configured for EF). If the
configured rate of EF traffic is greater than 50% (but less than the
link speed), there will always exist an interval longer than MTU/R
in which less than the configured rate is achieved. For example,
suppose the configured rate of the EF aggregate is 2C/3. Then the
forwarding pattern of the TDM circuit might be
E E x E E x E E x ...
   |---|
where only one packet is served in the marked interval of length 2T
= 2MTU/C. But at least 4/3 MTU would have to be served during this
interval by a router in compliance with the definition in RFC 2598.
The fact that even a TDM line cannot be booked over 50% by EF
traffic indicates that the restriction is artificial and
unnecessary.
A.3 The Non-trivial Nature of the Difficulties
One way to correct the problems discussed in the previous sections
might be to clarify the definition of the intervals to which the
definition applies, or to average over multiple intervals. However,
an attempt to do so meets with considerable analytical and
implementation difficulties. For
example, attempting to align interval start times with some epochs
of the forwarded stream appears to require a certain degree of
global clock synchronization and is fraught with the risk of
misinterpretation and mistake in practice.
Another approach might be to allow averaging of the rates over some
larger time scale. However, it is unclear exactly what finite time
scale would suffice in all reasonable cases. Furthermore, this
approach would compromise the notion of very short-term time scale
guarantees that are the essence of EF PHB.
We also explored a combination of two simple fixes. The first is the
addition of the condition that the only intervals subject to the
definition are those that fall inside a period during which the EF
aggregate is continuously backlogged in the router (i.e., when an EF
packet is in the router). The second is the addition of an error
(latency) term that could serve as a figure-of-merit in the
advertising of EF services.
With the addition of these two changes the candidate definition
becomes as follows:
In any interval of time (t1, t2) in which EF traffic is
continuously backlogged, at least R(t2 - t1 - E) bits of EF traffic
must be served, where R is the configured rate for the EF aggregate
and E is an implementation-specific latency term.
The "continuously backlogged" condition eliminates the insufficient-
packets-to-forward difficulty while the addition of the latency term
of size MTU/C resolves the perfectly-clocked forwarding example
(section A.2.1), and also removes the limitation on EF configured
rate.
However, neither fix (nor the two of them together) resolves the
example of section A.2.2. To see this, recall that in the example of
section A.2.2 the EF aggregate is continuously backlogged, but the
service rate of the EF aggregate is consistently smaller than the
configured rate, and therefore no finite latency term will suffice
to bring the example into conformance. This appears to be a serious
problem.
Therefore, we believe that such a modification, albeit attractive in
its simplicity, falls short of addressing all the problems
identified with the definition in RFC 2598.
Appendix B: Further Interpretation of the Packet Scale Rate Guarantee
Definition
The intuitive meaning of the packet scale rate guarantee is that as
long as there are EF packets in the node, we would like the j-th EF
packet to depart L(j)/R seconds after the (j-1)st departed. (L(j)/R
is the time that it would take to forward the j-th packet at the EF-
configured rate R.) Were this always to occur, the EF packets would
be forwarded perfectly. The rest of the definition is a concession
to the extreme unlikelihood that perfect forwarding can occur.
Perhaps the simplest way to understand the definition is to dissect
it and examine its various pieces.
Consider the term min(d(j-1), F(j-1)). This term exists to ensure
that the node is not given "credit" for faster-than-configured
service and is not forgiven for slower-than-configured service.
Suppose that this term was replaced with d(j-1) or with F(j-1).
Replacing min(d(j-1), F(j-1)) with d(j-1) would permit the node to
give the EF aggregate a consistently lower rate of service than the
configured rate whenever E > 0. To see this, suppose that we make
the replacement, that all packets have size MTU, and that a(j) <=
d(j-1). (This last condition means that the node is continuously
backlogged with EF packets over the time interval under discussion.)
Then, using the revised definition, we would have
F(j) = d(j-1) + MTU/R
d(j) <= F(j) + E = d(j-1) + MTU/R + E
which would imply
[d(j) - d(j-1)] <= MTU/R + E
This last inequality says that the node would be permitted to send
an MTU-sized packet every (MTU/R)+E seconds. If E > 0, this rate
would be consistently slower than R and is clearly not acceptable EF
PHB.
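As a rough numerical illustration of this argument, the following Python sketch compares the revised recursion with the original one. The values of E and n, and the choice of MTU/R as the time unit, are assumptions made only for the example.

```python
from fractions import Fraction

MTU_OVER_R = Fraction(1)      # MTU/R, used as the time unit (assumed)
E = Fraction(1, 4)            # tolerance term, arbitrary positive value (assumed)
n = 8                         # number of packets examined (assumed)

# Node sends one MTU-sized packet every MTU/R + E while continuously backlogged.
d = [Fraction(0)]
for j in range(1, n + 1):
    d.append(d[j - 1] + MTU_OVER_R + E)

# Revised definition: F(j) = d(j-1) + MTU/R, d(j) <= F(j) + E.
revised_ok = all(d[j] <= d[j - 1] + MTU_OVER_R + E for j in range(1, n + 1))

# Original recursion with a(j) = 0 for all j (all packets present from time 0).
F = [Fraction(0)]
for j in range(1, n + 1):
    F.append(max(Fraction(0), min(d[j - 1], F[j - 1])) + MTU_OVER_R)
lateness = [d[j] - F[j] for j in range(1, n + 1)]

achieved_rate = 1 / (MTU_OVER_R + E)   # in units of R
print(revised_ok, achieved_rate, lateness[-1])   # True 4/5 2
```

Under these assumptions the slow node passes the revised test while achieving only 4/5 of R, whereas under the original recursion its lateness d(j) - F(j) grows with j and eventually exceeds any fixed E.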
Replacing min(d(j-1), F(j-1)) with F(j-1) would award the node
"credit" for faster-than-configured service. It would be possible
for the node to accumulate this credit by forwarding several EF
packets in a row, each earlier than required. The node could then
redeem this credit by delaying the next EF packet until all the
credit plus the normal inter-packet interval was consumed. To see
this, suppose we make the replacement, that all packets have size
MTU, and that a(j) <= F(j-1). (This last condition means that the
next EF packet arrives before the previous packet was scheduled to
depart.) Then, using this revised definition, we would have
F'(j) = F'(j-1) + MTU/R and d(j) <= F'(j) + E
Suppose that we have a node with negligible internal delay, that its
output line rate is C = 3R, and that it forwards n EF packets back-
to-back. We would have
F'(1) = MTU/R; d(1) = MTU/C
F'(2) = F'(1) + MTU/R = 2MTU/R; d(2) = 2MTU/C
...
F'(n) = F'(n-1) + MTU/R = nMTU/R; d(n) = nMTU/C
By the time the n-th EF packet is forwarded, the node has
accumulated credit amounting to n(MTU/R - MTU/C). Using the example
assumption that C = 3R, the node has accumulated credit equal to
(2n/3)MTU/R. The (n+1)th EF packet need not be forwarded until
(2n/3)MTU/R + E seconds have elapsed from the time that the n-th
packet was forwarded. Depending upon the actual values of n and R
(which may be much less than 1/3 the output line rate), a sizeable
amount of jitter between the n-th and (n+1)th EF packets would be
produced.
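The credit computation above can be reproduced with a few lines of Python. This is a sketch only; the function name and the sample value n = 30 are assumptions made for illustration.

```python
from fractions import Fraction

def accumulated_credit(n, c_over_r):
    """Credit F'(n) - d(n), in units of MTU/R, after n back-to-back packets."""
    f_prime = Fraction(n)                 # F'(n) = n * MTU/R
    d = Fraction(n) / c_over_r            # d(n)  = n * MTU/C
    return f_prime - d

credit = accumulated_credit(n=30, c_over_r=3)
print(credit)   # 20, i.e. (2n/3) MTU/R for n = 30
```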
These two alternative definitions illustrate the role of the
min(d(j-1), F(j-1)) term: to ensure that the node forwards EF
packets at no less than the configured rate over both large and
small time scales.
The a(j) term and the maximum operator are included for purely
technical reasons. First, their presence says that the node does not
have to forward an EF packet that has not yet arrived. Absurd as
such a notion may be, without the term and the operator, the
definition would formally insist that EF packets continue to be
forwarded even when there are none to be forwarded.
If this were the only purpose for including the a(j) term and the
maximum operator, it would be much clearer to simply add the
condition that the definition applies only when the node has
backlogged EF packets. However, there is a second reason why the
definition is written as it is - the possibility that the node has
non-negligible internal delay between the input and the output. Such
things as header processing, route look-up, and delay in switching
through a multi-layer fabric could cause this delay.
The set-up of an example to illustrate this role of the a(j) term
and the maximum operator is a bit more lengthy than it was for the
previous examples. Consider a node with an EF-configured rate of R =
C/2. Let T = MTU/C, the time it takes to forward an MTU-sized packet
at the output line rate. Suppose that MTU-sized EF packets arrive
at the node regularly at a rate of (2/3)R = C/3. Suppose also that
the node has an internal delay of 3T. Even if there is no other
traffic, the node will perform no better than is shown below.
   EF Packet Number             1    2    3    4    5    6   ...
   Arrival at router (a(j))     0    3T   6T   9T   12T  15T ...
   Arrival (at scheduler)       3T   6T   9T   12T  15T  18T ...
   Departure (d(j))             4T   7T   10T  13T  16T  19T ...
Note that from time 0 onward, EF packets are backlogged in the node.
If the a(j) term and the maximum operator are removed from the
definition, then we would have
F'(j) = min(d(j-1), F'(j-1)) + MTU/R
d(j) <= F'(j) + E
Working through the recursions with F' representing the modified
target finishing time function and F representing the original
definition given in equations (1) and (2), we have
   EF Packet Number             1    2    3    4    5    6   ...
   Arrival at router (a(j))     0    3T   6T   9T   12T  15T ...
   Departure (d(j))             4T   7T   10T  13T  16T  19T ...
   Modified (F'(j))             2T   4T   6T   8T   10T  12T ...
   Original (F(j))              2T   5T   8T   11T  14T  17T ...
The departure times fall behind the modified F' by an amount that
grows by T with every packet. No fixed tolerance term, E, would be
large enough to ensure that the node's behavior was compliant. On
the other hand, the original F has every packet being forwarded
late, but always late by the same amount, 2T. Setting E >= 2T allows
the node to conform to EF PHB.
Note that this node cannot possibly perform any better than has been
depicted in this example. It cannot begin forwarding EF packets
until 3T after they arrive. As in this example, as link speeds
increase, we may well discover that internal delays become multiples
of the time it takes to transmit a packet. Thus, it is important
that the definition of EF rigorously address acceptable behavior in
the presence of internal delay.
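The two recursions in the internal-delay example can be worked through mechanically. The Python sketch below does so with all times expressed in units of T; the function name and its use_arrival_term flag are assumptions introduced for illustration.

```python
def finish_times(arrivals, departures, mtu_over_r, use_arrival_term=True):
    """Target finish times F(j), in units of T, for MTU-sized packets."""
    F, prev_d, prev_F = [], 0, 0
    for a, d in zip(arrivals, departures):
        start = min(prev_d, prev_F)
        if use_arrival_term:
            start = max(a, start)         # the a(j) term and maximum operator
        prev_F = start + mtu_over_r
        prev_d = d
        F.append(prev_F)
    return F

arrivals = [0, 3, 6, 9, 12, 15]        # a(j), in units of T
departures = [4, 7, 10, 13, 16, 19]    # d(j), in units of T
original = finish_times(arrivals, departures, 2)        # MTU/R = 2T
modified = finish_times(arrivals, departures, 2, use_arrival_term=False)
print(original)   # [2, 5, 8, 11, 14, 17]
print(modified)   # [2, 4, 6, 8, 10, 12]
print([d - f for d, f in zip(departures, original)])  # [2, 2, 2, 2, 2, 2]
print([d - f for d, f in zip(departures, modified)])  # [2, 3, 4, 5, 6, 7]
```

The last two lines show the lateness d(j) - F(j): it is constant at 2T with the original definition and grows without bound when the a(j) term is removed.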
This last example leads to a consideration of the role of the
tolerance term, E. It happens that E must be greater than 0 for
almost every real-world node that would provide EF PHB.
We have already seen that we need E > 0 for a node that has internal
delay, even if there is no non-EF traffic. Another easy example
where E > 0 is required, is a non-preemptive node offering an EF-
configured rate R > C/2. Suppose, for example, that R = 0.75C. With
such a node, it is always possible that an EF packet will arrive at
a node (at time 0) just as that node is beginning to serve a non-EF
packet. Assuming that the EF packet and the non-EF packet are the
same size (say, MTU-sized), the EF packet will have to wait at least
until the non-EF packet is forwarded before it can begin to be
served. That is, the earliest that the EF packet can be forwarded is
2MTU/C. Yet, for the EF packet, F(1) = 0 + MTU/(0.75C) =
(4/3)MTU/C. If E were 0, then d(1) <= (4/3)MTU/C. But, this is
impossible. Thus, without the tolerance term E > 0, the node could
not be configured for this EF-configured rate, even if it serves EF
using priority queuing with EF as the highest priority.
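The lower bound on E implied by this single-packet scenario can be computed as follows. This Python sketch is an illustration under the assumptions of the example (equal MTU-sized packets, negligible internal delay); the function name is hypothetical, and the result is only a lower bound for this scenario, not the latency term of any particular scheduler.

```python
from fractions import Fraction

def min_tolerance_nonpreemptive(r_over_c):
    """Smallest E, in units of MTU/C, for the single-packet example above."""
    earliest_departure = Fraction(2)          # wait one MTU, then send one MTU
    target_finish = 1 / Fraction(r_over_c)    # F(1) = MTU/R
    return max(Fraction(0), earliest_departure - target_finish)

print(min_tolerance_nonpreemptive(Fraction(3, 4)))   # 2/3
print(min_tolerance_nonpreemptive(Fraction(1, 2)))   # 0
```

For R = 0.75C the scenario requires E of at least (2/3)MTU/C, while for R = C/2 it imposes no constraint, consistent with the threshold R > C/2 stated above.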
In Appendix D, we consider the situations of other scheduling
disciplines for EF service, such as weighted round-robin, weighted
fair queuing, and other commonly-used schedulers. All of these
schedulers require an E > 0, even if their internal delay is
negligible. Rather than excluding nodes employing these schedulers
from ever being able to offer EF service, we included the tolerance
term E in the definition of the packet scale rate guarantee. It is
possible that nodes can use this term as a figure of merit when
advertising their capability to provide EF PHB.
It is also important to note that the tolerance E does not permit a
node to persistently forward EF packets at less than the configured
rate. By including E in the d(j) <= F(j) + E inequality rather than
in the recursion that defines F(j), the worst that can happen is
that forwarding is shifted forward in time by at most E. That is,
the E term allows a fixed delay for the forwarding of the entire EF
stream, but it does not allow the rate of forwarding to be less than
the EF-configured rate. Put yet another way, it is a difference,
but not a differential.
Appendix C: Proofs of Satisfiability of the Packet Scale Rate Guarantee
Definition for PQ and WF2Q
C.1 Satisfiability of the Packet Scale Rate Guarantee Definition for PQ
In this section, we prove that a priority queuing (PQ) scheduler
satisfies the EF redefinition using the latency term E = MTU/C.
Statement C1.
============
PQ satisfies the redefinition (equations (eq_1) and (eq_2) of
section 2.1) with E=MTU/C.
Proof of C1.
Consider any busy period of the EF queue. Let k=1 correspond to
the first packet in that busy period and assume that a(1) >=0.
We prove by induction that for all k >=1 in this busy period
d(k) <= F(k)+MTU/C (eq_c1_1)
This would immediately imply Statement C1.
Base case.
For k=1,
F(1) = max(a(1), min(d(0), F(0))) + L(1)/R >= a(1) + L(1)/R >=
a(1)+L(1)/C (eq_c1_2)
and
d(1) <= a(1) + MTU/C + L(1)/C <= F(1) + MTU/C
where the first inequality follows from the fact that the first
packet in a PQ may wait at most for one largest packet
transmission before its own transmission begins, and the second
inequality follows from (eq_c1_2).
Inductive step.
Note that since EF has the highest priority, for k > 1 in the busy
period of the EF queue
d(k) = d(k-1) + L(k)/C (eq_c1_3)
Now from the induction hypothesis
F(k-1) >= d(k-1) - MTU/C
And the definition (eq_2) of section 2.1 of F(k) gives
F(k) >= max(a(k), min(d(k-1), d(k-1) - MTU/C))+ L(k)/C =
max(a(k), d(k-1)- MTU/C)+ L(k)/C (eq_c1_4)
It follows immediately from (eq_c1_4) that
F(k) >= d(k-1)- MTU/C + L(k)/C
Combining with (eq_c1_3) demonstrates (eq_c1_1) and completes the
inductive step.
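Statement C1 can also be checked numerically. The following Python sketch simulates a non-preemptive priority queue and verifies d(k) <= F(k) + MTU/C on a random arrival pattern. The simulator, the numeric values of C, R and MTU, and the traffic pattern are all assumptions made for illustration; the check complements the proof and does not replace it.

```python
import random
from fractions import Fraction

def pq_departures(arrivals, C, mtu):
    """Non-preemptive PQ, EF highest priority, other traffic always backlogged
    with MTU-sized packets. arrivals: sorted list of (a(k), L(k))."""
    now, out = Fraction(0), []
    k = 0
    while k < len(arrivals):
        a, length = arrivals[k]
        if a <= now:                      # EF packet waiting: serve it next
            now += Fraction(length, C)
            out.append(now)
            k += 1
        else:                             # EF queue empty: a non-EF packet goes
            now += Fraction(mtu, C)
    return out

def finish_times(arrivals, departures, R):
    """F(k) = max(a(k), min(d(k-1), F(k-1))) + L(k)/R with d(0) = F(0) = 0."""
    F, prev_d, prev_F = [], Fraction(0), Fraction(0)
    for (a, length), d in zip(arrivals, departures):
        prev_F = max(a, min(prev_d, prev_F)) + Fraction(length, R)
        prev_d = d
        F.append(prev_F)
    return F

random.seed(1)
C, R, MTU = 10, 4, 1500                   # assumed: bits per time unit, bits
t, arrivals = 0, []
for _ in range(200):
    t += random.randint(0, 600)
    arrivals.append((t, random.randint(64, MTU)))
d = pq_departures(arrivals, C, MTU)
F = finish_times(arrivals, d, R)
worst = max(dk - Fk for dk, Fk in zip(d, F))
print(worst <= Fraction(MTU, C))          # True
```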
C.2 Satisfiability of the Packet Scale Rate Guarantee Definition for
WF2Q
In this section, we prove that a worst-case fair weighted fair
queuing (WF2Q) scheduler satisfies the EF redefinition using the
latency term E = MTU/C + MTU/R. The proof begins with a helping
theorem that brings us most of the way to the conclusion.
Statement C2.
============
If a scheduler satisfies the condition
G(i) - E1 <= d(i) <= G(i) + E2 (eq_c2_1)
where G(i) is the i-th finishing time of the reference fluid
scheduler, then it satisfies the redefinition in section 2.1 with
latency term E <= E1 + E2.
Proof of Statement C2.
----------------------
To prove Statement C2 we will prove that for all i >= 0
F(i) >= G(i) - E1 (eq_c2_2)
where F(i) is the set of finish times recursively defined by
(eq_2) of section 2.1.
If (eq_c2_2) is proven, then from (eq_c2_1) and (eq_c2_2)
d(i) <= G(i) + E2 <= F(i) + E1 + E2, which means that the scheduler
satisfies the redefinition with the latency term E = E1 + E2.
Proof of (eq_c2_2).
-----------------
First note that in the reference GPS system, packet i starts its
service at time max ( a(i), G(i-1)) and receives a service rate at
least equal to R. Thus
G(i) <= max ( a(i), G(i-1)) + L(i)/R (eq_c2_3)
Now the proof of (eq_c2_2) proceeds by induction.
Base case
F(0)=0, G(0) = 0, so (eq_c2_2) trivially holds for i=0.
Inductive step.
Suppose (eq_c2_2) holds for all j=0,1...i-1, (i>=1)
We have both F(i-1) >= G(i-1) - E1 and d(i-1) >= G(i-1) - E1,
thus
min (F(i-1), d(i-1)) >= G(i-1) - E1 (eq_c2_4)
Combining this with equation (eq_2) of section 2.1, we obtain
F(i) >= G(i-1) - E1 + L(i)/R (eq_c2_5)
Again from equation (eq_2) we have
F(i) >= a(i)+ L(i)/R >= a(i) - E1 + L(i)/R (eq_c2_6)
Combining (eq_c2_5), (eq_c2_6) and (eq_c2_3) gives F(i) >= G(i)-E1,
which completes the proof of (eq_c2_2) and statement C2.
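A quick randomized sanity check of Statement C2 is sketched below in Python. It assumes a fluid reference that serves the EF aggregate at exactly rate R, so that (eq_c2_3) holds with equality, and draws each d(i) arbitrarily within the window allowed by (eq_c2_1); the departure times are therefore not those of any real scheduler. The numeric values are assumptions chosen for illustration.

```python
import random
from fractions import Fraction

random.seed(7)
R = 5                                   # configured rate (assumed units)
E1, E2 = Fraction(3), Fraction(2)       # assumed accuracy terms
a = G = prev_d = F = Fraction(0)        # a(0) = G(0) = d(0) = F(0) = 0
ok = True
for i in range(1, 501):
    a += random.randint(0, 10)
    length = random.randint(1, 20)
    G = max(a, G) + Fraction(length, R)            # fluid finish at rate exactly R
    d = G + Fraction(random.randint(-30, 20), 10)  # any d(i) in [G - E1, G + E2]
    F = max(a, min(prev_d, F)) + Fraction(length, R)
    ok = ok and (F >= G - E1) and (d <= F + E1 + E2)
    prev_d = d
print(ok)   # True
```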
Statement C3.
=============
WF2Q satisfies the redefinition (equations (eq_1) and (eq_2) of
section 2.1) with E = MTU/C + MTU/R
Proof of C3.
------------
It follows from the results of [BZ96a] that the departures in WF2Q
satisfy the condition
max(G(i-1), a(i))<= d(i) <= G(i) + MTU/C
From Equation (eq_c2_3) this implies that
d(i) >= G(i) - L(i)/R >= G(i) - MTU/R
Therefore (eq_c2_1) in Statement C2 holds with E1=MTU/R and
E2 = MTU/C. Therefore, by Statement C2, WF2Q satisfies the
redefinition with E=MTU/C + MTU/R.
Appendix D: Implementation Considerations - Values of the Latency Term
for Various Schedulers
D.1 General queuing and scheduling considerations.
The redefinition of EF given in section 2.1 does not mandate a
particular underlying queuing structure. While it can be
implemented using aggregate queuing, where all packets of the EF
aggregate share a single queue, it also allows finer queuing
granularity, where EF packets may be assigned to a number of
different queues.
Likewise, the redefinition allows in principle a wide range of
scheduling algorithms, ranging from strict priority scheduling of
an aggregate EF queue to hierarchical scheduling with per-flow
queuing as described in section D.4 below.
Both the queuing structure and the scheduling algorithm have a
significant impact on the delay and jitter which can be provided to
the packets of the EF aggregate. It is typically more difficult to
provide strict deterministic end-to-end delay and/or jitter
guarantees if aggregate queuing is implemented [CLeB2000]. However,
implementing and scheduling a large number of queues at high speeds
presents a significant engineering challenge, while aggregate
scheduling is very attractive due to its simplicity and scalability.
D.2 Aggregate Queuing and Scheduling Accuracy for FIFO Service of the
EF Aggregate
It can be shown that if all packets in the EF aggregate share a
single FIFO queue served by a scheduler satisfying the rate-latency
service curve, then end-to-end delay and jitter guarantees depend on
the latency term E of the scheduler [CLeB2000]. The smaller the
latency term, the better the delay and jitter bounds that can be
provided. In that respect, a strict priority queuing implementation
which has a very small latency term is a natural candidate for
implementing EF PHB. Various implementations of Weighted Fair
Queuing-like schedulers are also possible candidates for such
implementation, but the delay and jitter characteristics of these
schedulers differ substantially depending on the accuracy of the
implementation.
A widely used way of evaluating the accuracy of rate-based
scheduling implementations is to compare the output of the scheduler
with the so-called "fluid model" [Par92]. In this framework, a
given scheduler S and the reference fluid scheduler are subject to
the same arrival patterns. The accuracy of the scheduler S can be
determined by how close the time of the i-th departure in the
scheduler S is to the corresponding departure time in the fluid
scheduler. More precisely, if d(i) is the time of the i-th
departure under some scheduler S, and G(i) is the time of the i-th
departure in the reference fluid scheduler, then the accuracy of S
may be determined by two latency terms E1 and E2 such that for all
i
G(i)-E1 <= d(i)<= G(i) + E2
While the term E2 determines the maximum per-hop delay bound, E1 has
an effect on the jitter at the output of the scheduler. For
example, as shown in [BZ96a], for WF2Q, E1 = MTU/R, E2= MTU/C, and
for PGPS [Par92] E2 = MTU/C as well, while E1 is linear in the
number of queues in the scheduler. It is demonstrated in [BZ96a]
that while WF2Q and PGPS have the same delay bounds, PGPS may result
in substantially burstier departure patterns.
In general, it can be shown that if a scheduler satisfies the
two-sided bound above (condition (eq_c2_1) of Appendix C), then it
also satisfies the redefinition with the latency term E <= E1 + E2.
The proof of this statement is given in Appendix C (Statement C2).
Note that E1+E2 is not necessarily a tight latency bound, and for a
given scheduler a tighter bound may be obtained. That is, the fact
that a given scheduler has a large E1+E2 does not necessarily mean
that it has a large E.
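Given a trace of departure times from a scheduler under test and the corresponding fluid finish times, the two accuracy terms can be estimated as sketched below in Python. The function name and the sample traces are hypothetical; a finite trace yields only a lower estimate of the worst-case E1 and E2.

```python
def accuracy_terms(d, G):
    """Smallest E1, E2 >= 0 with G(i) - E1 <= d(i) <= G(i) + E2 for all i."""
    e1 = max(0, max(g - x for x, g in zip(d, G)))
    e2 = max(0, max(x - g for x, g in zip(d, G)))
    return e1, e2

# Hypothetical departure and fluid finish times, in arbitrary time units.
d = [3, 5, 9, 12]
G = [4, 6, 8, 11]
print(accuracy_terms(d, G))   # (1, 1)
```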
D.3 Additional examples of efficient WFQ-Like Scheduling
Implementations and their Latency Terms.
In this section we briefly discuss some schedulers that can be used
to implement the redefined EF PHB with different degrees of accuracy
and with different implementation complexity.
D.3.1 Weighted Fair Queuing (WFQ/PGPS)
For WFQ/PGPS ([DKS90],[Par92]), E2 = MTU/C just as for the case of
WF2Q. However, it can be shown that E1 can grow linearly with
the number of queues in the scheduler (which here and below is
denoted by N). The worst case complexity of WFQ is also O(N).
D.3.2 Deficit Round Robin (DRR)
For DRR [SV95], both E1 and E2 can be shown to grow linearly with
N*(r_max/r_min)*MTU, where r_min and r_max denote the smallest and
the largest rate among the rate assignments of all queues in the
scheduler. The implementation complexity of DRR is O(1).
D.3.3. Start-Time Fair Queuing (SFQ) and Self-Clocked Fair Queuing
(SCFQ)
For SFQ [GVC96] and SCFQ [Gol94] both E1 and E2 can be shown to grow
linearly with N. Implementation complexity of both of these
schedulers is O(log N).
D.3.4 WF2Q+
For WF2Q+ [BZ96b], E1 = MTU/R, while E2 can grow linearly with N.
The implementation complexity of WF2Q+ is O(log N).
D.4. Hierarchical scheduling implementations
A possible implementation of EF PHB may be based on a hierarchical
scheduling framework, such as described in [FJ95]. In this
framework, different subsets of EF packets may be assigned to
different queues. The semantics of exactly how packets are
classified into different EF queues is highly implementation-
dependent. For convenience, the subset of EF packets sharing a
single queue will be referred to as "EF flows". The EF queues are
grouped in a "logical queue", which is scheduled as a single entity
along with other non-EF queues or groups of queues by a "top-level"
scheduler. It is this top-level scheduler that must satisfy DEF_1.
Once the EF aggregate (i.e., the EF "logical queue") is scheduled by
this top-level scheduler, an "EF flow-level" scheduler is invoked.
As an example, a hierarchical scheduler with WF2Q at each level of
the hierarchy (as described in [BZ96b]) can be used for such a
purpose. Alternatively, the EF "logical" queue can be served at
strict priority over all non-EF queues, while the EF queue at the
"EF flow" level can be served by some other scheduler, such as WF2Q.
In principle, a hierarchical scheduling structure allows substantial
flexibility in the choice of scheduling mechanisms at each level of
the hierarchy. Per-packet delay guarantees in such a hierarchical
scheduling framework strongly depend on the accuracy of schedulers
employed at each level of the hierarchy. In general, the more
accurate the scheduling implementation at each level, the better the
per-packet guarantee that can be provided. It can be shown that for
the scheduling hierarchy, the E1 and E2 latency terms of the
hierarchical scheduler with respect to a particular "leaf queue"
can be obtained by summing the E1 and E2 terms of the schedulers
employed at the nodes of the scheduling tree along the ascending
branch of the tree from the root to the leaf.
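The summation along the scheduling tree can be sketched as follows in Python. The function name and the per-level values are hypothetical and serve only to show the bookkeeping.

```python
def path_latency_terms(path):
    """Sum (E1, E2) over the schedulers on the root-to-leaf path."""
    e1 = sum(node_e1 for node_e1, _ in path)
    e2 = sum(node_e2 for _, node_e2 in path)
    return e1, e2

# Hypothetical two-level hierarchy, terms in microseconds (illustrative values).
top_level = (120, 12)     # (E1, E2) of the top-level scheduler
flow_level = (300, 12)    # (E1, E2) of the EF flow-level scheduler
print(path_latency_terms([top_level, flow_level]))   # (420, 24)
```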
D.5. Effect of internal switching mechanisms
A packet passing through a router will experience delay for a number
of reasons. Two familiar components of this delay are the time the
packet sits in a buffer at an outgoing link waiting for the
scheduler to select it and the time it takes to actually transmit
the packet on the outgoing line.
There may be other components of a packet's delay through a router,
however. A router might have to do some amount of header processing
before the packet can be given to the correct output scheduler, for
example. In another case a router may have a FIFO buffer (called a
transmission queue in [FC2000]) where the packet sits after being
selected by the output scheduler but before it is transmitted. In
cases such as these, the extra delay a packet may experience can be
accounted for by absorbing it into the latency term, E, in DEF_1.
Implementing EF on a router with a multi-stage switch fabric
requires special attention. A packet may experience additional
delays due to the fact that it must compete with other traffic for
forwarding resources at multiple contention points in the core. The
delay an EF packet may experience before it even reaches the output-
link scheduler should be included in the latency term. Input-
buffered and input/output-buffered routers may also require
modification of their latency terms.
Delay in the switch core comes from two sources, both of which must
be considered. The first part of this delay is the fixed delay a
packet experiences regardless of the other traffic. This component
of the delay includes the time it takes for things such as packet
segmentation and reassembly in cell based cores, enqueueing and
dequeueing at each stage, and transmission between stages. The
second part of the switch core delay is variable and depends on the
type and amount of other traffic traversing the core. This delay
comes about if the stages in the core mix traffic flowing between
different input/output port pairs. Thus, EF packets must compete
against other traffic for forwarding resources in the core. Some of
this competing traffic may even be EF traffic from other aggregates.
This introduces extra delay that can also be absorbed by the
latency term in the definition.
Appendix E: Comparison of the Packet Scale Rate Guarantee with the
Rate-Latency Curve
To understand the meaning of the redefinition (equations eq_1 and
eq_2 in section 2.1), we compare it with the well-known rate-latency
curve [LEB98]. We argue that the redefinition is stronger than the
rate-latency curve in the sense that if a scheduler satisfies the
redefinition, it also satisfies the rate-latency curve. As a result,
all the properties known for the rate-latency curve also apply to
the redefinition. We also argue that the redefinition reflects the
intent of EF PHB better than the rate-latency curve does.
It is shown in [LEB98] that the rate-latency curve is equivalent to
the following definition:
Definition DEF_2:
d(j) <= F'(j) + E (eq_3)
where
F'(0)=0,
F'(j)=max(a(j), F'(j-1))+ L(j)/R for all j>0 (eq_4)
It can be easily verified that the redefinition is stronger than
DEF_2 by noticing that for all j, F'(j) >= F(j).
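The relation F'(j) >= F(j) can be observed on a small example. The Python sketch below computes both recursions side by side; the function name and the sample trace (a burst served at four times the configured rate, followed by a later arrival) are assumptions made for illustration.

```python
def def2_and_redefinition(arrivals, departures, lengths, R):
    """Return (F', F): DEF_2 fluid finish times and redefinition finish times."""
    Fp, F = [], []
    prev_Fp = prev_F = prev_d = 0.0
    for a, d, length in zip(arrivals, departures, lengths):
        prev_Fp = max(a, prev_Fp) + length / R               # (eq_4)
        prev_F = max(a, min(prev_d, prev_F)) + length / R    # redefinition
        prev_d = d
        Fp.append(prev_Fp)
        F.append(prev_F)
    return Fp, F

# Hypothetical trace: three packets served early, then one later arrival.
Fp, F = def2_and_redefinition(
    arrivals=[0, 0, 0, 10], departures=[1, 2, 3, 11], lengths=[4, 4, 4, 4], R=1
)
print(Fp)   # [4.0, 8.0, 12.0, 16.0]
print(F)    # [4.0, 5.0, 6.0, 14.0]
print(all(fp >= f for fp, f in zip(Fp, F)))   # True
```

Since every F(j) is no later than F'(j), a departure that meets d(j) <= F(j) + E also meets d(j) <= F'(j) + E.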
It is easy to see that F'(j) in the definition DEF_2 corresponds to
the time the j-th departure should have occurred should the EF
aggregate be constantly served exactly at its configured rate R.
Following the common convention, we refer to F'(j) as the "fluid
finish time" of the j-th packet to depart.
The intuitive meaning of the rate-latency curve of DEF_2 is that any
packet is served at most time E later than this packet would finish
service in the fluid model.
For a rate-latency curve DEF_2 (and hence for the stronger
redefinition) it holds that in any interval (0,t) the EF aggregate
gets close to the desired service rate R (as long as there is enough
traffic to sustain this rate). The discrepancy between the ideal and
the actual service in this interval depends on the latency term E,
which in turn depends on the scheduling implementation. The smaller
E, the smaller the difference between the configured rate and the
actual rate achieved by the scheduler.
While DEF_2 guarantees the desired rate to the EF aggregate in all
intervals (0,t) to within a specified error, it may nevertheless
result in large gaps in service. For example, suppose that (a large
number) N of identical EF packets of length L arrived from different
interfaces to the EF queue in the absence of any non-EF traffic.
Then any work-conserving scheduler will serve all N packets at link
speed. When the last packet is sent at time NL/C, where C is the
capacity of the output link, F'(N) will be equal to NL/R. Suppose now
that at time NL/C a large number of non-EF packets arrive, followed
by a single EF packet. Then F'(N+1) = (N+1)L/R, and the scheduler can
legitimately delay starting to send the EF packet until time
F'(N+1) + E - L/C = (N+1)L/R + E - L/C.
This means that the EF aggregate will have no service at all in the
interval (NL/C, (N+1)L/R + E - L/C). This interval can be quite
large if R is substantially smaller than C. In essence, the EF
aggregate can be "punished" by a gap in service for receiving faster
service than its configured rate at the beginning.
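To give a feel for the size of this gap, the following Python sketch evaluates the interval for one set of numbers. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def def2_service_gap(N, L, R, C, E):
    """Interval with no EF service that DEF_2 permits in the example above."""
    gap_start = Fraction(N * L, C)                       # last burst packet sent
    gap_end = Fraction((N + 1) * L, R) + E - Fraction(L, C)
    return gap_start, gap_end

# Hypothetical numbers: 100 packets of 1000 bits, R = 1 Mbit/s, C = 10 Mbit/s, E = 0.
start, end = def2_service_gap(N=100, L=1000, R=10**6, C=10**7, E=0)
print(float(start), float(end), float(end - start))   # 0.01 0.1009 0.0909
```

With these assumed values the EF aggregate could go without service for about 91 ms, even though the burst itself took only 10 ms to send.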
The redefinition alleviates this problem by introducing the term
min(d(j-1), F(j-1)) in the recursion. Essentially, this means that
the fluid finishing time is "reset" if that packet is sent too
early. As a consequence of that, for the case where the EF aggregate
is served in a FIFO order, suppose a packet arrives at time t to a
server satisfying the redefinition. The packet will be transmitted
no later than time t + Q(t)/R + E, where Q(t) is the EF queue size
at time t (including the packet under discussion). This statement is
proved in Appendix C.
7. References
[BZ96a] J.C.R. Bennett and H. Zhang, "WF2Q: Worst-case
Fair Weighted Fair Queuing", INFOCOM'96, Mar, 1996
[BZ96b] J.C.R. Bennett and H. Zhang, "Hierarchical
Packet Fair Queuing Algorithms", IEEE/ACM Transactions
on Networking, 5(5):675-689, Oct 1997. Also in
Proceedings of SIGCOMM'96, Aug, 1996
[RFC2475] Black, D., Blake, S., Carlson, M., Davies, E., Wang,
Z. and W. Weiss, "An Architecture for Differentiated
Services", RFC 2475, December 1998.
[LEB98] J.-Y. Le Boudec, "Application of Network Calculus To
Guaranteed Service Networks", IEEE Transactions on
Information theory, (44) 3, May 1998
[Bra97] Bradner, S., "Key Words for Use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[CLeB2000] A. Charny, J.-Y. Le Boudec "Delay Bounds in a
Network with Aggregate Scheduling". To appear in Proc.
of QoFIS'2000, September 25-26, 2000, Berlin, Germany.
[DKS90] A. Demers, S. Keshav, and S. Shenker, "Analysis
and Simulation of a Fair Queuing Algorithm". In
Journal of Internetworking Research and Experience,
pages 3-26, October 1990. Also in Proceedings of ACM
SIGCOMM'89, pp 3-12.
[FC2000] T. Ferrari and P. F. Chimento, "A Measurement-
Based Analysis of Expedited Forwarding PHB
Mechanisms," Eighth International Workshop on Quality
of Service, Pittsburgh, PA, June 2000.
[FJ95] S. Floyd and V. Jacobson, "Link-sharing and Resource
Management Models for Packet Networks", IEEE/ACM
Transactions on Networking, Vol. 3 no. 4, pp. 365-386,
August 1995.
[Gol94] S.J. Golestani. "A Self-clocked Fair Queuing
Scheme for Broad-band Applications". In Proceedings of
IEEE INFOCOM'94, pages 636-646, Toronto, CA, April
1994.
[GVC96] P. Goyal, H.M. Vin, and H. Chen. "Start-time
Fair Queuing: A Scheduling Algorithm for Integrated
Services". In Proceedings of the ACM-SIGCOMM 96, pages
157-168, Palo Alto, CA, August 1996.
[RFC2598] V. Jacobson, K. Nichols, K. Poduri, "An Expedited
Forwarding PHB", RFC 2598, June 1999
[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black,
"Definition of the Differentiated Services Field (DS
Field) in the IPv4 and IPv6 Headers", RFC 2474,
December 1998.
[JNP2000] V. Jacobson, K. Nichols, K. Poduri,
"The 'Virtual Wire' Behavior Aggregate,"
(draft-ietf-diffserv-ba-vw-00.txt), March 2000.
[Par92] A. Parekh. "A Generalized Processor Sharing
Approach to Flow Control in Integrated Services
Networks". PhD dissertation, Massachusetts Institute
of Technology, February 1992.
[SV95] M. Shreedhar and G. Varghese. "Efficient Fair Queueing
Using Deficit Round Robin". In Proceedings of
SIGCOMM'95, pages 231-243, Boston, MA, September 1995.
[Sto95] I. Stoica and H. Abdel-Wahab, "Earliest Eligible
Virtual Deadline First: A Flexible and Accurate
Mechanism for Proportional Share Resource Allocation",
Technical Report 95-22, Old Dominion University,
November 1995.
8. Authors' addresses
Anna Charny, ed.
Cisco Systems
300 Apollo Drive
Chelmsford, MA 01824
acharny@cisco.edu
Fred Baker
Cisco Systems
170 West Tasman Dr.
San Jose, CA 95134
fred@cisco.com
Jon Bennett
RiverDelta Networks
3 Highwood Drive East
Tewksbury, MA 01876
jcrb@riverdelta.com
Kent Benson
Tellabs Research Center
3740 Edison Lake Parkway #101
Mishawaka, IN 46545
Kent.Benson@tellabs.com
Jean-Yves Le Boudec
ICA-EPFL, INN
Ecublens, CH-1015
Lausanne-EPFL, Switzerland
leboudec@epfl.c
Angela Chiu
AT&T Labs
100 Schulz Dr. Rm 4-204
Red Bank, NJ 07701
alchiu@att.com
Bill Courtney
TRW
Bldg. 201/3702
One Space Park
Redondo Beach, CA 90278
bill.courtney@trw.com
Shahram Davari
PMC-Sierra Inc
555 Legget drive
Suite 834, Tower B
Ottawa, ON K2K 2X3, Canada
shahram_davari@pmc-sierra.com
Bruce Davie
Cisco Systems
300 Apollo Drive
Chelmsford, MA 01824
bsd@cisco.com
Victor Firoiu
Nortel Networks
600 Tech Park
Billerica, MA 01821
vfirou@nortelnetworks.com
Charles Kalmanek
AT&T Labs-Research
180 Park Avenue, Room A113,
Florham Park NJ
crk@research.att.com
K.K. Ramakrishnan
AT&T Labs-Research
Rm. A155, 180 Park Ave,
Florham Park, NJ 07932
kkrama@research.att.com
Dimitrios Stiliadis
Lucent Technologies
1380 Rodick Road
Markham, Ontario, L3R-4G5, Canada
stiliadi@bell-labs.com
9. Full Copyright
Copyright (C) The Internet Society 2000. All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.