TE Working Group                                          B. Christian
Internet Draft                                                   UUNET
Document: draft-christian-tewg-measurement-00.txt            B. Davies
Category: Informational                                          UUNET
                                                                H. Tse
                                                                 UUNET
                                                              Jul 2000
Operational measurements for Traffic Engineering
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 [1].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
1. Abstract
This memo describes the measurements needed to accomplish Traffic
Engineering (TE) in IP networks. It will aid vendors in their
choice of information to provide, assist network operators in
determining the appropriate information to request, and demonstrate
how measurements are used to accomplish TE. This memo also
describes, in brief, some methods for gathering the measurement
variables and some methods for using them.
Christian/Davies/Tse Informational - Dec2000 1
draft-christian-tewg-measurement-00.txt July 2000
2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC-2119 [2].
3. What is Traffic Measurement?
Traffic Measurement (TM) is defined for the purposes of this
document as a means of characterizing a flow of IP packets from one
point to another. The characteristics of a traffic flow can be
loosely defined as Throughput, Loss, Delay, Path, and Lifetime.
These characteristics should be represented in every device that
carries a flow of IP traffic. Delay variation and other measures
are modifications of the above. A traffic flow can become
arbitrarily specific. An example would be the measurement of
traffic on a physical link as compared to measuring traffic on a
virtual link. A physical link with many virtual links will
aggregate a number of smaller traffic flows. A flow can also be an
aggregate of physical links in schemes such as link bundling or
ECMP.
The measurement of traffic is meant to "facilitate reliable network
operations." [AWD1] Traffic measurement provides a means for
capacity planning as well as a means to work around congestion.
Traffic measurement standards need to be protocol independent and
should be portable across platforms. Traffic measurement is
accomplished with the goals of modifying the path of traffic,
allocating capacity, reducing congestion, and observing trends.
4. Advantages of TE measurements
4.1. Real-time and long-term TE measurement
TE measurements are instrumental in providing real-time as well as
long-term proactive TE. Network performance may be evaluated by
examination of TE measurements. Measurements, such as throughput
vs. maximum bandwidth, can indicate link utilization and link
congestion on the network. Due to the transient nature of the
network, the measurements must be able to derive the real-time
characteristics of the network to be effective.
Over a period of time, measurement metrics should be able to provide
for long-term TE. Long term TE includes traffic growth patterns,
congestion issues, and traffic peak patterns. Traffic growth and
peak patterns can be derived from measurements such as throughput
and peak rate. Measurements must facilitate proactive TE strategies
to optimize the network or to avoid undesirable network conditions.
4.2. Measurements for traffic management
4.2.1. Load balancing
To perform TE is to be able to optimize network traffic flows and
balance network traffic on multiple trunks. During load balancing,
traffic will be partitioned at the incoming interface onto multiple
virtual paths. In the case of virtual links, based on the TE
measurements, secondary link(s) with the appropriate requirements
may be created to accommodate load balancing. Measurement of
available bandwidth, loss, and delay are critical in determining the
feasibility of creating secondary connections.
Measurements such as available bandwidth change constantly. The
network will not remain in a steady load-balanced state because its
traffic flows change dynamically. To approach a load-balanced
steady state, TE measurements are needed to determine recomputation
and optimization intervals.
4.2.2. Policy-based TE measurements
Policy-based TE provides flexibility in the specification of the
network optimization objectives and constraints. Policy can be
adjusted or fine-tuned on a continuous basis. Policy attributes on
a network path include priority, preemption, resilience, resource
classes, and policing.
Policy-based TE measurements should compare the metric values with
the thresholds based on the policy to trigger the appropriate
actions. Policy-based measurements can be used to identify
potential network traffic issues. Comparison of the measurements
and policy-based thresholds can be set up statically at a predefined
time interval or dynamically at event occurrence.
For instance, in the event of path preemption, the traffic pattern
can be impacted and the traffic flow changes. Measurements should
be compared with the threshold values to ensure that proper actions
are taken if the preemption induces undesirable effects on the
traffic pattern. Policy-based TE should be in compliance with the
Policy Information Base (PIB) specifications.
Constraint-based routing (CBR) TE specifies a finer subset of the
policy-based TE. CBR takes place when all the specified constraints
are met by the TE measurements. Measurements must provide traffic
characteristics in order to facilitate constraint-based routing
comparison. Constraint specifications can include peak rate,
committed rate and service levels. Policy-based TE measurements,
such as bandwidth availability, can be compared with the peak rate
and committed rate constraints to determine if they are met or not.
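The comparison described above can be sketched as follows. This is
only an illustration, assuming that available bandwidth and the two
rate constraints are expressed in bits per second; the function and
parameter names are hypothetical and do not come from any PIB or
vendor interface.

```python
def unmet_constraints(available_bps, committed_rate_bps, peak_rate_bps):
    """Return the names of the rate constraints that the measured
    available bandwidth cannot satisfy (empty list if all are met).

    All three arguments are assumed to be in bits per second.
    """
    unmet = []
    if available_bps < committed_rate_bps:
        unmet.append("committed_rate")
    if available_bps < peak_rate_bps:
        unmet.append("peak_rate")
    return unmet


def path_is_eligible(available_bps, committed_rate_bps, peak_rate_bps):
    """A path is eligible only when every constraint is met."""
    return not unmet_constraints(available_bps, committed_rate_bps, peak_rate_bps)
```

A path with 80 Mb/s available would meet a 50 Mb/s committed rate
but not a 100 Mb/s peak rate, so it would not be eligible under this
sketch.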
4.2.3. Measurements for Path Protection/Restoration
Fault detection, path protection, and restoration are imperative in
an operations environment. TE measurements are essential to ensure
these mechanisms are in place. Faults can be identified using TE
measurements such as packet loss or low throughput. Notifications
may be generated automatically based on the observed value of these
variables.
Other metrics can determine the amount of spare capacity for
different failure recovery scenarios. For example:
a. Prior to restoring traffic to the original path
b. Prior to creating the protection path
Examination of TE measurement metrics can also be used to ensure
that there is no overlap of the primary and secondary paths.
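As an illustration of the overlap check, the sketch below compares
the links used by a primary and a secondary path. It assumes that
each path is available as an ordered list of node names and that
links are bidirectional; both assumptions, and the function names,
are hypothetical.

```python
def path_links(path):
    """Return the set of links on a path given as an ordered list of
    node names. Each link is an unordered pair of adjacent nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def shared_links(primary, secondary):
    """Return the links that appear on both paths. An empty set means
    the two paths are link-disjoint."""
    return path_links(primary) & path_links(secondary)
```

For example, the paths A-B-C-D and A-E-C-D share the link between C
and D, so a failure of that link would take down both paths.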
5. Throughput
Throughput is a measure of the amount of traffic that passes between
a set of end points, where end points can be logically or physically
defined. The amount of traffic is a measure of the quantity of bits
that pass over a period of time. This is usually represented as
Bits Per Second or BPS. Another facet of throughput is Packets Per
Second or PPS. PPS is infrequently used. However, PPS in
conjunction with BPS will allow the operator to determine average
packet sizes. Average packet size is an important measure as some
vendors can have problems passing small packets at line rate.
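The relationship between BPS, PPS, and average packet size can be
sketched as follows. The example assumes that octet and packet
counters are sampled at the start and end of an interval and that
the counters do not wrap during that interval; the function and
parameter names are hypothetical.

```python
def throughput(octets_start, octets_end, packets_start, packets_end,
               interval_seconds):
    """Return (bits per second, packets per second, average packet
    size in bytes) for one measurement interval.

    Assumes the counters did not wrap between the two samples.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    octets = octets_end - octets_start
    packets = packets_end - packets_start
    bps = octets * 8 / interval_seconds
    pps = packets / interval_seconds
    # Average packet size is undefined when no packets were seen;
    # report 0.0 in that case.
    avg_packet_bytes = octets / packets if packets else 0.0
    return bps, pps, avg_packet_bytes
```

Over a 300-second interval, 3,750,000,000 octets in 7,500,000
packets works out to 100 Mb/s, 25,000 PPS, and an average packet
size of 500 bytes.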
Both Medium and Long Term TE require a measure of throughput for
intervention in scenarios of decreasing bandwidth availability as
well as for planning future capacity needs. Throughput measurement
will also be important in situations where new software creates the
demand for dynamic IP flow controls. See [AWD2] for a more detailed
explanation of TE over time.
Throughput for general usage is best measured at a regular interval.
Most operators choose 5 minutes as their interval of choice. This
provides for an approach that is granular without being so
aggressive that the amount of data recorded becomes overwhelming.
The 5-minute interval is best suited to periods when active traffic
measurement (measurement with network operator involvement) is not
being performed. It provides enough data to identify
daily/weekly/monthly trends. This data is used to predict capacity
needs and to identify points of rising congestion. During periods
of active traffic measurement, intervals of 5 seconds are not
uncommon. Active throughput measurement is undertaken in order to
provide a means of working with points of congestion. With active
throughput measurement the operator will identify flows and choose
alternate paths or other modifications of flow parameters. Active
throughput measurement also provides a means of monitoring changes
to network parameters and the impact on traffic during production
traffic engineering efforts.
Vendors provide various levels of throughput measurement. Some
vendors choose to measure throughput as the amount of IP traffic
passed. Unfortunately, with differing methods it becomes necessary
to remember which vendor you are measuring and adjust appropriately.
An example would be switch vs. router. Many switches report the
throughput of their protocol (such as ATM) which is, of course,
greater than the throughput possible for an IP packet encapsulated
within the protocol. A measure of throughput, which relates the
most to what an IP packet perceives as throughput, would include
only the IP packet. Additional encapsulation can create a false
sense of capacity since some methods of switching can take up
significant amounts of bandwidth (see ATM). The above statements
seem to indicate that the best method for representing IP traffic is
to subtract all additional forms of encapsulation from your
measurements. This requires that the amount of space used for
encapsulation be well known. For most encapsulation methods this
works quite well since the amount of space necessary is well known.
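As a worked illustration of subtracting encapsulation, the sketch
below estimates IP throughput from a throughput figure reported at
the ATM cell level. It assumes AAL5 framing (an 8-byte trailer, with
the PDU padded to a multiple of the 48-byte cell payload, carried in
53-byte cells) and, by default, an 8-byte LLC/SNAP header. Using a
single average packet size is itself an approximation, and the
function names are hypothetical.

```python
import math

ATM_CELL_BYTES = 53          # 5-byte header + 48-byte payload
ATM_CELL_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8


def atm_wire_bytes(ip_packet_bytes, llc_snap_bytes=8):
    """Bytes on the wire for one IP packet carried over AAL5.

    llc_snap_bytes is 8 for LLC/SNAP encapsulation and 0 when no such
    header is used (an assumption about the deployment).
    """
    pdu = ip_packet_bytes + llc_snap_bytes + AAL5_TRAILER_BYTES
    cells = math.ceil(pdu / ATM_CELL_PAYLOAD_BYTES)
    return cells * ATM_CELL_BYTES


def ip_throughput_from_atm(atm_bps, avg_ip_packet_bytes, llc_snap_bytes=8):
    """Approximate IP-level throughput given cell-level throughput
    and an average IP packet size."""
    wire = atm_wire_bytes(avg_ip_packet_bytes, llc_snap_bytes)
    return atm_bps * avg_ip_packet_bytes / wire
```

Under these assumptions a 40-byte IP packet occupies two cells (106
bytes on the wire), while a 1500-byte packet occupies 32 cells (1696
bytes), which shows why the overhead depends heavily on packet size.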
The 95th percentile is used to determine flow utilization. The
percentile allows the capacity planner to determine future needs
while avoiding the statistical anomalies that are inherent in packet
networks. For the network operator, 95 percent utilization is used
to set alarms as well as determine that flows are approaching their
predefined thresholds.
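One common way to compute a 95th percentile from interval samples is
the nearest-rank method: sort the samples, discard the top 5
percent, and take the highest remaining value. The sketch below
assumes that method and a list of per-interval throughput samples;
other percentile definitions exist, and the function names and the
threshold parameter are hypothetical.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # ceil(0.95 * n), computed with integers to avoid rounding error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def exceeds_threshold(samples, capacity, threshold=0.95):
    """True when the 95th percentile sample reaches the given
    fraction of capacity (same units as the samples)."""
    return percentile_95(samples) >= threshold * capacity
```

With twenty samples of which nineteen are 10 and one is 1000, the
nearest-rank 95th percentile is 10: the single spike is treated as a
statistical anomaly and does not drive the planning figure.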
6. Loss
A flow has certain requirements it must satisfy in order to be
considered a quality service. The degree of loss is an important
factor. No internet service (or its component flows) will always
be 100% loss free, therefore the loss constraint must be defined
based on network dynamics and internal system constraints (topology,
bandwidth etc.). What is acceptable loss? None is the preferred
answer, but that is not always practical or possible.
Generically, loss can be viewed as a quality attribute of a flow.
The loss attribute of a flow, when compared to the predetermined
constraints allows for problem determination. Accounting and
measurement (real-time and long-term) provide the necessary
information for developing a solution and finding the best possible
resolution based on the system constraints.
Traceroute & Ping at L3 allow the user to see loss and latency.
Traceroute at L2 (in an overlay) can allow the user to see problems
at L2. Physical outages and errors can lead to any number of higher
level errors.
Loss can be caused by outages in a bandwidth guaranteed TDM system
(such as SONET/SDH) where no statistical gain is generally achieved.
Loss can also be attributed to statistical systems where demand
outweighs supply.
(input port 1 + input port 2) > output port 1
Protection schemes (such as 1+1, 1:1, and N:1) can be used to
mitigate TDM loss. Buffering, scheduling, and randomized discard
strategies can be used to mitigate statistical loss.
A laundry list of required values needed to mitigate, plan for, and
resolve a flow's loss attribute would include:
Per traffic class loss statistics (e.g., UBR/ABR/VBR/CBR, multiple
FECs, diffserv)
-Intentional loss (RED, policy, contract enforcement)
-Unintentional loss (buffer over-utilization, congestion, etc)
-Total loss (cause independent)
Per flow loss statistics (VC, DLCI, LSP)
-Intentional loss
-Unintentional loss
-Total loss
Per interface loss statistics
-Intentional loss
-Unintentional loss
-Total loss
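The list above can be represented as one counter record kept per
traffic class, per flow, and per interface. The sketch below is one
possible layout, not a vendor or MIB definition; the class and field
names are hypothetical, and it assumes the device can attribute each
drop to an intentional or unintentional cause.

```python
from dataclasses import dataclass


@dataclass
class LossCounters:
    """Loss counters for one scope (a traffic class, flow, or
    interface) over one measurement interval."""
    offered_packets: int
    intentional_drops: int      # e.g. RED, policy, contract enforcement
    unintentional_drops: int    # e.g. buffer over-utilization, congestion

    @property
    def total_drops(self):
        """Total loss, independent of cause."""
        return self.intentional_drops + self.unintentional_drops

    def loss_ratio(self):
        """Fraction of offered packets that were dropped."""
        if self.offered_packets == 0:
            return 0.0
        return self.total_drops / self.offered_packets
```

A scope that was offered 1000 packets and dropped 30 intentionally
and 20 unintentionally has a total loss of 50 packets, or 5 percent.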
7. Delay
Delay measurement, defined as the time it takes for a packet to
travel from source to destination, is a must for any IP forwarding
device. Delay directly affects the responsiveness of protocols such
as TCP across the network. Round-trip packet delay, in some cases,
may not be equal to twice the one-way packet delay due to asymmetric
paths. On an uncongested network, delay value will provide the
ability to measure propagation and transmission delay. Delay
measurement is very useful as the use of real time and delay
sensitive applications is growing.
Along with end-to-end delay, buffer delay should also be taken into
consideration and measured separately. Buffer delay is defined for
the purposes of this document as the time it takes for a node to
transfer/switch a packet from the ingress to the egress interface.
This value is dependent on the type/bandwidth of the ingress and
egress interfaces. Vendors have different implementations of the
memory pools used for packet buffering e.g. per interface buffers or
the use of a global pool of memory buffers, resulting in different
values when measuring buffer delay. In other words, different
vendors can have different ingress to egress transit times.
Measurement of buffer delay will create the ability to determine the
amount of time involved in transiting a device. This will help
operators to determine congestion points as well as equipment
performance under load. In test scenarios the measurement of buffer
delay is academic since, in most situations, the path will not have
a speed of light delay that is measurable. Sending alerts based on
buffer delay provides a means of determining congestion without
relying on tools such as ping which can add to the problem. Ping
and similar tools are also external indicators of performance issues
and may not monitor all paths through the network (ECMP for
example). Enlarging buffer sizes will increase potential
buffer delay and some vendors provide methods for doing this.
Application level programs like ping and traceroute provide a means
of measuring end-to-end delay. Most network management systems rely
on pings to monitor performance of a given path. Methodologies for
delay measurement on a node level will vary depending on vendor
implementation. If all the nodes in the path of a packet are closely
synchronized to a GPS clock, NTP (network time protocol) can be used
as one way to measure packet delay. The source node will place a
time-stamp in the packet and send it towards the destination. The
destination node, upon receiving the packet, time-stamps it. The
difference in value of the two time-stamps, along with any
adjustment (adjustments may be necessary due to differences in clock
synchronization) is one-way packet delay. The process can be
repeated periodically with 3 to 5 packets sent in each instance.
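The time-stamp calculation described above can be sketched as
follows. The example assumes both time-stamps use the same unit and
that any known clock offset is expressed as the receiver clock minus
the sender clock; the function names are hypothetical.

```python
def one_way_delay(sent_ts, received_ts, clock_offset=0.0):
    """One-way delay for a single probe packet.

    clock_offset is the receiver clock minus the sender clock, in the
    same unit as the time-stamps; it is 0 when the clocks agree.
    """
    return (received_ts - sent_ts) - clock_offset


def mean_one_way_delay(probes, clock_offset=0.0):
    """Average delay over a burst of probes, each given as a
    (sent_ts, received_ts) pair."""
    if not probes:
        raise ValueError("at least one probe is required")
    delays = [one_way_delay(s, r, clock_offset) for s, r in probes]
    return sum(delays) / len(delays)
```

For instance, a probe stamped 1000 ms at the source and 1045 ms at
the destination, with the destination clock running 5 ms ahead,
yields a one-way delay of 40 ms.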
In addition to buffer delay, delay measurements can be impacted by
frame translation. When IP traffic is switched or routed from one
device to another, a segmentation and reassembly (SAR) process can
take place to translate the frame format. This adds delay to the
switching or routing.
Delay metrics for TE measurement can be optimized by engineering
flows to avoid unnecessary frame translation or SAR.
8. Path
Path can be described as the hops that packets in a flow will take
from ingress node to egress node. It is not uncommon for there to
be three separate layers of path information, from physical layer,
to switched layer, to IP layer. Programs such as traceroute and
ping can provide a record of the nodes that a packet has to
traverse. Ping and traceroute provide only IP layer information.
When a traceroute UDP packet, or a ping with the record route option
set, is received by a node, the packet leaves the switching path and
the information regarding the switching environment in the node is
lost.
Path information provides the ability to determine a flow's preferred
topology. Maintaining a history of previously preferred paths
provides the ability to determine where a flow has previously lived
and will provide the ability to prepare for network failures.
Historical path information is used to determine failure scenarios
that would represent overload based on aggregate potential flows
over failover links (links that are preferred during outages).
Hop count generally indicates on a node level how many nodes a
packet has traversed in its quest for a destination. Simply
counting the number of hops that a flow commonly prefers and sending
an alert when the count exceeds thresholds will provide the ability
to determine that a path has reached an unreasonable length or that
network state has changed.
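A hop count alert of this kind can be sketched in a few lines. The
example assumes the path is known as an ordered list of nodes and
that the operator has chosen a usual hop count and a tolerance for
the flow; these parameters and the function name are hypothetical.

```python
def hop_count_alert(path, usual_hops, tolerance=2):
    """Return True when the path is longer than the hop count the
    flow commonly prefers by more than the given tolerance.

    path is an ordered list of nodes from ingress to egress, so the
    hop count is one less than the number of nodes.
    """
    hops = len(path) - 1
    return hops > usual_hops + tolerance
```

A flow that usually takes 3 hops would raise no alert on a 5-hop
path with the default tolerance, but would on a 6-hop path.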
9. Lifetime
The lifetime of a flow is simply the measurement of the total time
that the flow exists. As stated before, a flow can exist on a
physical or logical interface and could be permanent (such as a
backbone connection) or dynamic (perhaps a VPN connection at certain
times of day). The lifetime can be used in several ways to help
facilitate reliable network operations.
In a perfect world, a permanent flow would have an infinite
lifetime. In reality, link outages, equipment failures, or
scheduled maintenance will always cause a flow to have a finite
lifetime. By tracking the lifetime of the flow, its performance
and reliability may be characterized. The information gleaned from
flow lifetimes could be applied to a network monitoring tool to
alert operators to potential problems at lower OSI layers.
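As an illustration, flow lifetimes can be derived from a log of
state changes. The sketch below assumes a time-ordered list of
(timestamp, state) events, where state is "up" or "down"; the event
format and function name are hypothetical. A flow that is still up
at the end of the log has no completed lifetime and is not reported.

```python
def flow_lifetimes(events):
    """Return the completed lifetimes of a flow, in the unit of the
    timestamps, from a time-ordered list of (timestamp, state)
    events where state is "up" or "down"."""
    lifetimes = []
    up_since = None
    for timestamp, state in events:
        if state == "up" and up_since is None:
            up_since = timestamp
        elif state == "down" and up_since is not None:
            lifetimes.append(timestamp - up_since)
            up_since = None
    return lifetimes
```

A permanent flow whose lifetimes grow shorter over successive
intervals may point to a problem at a lower layer.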
Dynamic flow lifetime information is also very useful to operators
or capacity planners. The range of specialized IP services offered
continues to grow, and planners will need to be able to maximize the
use of their network resources (while minimizing loss of course).
By understanding the lifetime of flows on the network it is possible
to optimize traffic to use the network to the fullest extent while
still maintaining an acceptable level of quality.
10. Applications of TE measurement
Over a period of time, static and dynamic measurement metrics should
be able to provide data for long-term TE. Long term TE includes
traffic growth patterns, congestion issues and traffic peak
patterns. Traffic growth and peak patterns can be derived from
measurements such as peak and average rate. Measurements must
facilitate proactive TE strategy planning to optimize the network
and to avoid undesirable network conditions.
It is incumbent on the operator to determine intervals in which
measurements should be accomplished. The rate of change in the 95th
percentile (throughput change over time) should cue the network
operator to increase the frequency of TE efforts. An operator in
the summer months may adjust flow parameters on a monthly basis and
in the winter months the operator may need to adjust on a weekly
basis. Tracking the rate of change over time will help the operator
predict this type of behaviour.
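One way to track the rate of change is to compare successive 95th
percentile values, for example one per month. The sketch below is an
illustration only; the 10 percent threshold and the "weekly" and
"monthly" labels are assumptions chosen to mirror the example above,
not recommended values, and the function names are hypothetical.

```python
def growth_rates(p95_series):
    """Relative change between consecutive 95th percentile values.
    All values are assumed to be greater than zero."""
    return [(b - a) / a for a, b in zip(p95_series, p95_series[1:])]


def suggested_te_interval(p95_series, threshold=0.10):
    """Suggest "weekly" TE adjustments when the most recent growth
    rate exceeds the threshold, otherwise "monthly"."""
    rates = growth_rates(p95_series)
    if rates and rates[-1] > threshold:
        return "weekly"
    return "monthly"
```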
Policy-based TE measurements should compare metric values with
thresholds based on the policy to trigger the appropriate actions.
The policy-based measurements should be able to alert operators to
potential traffic issues. The comparison of measurements and
policy-based thresholds can be set up statically at a predefined
interval or dynamically at event occurrence. For instance, in the
event of path preemption, the traffic pattern can be impacted and
the traffic flow changed. Measurements should be compared with the
threshold values to ensure that proper actions are taken if the
preemption induces an undesirable effect on the traffic pattern.
Policy-based TE should be in compliance with Policy Information Base
(PIB) specifications.
Constraint-based routing (CBR) TE specifies a finer subset of
policy-based TE. CBR takes place when all the specified constraints
are met by the TE measurements. Measurements must provide the
explicit traffic characteristics in order to perform the comparison
for CBR. Constraint specifications can include peak rate, committed
rate and service levels. Policy-based TE measurements, such as
bandwidth availability, can be compared with the peak rate and
committed rate constraints to determine if they are met.
11. Additional TE measurement considerations
11.1. Protocol-independent link bundling considerations
In order to reduce the overhead of managing multiple virtual links
that share the same ingress and egress points, there is a proposal
to aggregate links for network optimization. Component links will
have the same constraints, resource classes, and attributes.
Multiple virtual links are treated as a
single IP link. TE measurements, such as bandwidth availability,
and throughput, should take bundled virtual links into account.
There are ongoing discussions on virtual link/channel bundling for
various standards under development or enhancement, such as MPLS
and optical networking. TE measurements for virtual link/channel
bundling should be protocol independent and media independent to
ensure portability and commonality in the measurements.
11.2. Feedback mechanisms for topology state considerations
As part of the constraint-based routing measurements, all nodes
require topology state information. TE measurements should provide
information, such as link availability, and maximum
constraints/resources that each link can meet. Topology
information, such as throughput, loss, and bandwidth availability,
changes continuously in a large-scale environment. Information
distribution is usually based on flooding or on a predetermined
algorithm for topology changes. It takes distribution
and updating time to synchronize topology information while
bandwidth measurements could be changed immediately. As a result,
not every node will have the same topology view. In a large-scale
operations environment, the topology information discrepancies on
different nodes can be a problem in the event of failure or during
recovery.
TE measurements should consider the recent proposal for a signaling
protocol to include the actual link bandwidth availability at every
link that it traverses. This feedback mechanism for topology will
require additional TE measurements to provide the actual information
as part of the reverse flowing messaging. The RSVP TLV-type of
measurements should be protocol independent. In addition to the
feedback on the actual bandwidth, future TE measurements should
consider information on the actual utilization, current congestion,
and number of channels or wavelengths available as part of the
feedback mechanism.
11.3. Optical network considerations
Optical network development is adding new dimensions to TE
measurements. As the role of optical switches in the traditional
data router/switch network increases, TE measurements need to
provide information on optical performance.
Optical performance measurements for TE should include LOS, BER,
insertion loss, OSNR, optical channel registration, optical
compliance deviation, and optical power level. The information can
be distributed to the edge devices that interface the optical layer
and data layer. With these optical network measurements and IP data
TE measurements, virtual paths/channels can be managed dynamically
and performance can be optimized.
The development of a traffic engineering control plane function in
the optical network will require additional TE measurements. There
can be similarities in TE measurements for optical channels and
labels, specifically resource availability and constraints for
network dimensioning.
11.4. ICMP extensions for one-way performance metrics
TE measurements should consider the extension of ICMP for one-way
traffic measurements. The new ICMP messages, type 41, and type 42,
are probe packets for probe request message and probe reply message,
respectively. They can provide information on one-way delay based
on timestamp information and one-way loss rate based on the encoded
sequence number. The one-way delay and one-way loss can be useful
in the TE one-way performance metric measurements.
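The loss calculation based on sequence numbers can be sketched
independently of the probe format. The example below assumes the
sender numbers its probes and reports how many it sent, and that the
receiver records the sequence numbers it saw; it does not model the
ICMP message layout, and the function name is hypothetical.

```python
def one_way_loss_rate(received_sequence_numbers, sent_count):
    """Fraction of probes lost in one direction.

    Duplicate sequence numbers are counted once, so a duplicated
    probe does not mask the loss of another.
    """
    if sent_count <= 0:
        raise ValueError("sent_count must be positive")
    received = len(set(received_sequence_numbers))
    return (sent_count - received) / sent_count
```

If ten probes numbered 0 through 9 are sent and probes 3 and 6 never
arrive, the one-way loss rate is 0.2.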
11.5. New requirement considerations
Internet application development is increasing the complexity in the
TE metrics. An example is TE multicast, which requires measurements
to facilitate traffic optimization when multicast and unicast
traffic co-exist. TE measurements for multicast need to provide
information on constraints such as network utilization, channel
availability, delay, loss, and throughput when creating the
multicast tree. Similarly, additional considerations for TE
measurements are needed for voice over IP applications.
12. Acknowledgments
Special Thanks to Syed Malik, Josh Wepman, Brad Volz, Roshan
Winslow, and Rick Glasser from UUNET. And yet more thanks to Ed
Balas and Mark Davisson from Caimis and to Abha Ahuja from the
University of Michigan.
13. Authors' Addresses
Blaine Christian
UUNET
Blaine@uu.net
Brian Davies
UUNET
Daviesb@uu.net
Heidi Tse
UUNET
Htse@uu.net
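14. References
[1] S. Bradner, "The Internet Standards Process -- Revision 3,"
BCP 9, RFC 2026, October 1996.
[2] S. Bradner, "Key words for use in RFCs to Indicate Requirement
Levels," BCP 14, RFC 2119, March 1997.
[AWD1] D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus,
"Requirements for Traffic Engineering over MPLS," RFC 2702,
September 1999.
[AWD2] D. Awduche, A. Chiu, A. Elwalid, I. Widjaja, X. Xiao, "A
Framework for Internet Traffic Engineering," Work in Progress, May
2000.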