INTERNET-DRAFT Nick Duffield
draft-duffield-framework-papame-01 Albert Greenberg
Matthias Grossglauser
Feb 27, 2002 Jennifer Rexford
AT&T Labs - Research
A Framework for Passive Packet Measurement
Copyright (C) The Internet Society (2001). All Rights Reserved.
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
A wide range of traffic engineering and troubleshooting tasks rely
on reliable, timely, and detailed traffic measurements. We describe
a passive packet measurement framework that is (a) general enough
to serve as the basis for a wide range of operational tasks, and
(b) relies on a small set of primitives that facilitate uniform
deployment in router interfaces or dedicated measurement devices,
even at very high speeds. This document describes the motivation
for such a framework through several operational examples, defines
the measurement primitives (filtering, sampling, and hashing), and
illustrates their use.
1 Motivation
Framework: This document describes a framework for a standard
set of capabilities for network elements to sample packets and
report on them. One motivation to standardize these capabilities
comes from the requirement for measurement-based support for
network management and control across multivendor domains. This
requires domain-wide consistency in the types of sampling schemes
available, the manner in which the resulting measurements are
presented, and consequently, consistency of the interpretation that
can be put on them.
Relation to other work: The measurement capabilities are positioned
as suppliers of packet samples to higher level consumers, including
both remote collectors and applications, and on-board
measurement-based applications. Indeed, development of the
standards within the framework described here should take into
account the measurement requirements of standards in other IETF
WGs, including IPPM and TEWG. Conversely, we expect that aspects of
this framework not specifically concerned with the central issue of
packet sampling may be able to leverage work in other WGs. The
prime example is the format and export of measurement reports,
which may leverage the work of IPFIX.
Applications: We first describe several representative operational
applications that require traffic measurements at various levels of
temporal and spatial granularity.
Example 1: Troubleshooting
A network operator typically monitors aggregate statistics on a
per-link basis. Such aggregate statistics may include the total
number of packets and bytes, and the number of dropped packets and
bytes. These statistics are typically moving averages over
relatively long time windows (e.g., 5 minutes), and serve as a
coarse-grained indication of the operational health of the network.
The most common method of obtaining such measurements is through the
appropriate SNMP MIBs (MIB-II and vendor-specific MIBs).
Suppose an operator detects a link that is persistently overloaded
and experiences significant packet drop rates. There is a wide
range of potential causes: routing parameters (e.g., OSPF link
weights) that are poorly adapted to the traffic matrix, e.g.,
because of a shift in that matrix; a denial of service attack or a
flash crowd; a routing problem (link flapping). In most cases,
aggregate link statistics are not sufficient to distinguish between
such causes, and to decide on an appropriate corrective action. For
example, if routing over two links is unstable, and the links flap
between being overloaded and inactive, this might be averaged out
in a 5 min window, indicating moderate loads on both links.
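The averaging effect can be made concrete with a small worked
example. The sketch below assumes hypothetical utilization samples
taken every 30 seconds on two links between which traffic flaps; the
sample values and the helper names are illustrative only and are not
drawn from any real measurement.

```python
# Hypothetical utilization samples, one per 30 s, over a 5-minute window.
# Traffic flaps between the two links, so each alternates between fully
# loaded (1.0) and idle (0.0).
link_a = [1.0, 0.0] * 5
link_b = [0.0, 1.0] * 5


def window_average(samples):
    """Mean utilization over the whole window, as a coarse counter would show."""
    return sum(samples) / len(samples)


def overloaded_intervals(samples, threshold=0.9):
    """Number of fine-grained intervals in which the link exceeded the threshold."""
    return sum(1 for s in samples if s > threshold)


print(window_average(link_a), window_average(link_b))
print(overloaded_intervals(link_a), overloaded_intervals(link_b))
```

Under these assumptions both links report a 50% average, although
each is overloaded in half of the 30-second intervals.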
Hence, the operator must be able to drill down into the traffic on
a link, and obtain measurements that are more fine-grained both in
space and in time. The operator has to be able to determine how
many bytes/packets are generated for each source/destination
address, port number, and prefix, or other attributes, such as
protocol number, MPLS forwarding equivalence class (FEC), type of
service, etc. This allows the operator to pinpoint precisely the
nature of the offending traffic. For example, in the case of a DDoS
attack, the operator would see a significant fraction of traffic
with an identical destination address.
Example 2: Characterizing Demand
Traffic engineering has two goals: optimizing the quality of
service provided to customers, and optimizing the use of network
resources. This is achieved through network-wide control of
routing, traffic classification and differentiation, and resource
allocation. Traffic measurements are necessarily part of such a
closed control loop. Specifically, the operator has to be able to
measure the total network-wide traffic demand at several levels of
granularity and time scales.
For example, in order to optimize intradomain routing by modifying
OSPF link weights or by configuring MPLS tunnels, the volume per
ingress-egress pair (the traffic matrix) has to be measured. At a
longer time scale (weeks to months), measurements also drive
topology and capacity planning and the management of peering
agreements. Topology and capacity planning involves upgrading
links and routers and modifying the network topology to be
well-adapted to the prevailing traffic pattern. This includes
deciding where new customers should be attached. A natural
representation for traffic demand to drive topology and capacity
planning is a previous/next-hop AS traffic matrix, which
characterizes demand in terms of neighboring ASs. Managing peering
agreements, i.e., making strategic decisions about setting up and
retiring peering agreements, and modifying the terms of existing
ones (e.g., where to interconnect with peers), benefits from a
source/destination AS traffic matrix, because the set of
neighboring ASs may change as a result of peering management.
Therefore, in general, it is necessary to obtain averages over
various time scales of the entire traffic carried by a network
domain. The spatial resolution of these averages includes the
source and destination IP address, AS, prefix, port number, and the
previous and next hop AS with respect to the measurement domain.
Furthermore, if a service provider uses multiple service types, it
should also be possible to measure these matrices individually per
service type.
Example 3: Direct Observation of Network Behavior
In certain circumstances, precise information about the spatial
flow of traffic through the network domain is required to detect
and diagnose problems and verify correct network behavior. For
example, in the case of the overloaded link in Example 1, it would
be very helpful to know the precise set of paths that packets
traversing this link follow. This would readily reveal a routing
problem such as a loop, or a link with a misconfigured weight. More
generally, complex diagnosis scenarios can benefit from measurement
of traffic intensities (and other attributes) over a set of paths
that is constrained in some way. For example, if a multihomed
customer complains about performance problems on one of the access
links from a particular source address prefix, the operator should
be able to examine in detail the traffic from that source prefix
which also traverses the specified access link towards the
customer.
While it is in principle possible to obtain the spatial flow of
traffic through auxiliary network state information, e.g., by
downloading routing and forwarding tables from routers, this
information is often unreliable, outdated, voluminous, and
contingent on a network model. For operational purposes, a direct
observation of traffic flow is more reliable, as it does not depend
on any such auxiliary information. For example, if there were a bug
in a router's software, direct observation would allow the operator
to diagnose the effect of this bug, while an indirect method would
not.
2 Goals
The main goal of this proposal is to define a measurement framework
that relies on three canonical primitives: packet sampling,
filtering, and hashing. A wide spectrum of applications, including
those described in the previous section, is enabled by
measurements obtained through combinations of these three
primitives. Furthermore, a sampling device based on these
measurement primitives is relatively simple, as (a) it requires
only minimal per-packet processing, and (b) it requires little
(local) memory. Therefore, the proposed framework represents an
effective tradeoff between implementation complexity and the range
of traffic engineering applications and other operational tasks it
enables.
More generally, the following goals motivate the proposed framework:
o Greatly assist a very wide range of applications that can be
built on traffic measurement (Section 4), from a very small set of
primitives implemented ubiquitously.
o Aim for ubiquity, by including in the minimal set of primitives
functions that can be implemented at maximal line rate with minimal
additional state.
o Aim for ubiquity, by not forcing tight integration with packet
control actions (policing, marking, shaping, queueing).
o Allow for extensibility, which can be applied where needed
(depending on the application) for enhanced functionality.
o Aim for flexibility in data export format and options.
o A common data stream must support different applications, teams
and organizations (e.g., traffic engineering, marketing, billing)
concurrently.
o Allow for flexibility in implementation. In particular, export
of local router state information can be decoupled from export of
usage information.
o Ease of configuration of sampling and export parameters, e.g., for
automated remote reconfiguration in response to measurements.
o Allow transparent interpretation of measurements through
inclusion of sampling configuration in the reporting stream.
o Allow robust interpretation of measurements with respect to
reports missing due to loss in transport, or omission at the
measurement device.
3 Measurement Functionality
3.1 Measurement Information Flow
The framework for passive measurement has three main parts: the
selection of packets for measurement, the creation and export of
measurement reports, and the content and format of the measurement
records. Because of the increasing number of distinct measurement
applications, we believe it is desirable to set up parallel
measurement information flows from the stream of packets. Each
information flow should consist of independently-configurable
pipelines for selecting packets and exporting measurement records.
The processing of each measurement information flow should, as far
as possible, be independent. However, resource constraints may
prevent complete reporting on a packet selected for multiple
information flows. In this case, reporting for the packet must be
complete for at least one information flow; other information flows
need only report that they selected the packet. The priority
amongst information flows to report packets must be configurable.
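The priority rule above can be sketched as follows. This is a
minimal illustration, not a prescribed mechanism: the flow names, the
rank convention (lower rank means higher priority), and the
full_report_budget parameter are all assumptions introduced here.

```python
def build_reports(selecting_flows, priority, full_report_budget=1):
    """Decide the level of reporting each information flow gets for one packet.

    selecting_flows: names of the flows that selected the packet.
    priority: dict mapping flow name to rank (lower rank = higher priority).
    full_report_budget: how many complete reports resources allow (assumed >= 1).
    """
    ordered = sorted(selecting_flows, key=lambda flow: priority[flow])
    return {
        flow: "full" if i < full_report_budget else "selected-only"
        for i, flow in enumerate(ordered)
    }


# Hypothetical configuration: traffic engineering outranks billing.
priority = {"traffic-engineering": 0, "billing": 1}
print(build_reports(["billing", "traffic-engineering"], priority))
```

With a budget of one complete report, the highest-priority flow
reports in full and every other flow records only that it selected
the packet.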
3.2 Packet Selection
The function of packet selection is to select a subset out of the
stream of all packets. Selection may be used to select a subset of
packets of interest based on their content, and/or to reduce the
rate of packets into the measurement flow regardless of content.
Packet selection is performed through a combination of the
measurement primitives described below. In this document we do not
set any restrictions on the form these combinations can take.
o Hashing:
A hashing function operates on a subset of packet bits and
associates the resulting hash with the packet. Bit positions can
be excluded from the input to the hashing function by masking. This
ability would be used, for example, by applications that require
the hash to be independent of packet header fields, such as the TTL
or header checksum, that change during the packet's passage through
the network.
o Filtering:
Filtering is accomplished by applying mask/match operations to any
combination of bit positions from the packet and the configured
hashes. The mask/match operation is configurable independently for
each filter. Higher-level interfaces to the mask/match primitive
may be used to specify masks and matches for particular fields, for
example, for IP addresses and/or TCP/UDP port numbers.
o Sampling:
Each sampler will be individually configurable to sample packets
with a certain probability p. Examples are probabilistic sampling,
in which each packet is selected quasirandomly with probability p,
and deterministic sampling, in which packets are sampled
periodically with period 1/p. In some sampling schemes, the
sampling probability may depend on the packet content. Sampling at
full line rate with probability p=1 is not excluded in principle,
although resource constraints may not support it in practice.
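The hashing and filtering primitives can be sketched as follows,
under stated assumptions. The byte offsets follow the standard IPv4
header layout (TTL at offset 8, header checksum at offsets 10-11,
source address at offset 12). The use of CRC-32 as the hash, the
helper names, and the restriction to whole-byte prefixes are
assumptions made for illustration; this document does not prescribe
a hash function.

```python
import struct
import zlib


def make_ipv4_header(src, dst, ttl, checksum=0, proto=6):
    """Build a 20-byte IPv4 header without options, for illustration only."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0,
                       ttl, proto, checksum, bytes(src), bytes(dst))


# Hash input mask: 0xFF keeps a byte, 0x00 excludes it from the hash.
HASH_MASK = bytearray(b"\xff" * 20)
HASH_MASK[8] = 0x00                # TTL changes at every hop
HASH_MASK[10:12] = b"\x00\x00"     # header checksum changes with the TTL
HASH_MASK = bytes(HASH_MASK)


def masked_hash(header, mask=HASH_MASK):
    """Hash only the unmasked bits (CRC-32 is an assumed stand-in)."""
    return zlib.crc32(bytes(h & m for h, m in zip(header, mask)))


def mask_match(data, mask, match):
    """Filter primitive: true if (data AND mask) equals match at every byte."""
    return all((d & m) == v for d, m, v in zip(data, mask, match))


def source_prefix_filter(prefix_bytes, length=20):
    """Higher-level helper: build mask/match for whole-byte source prefixes."""
    mask, match = bytearray(length), bytearray(length)
    for i, b in enumerate(prefix_bytes):
        mask[12 + i] = 0xFF        # source address starts at byte offset 12
        match[12 + i] = b
    return bytes(mask), bytes(match)


# The same packet seen at two hops; the checksum values are placeholders,
# not computed checksums.
at_hop_1 = make_ipv4_header((10, 1, 2, 3), (192, 0, 2, 1), ttl=64, checksum=0x1234)
at_hop_2 = make_ipv4_header((10, 1, 2, 3), (192, 0, 2, 1), ttl=63, checksum=0x1334)
mask, match = source_prefix_filter((10, 1))
print(masked_hash(at_hop_1) == masked_hash(at_hop_2))
print(mask_match(at_hop_1, mask, match))
```

Because the TTL and checksum bytes are masked out, the two views of
the packet hash to the same value, and the filter matches any packet
whose source address begins with 10.1.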
In order to be able to function at line rates, each measurement
primitive takes as its input only the packet itself, or quantities
that have been calculated from the packet previously by other
measurement primitives. Router state is not assumed to be available
to the measurement primitives.
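The two sampling schemes named above might look as follows when
written as primitives whose only input is the packet. This is a
sketch under assumptions: the class names, the select() interface,
and the dictionary used to stand in for a packet are hypothetical.

```python
import random


class ProbabilisticSampler:
    """Selects each packet independently with probability p."""

    def __init__(self, p, seed=None):
        self.p = p
        self._rng = random.Random(seed)

    def select(self, packet):
        # The packet content is not consulted here; a content-dependent
        # scheme would derive p from the packet instead.
        return self._rng.random() < self.p


class PeriodicSampler:
    """Selects every period-th packet: deterministic sampling with p = 1/period."""

    def __init__(self, period):
        self.period = period
        self._count = 0

    def select(self, packet):
        self._count += 1
        if self._count == self.period:
            self._count = 0
            return True
        return False


packets = [{"id": i} for i in range(20)]
periodic = PeriodicSampler(period=4)
print(sum(periodic.select(pkt) for pkt in packets))
```

Neither sampler consults router state: the only state kept is a
packet counter or a random number generator local to the sampler.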
3.3 Report Generation and Export
Although the primary goal of this draft is to set up a framework
for the sampling operations themselves, utilization of the
resulting measurements places requirements on the information
available for export, and the methods by which reports are
exported. Any scheme that can accommodate the framework described
in this section and Section 3.4 is a convenient candidate for the
job.
Report preparation involves selecting fields of interest from each
sampled packet, then adjoining subsidiary information (e.g., hash
values, byte and packet counts, timestamps, etc.) from the
selection process and router state information. The router state
values may depend on the packet content (e.g., the IP prefix or
Autonomous System associated with the destination address in the IP
header, the input and output interfaces that carried the packet,
etc.). Reports may also include subsidiary quantities calculated
as a function of the selected packet and the router state. To
simplify the design, some of the subsidiary information and router
state may be incorporated when the records are exported, rather
than when the packets are selected. However, all such router state
information must be included for reporting in a timely manner, in
order that it reflects the actual state encountered by the packet.
The device generating the measurement records is configured to
transmit the data to one or more collection systems, identified by
IP address and port number. Exporting these records to other
systems introduces several practical issues that have important
implications on the analysis of the data:
o Transport: Two basic modes of transport are possible: unreliable
and reliable. In the unreliable mode, a completed measurement
packet from the export module is encapsulated into a UDP packet and
sent to the configured address (the collection system). The
sending device does not need to keep state about this packet (other
than possibly a sequence number to detect lost measurement
packets). In the reliable mode, the device exports records via a
TCP connection to the collection system. The device must be
capable of receiving packets (such as acknowledgments) from the
collection system and retransmitting lost packets.
o Export rate: The device should impose a (configurable) limit on
the number of measurement records per unit time. Otherwise, the
measurement device could overload the network and the collection
system. This problem would be exacerbated in the reliable
transport mode, where the device would retransmit any lost packets
(thereby imposing an additional load on the network). At times,
the device may generate new records faster than the allowed export
rate. In this situation, the device should discard the excess
records rather than transmitting them to the collection system.
The device may record information (such as sequence numbers, or
packet and byte counter values accumulated at the inputs and
outputs of a packet selector) to aid the collection system in
compensating for the missing data in any subsequent analysis.
o Maximum delay in exporting records: The device may queue
measurement records in order to export multiple records in a single
packet. However, the device should bound the delay in exporting
measurement records, even if the number of records is small. This
is important for two reasons. First, having an upper bound on the
export delay ensures that the collection system has up-to-date
information about the sampled packets. Second, in some scenarios,
the device may associate a timestamp with the record(s) at the
export stage. Limiting the delay in exporting the records places a
tight bound on the inaccuracy in the timestamp information.
The device can impose a (configurable) Maximum Transmission Unit
(MTU) size for reports.
o Local Export: packet reports may also be directly exported to
on-board measurement-based applications, for example those that
form composite statistics from more than one packet. Local export
may be presented through an interface direct to the higher level
applications, i.e., without employing the transport used for
off-board export.
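The export considerations above (rate limit, discard of excess
records, sequence numbers, and bounded export delay) can be combined
in one small sketch. It is illustrative only: the class and
parameter names are hypothetical, the rate limit uses fixed
one-second windows applied when a record is accepted, a record count
stands in for the MTU limit, and a list stands in for transmission
in the unreliable (UDP) mode.

```python
class Exporter:
    """Batches measurement records under a rate limit and a delay bound."""

    def __init__(self, max_records_per_sec, max_delay, batch_size):
        self.max_records_per_sec = max_records_per_sec
        self.max_delay = max_delay          # seconds a record may wait
        self.batch_size = batch_size        # stands in for the report MTU
        self.queue = []
        self.oldest_queued_at = None
        self.current_second = None
        self.accepted_this_second = 0
        self.discarded = 0                  # records dropped by the rate limit
        self.next_seq = 0
        self.sent = []                      # stands in for UDP transmission

    def offer(self, record, now):
        """Accept a record at time `now` (seconds), or discard it if over the limit."""
        second = int(now)
        if second != self.current_second:
            self.current_second = second
            self.accepted_this_second = 0
        if self.accepted_this_second >= self.max_records_per_sec:
            self.discarded += 1
            return
        self.accepted_this_second += 1
        if not self.queue:
            self.oldest_queued_at = now
        self.queue.append(record)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def tick(self, now):
        """Called periodically: export a partial batch once the delay bound is hit."""
        if self.queue and now - self.oldest_queued_at >= self.max_delay:
            self.flush()

    def flush(self):
        """Emit one export packet carrying a sequence number and the discard count."""
        self.sent.append({"seq": self.next_seq,
                          "discarded_so_far": self.discarded,
                          "records": self.queue})
        self.next_seq += 1
        self.queue = []
```

Carrying the discard count in each export packet is one way to give
the collection system the information it needs to compensate for
records dropped at the device; the sequence number lets it detect
export packets lost in transport.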
3.4 Measurement Record Format
Report export involves the bundling of one or more measurement
records and sending a packet to the collection system. The report
includes several types of information, such as:
o Per-packet information: The measurement record for each sampled
packet includes various header fields (e.g., IP addresses, port
numbers, ToS bits, TCP flags, etc.), as well as subsidiary
information (e.g., timestamp, input and output links, other router
state, hash values, etc.).
o Configuration information: The stream of reports should provide
information about the configuration of the measurement flow (e.g.,
the sampling frequency, the sampling technique and associated
parameters, the mask/match filter, etc.). This ensures that the
measurement data are self-describing and allows the collection
system to analyze the measurement data without a separate feed of
the configuration state. Changes in configuration must be
immediately reflected in the report stream.
o Aggregate information: The reports should include sufficient
information for the collection system to account for discarded
measurement records and lost exported packets. For example, the
reports could include sequence numbers to enable the collection
machine to detect lost reports. The reports could include a count
of the number of bytes and packets that matched the filter, or that
passed both the filtering and sampling stages.
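A collection system might use the configuration and aggregate
information as in the sketch below. The Report structure and its
field names are hypothetical and do not define a record format; the
loss count assumes that sequence numbers increase by one per report
and that no duplicates are received.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    seq: int                      # export sequence number
    config: dict                  # sampling configuration in effect
    matched_packets: int          # cumulative packets that matched the filter
    sampled_packets: int          # cumulative packets that also passed sampling
    records: list = field(default_factory=list)


def lost_reports(reports):
    """Count sequence numbers missing between the first and last report received."""
    seqs = sorted(r.seq for r in reports)
    return (seqs[-1] - seqs[0] + 1) - len(seqs)


def observed_sampling_fraction(report):
    """Fraction of filter-matching packets that were sampled, from the counters."""
    return report.sampled_packets / report.matched_packets


config = {"sampler": "probabilistic", "p": 0.01}
received = [Report(seq, config, matched_packets=1000 * (seq + 1),
                   sampled_packets=10 * (seq + 1)) for seq in (0, 1, 3, 4)]
print(lost_reports(received))
print(observed_sampling_fraction(received[-1]))
```

Comparing the observed sampling fraction with the configured
probability carried in the same report gives the collection system a
simple consistency check without a separate configuration feed.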
To conserve storage space and network bandwidth, the device may
compress the measurement records as they are stored or exported.
Compression should be quite effective since the sampled packets may
share many fields in common (especially if the filter focuses on
packets with certain values in particular header fields).
3.5 Configuration and Management
All configuration parameters associated with the sampling of
packets and export of measurements are to be contained in a MIB. A
secure protocol is to be used to access the MIB for
reconfiguration and retrieval of the parameters.
4 Applications
We describe a representative set of operational applications
enabled by the passive measurement device described in the previous
section, by referring back to the examples in Section 1.
Example 1: Troubleshooting
Packet sampling is ideally suited to determine the composition of
the traffic (e.g., on a link) in terms of various attributes
(source and destination address and port numbers, prefix, protocol
number, type of service, etc.). Typically, unfiltered sampling would
be used to obtain a coarse-grained view of the traffic, for example
on a link. Once the characteristics of an interesting subset of
traffic (e.g., a service type, or a source address prefix
corresponding to some customer) have been identified, the resolution
can be refined by filtering to select this traffic, and by boosting
the sampling rate
correspondingly. In this way, the traffic can be examined and
characterized ("sliced and diced") arbitrarily.
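The estimation step implicit in this procedure can be sketched as
follows, assuming uniform sampling with a known probability p, so
that each sampled packet stands for 1/p packets. The record fields,
the helper names, and the string-based prefix filter are
simplifications introduced for illustration.

```python
from collections import defaultdict


def estimate_bytes(samples, p, key):
    """Estimate total bytes per attribute value; each sample stands for 1/p packets."""
    totals = defaultdict(float)
    for pkt in samples:
        totals[pkt[key]] += pkt["bytes"] / p
    return dict(totals)


def composition(totals):
    """Convert per-value byte estimates into fractions of the total."""
    grand_total = sum(totals.values())
    return {value: est / grand_total for value, est in totals.items()}


def matches_prefix(pkt, prefix):
    """Simplified filter on a dotted-decimal source prefix string (assumption)."""
    return pkt["src"].startswith(prefix)


# Stage 1: coarse, unfiltered view at a low sampling probability.
coarse = [
    {"src": "10.1.0.5", "proto": 6, "bytes": 1500},
    {"src": "10.2.0.9", "proto": 17, "bytes": 500},
    {"src": "10.1.7.7", "proto": 6, "bytes": 500},
]
print(composition(estimate_bytes(coarse, p=0.01, key="proto")))

# Stage 2: select one customer prefix; the device would now sample this
# filtered traffic at a higher probability.
refined = [pkt for pkt in coarse if matches_prefix(pkt, "10.1.")]
print(len(refined))
```

The estimates are unbiased only if the probability used for scaling
is the one actually in effect when the packets were sampled, which is
one reason to carry the sampling configuration in the report stream.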
Example 2: Characterizing Demand
Characterizing demand for an entire network domain will likely be
achieved by sampling packets on all the ingress links, or some
other well-chosen cut set. The sampling rate would typically be
chosen relatively low, given that we are interested in averages
over longer time scales, e.g., to detect significant systemic
shifts in demand not due to random fluctuations. Some of the
subsidiary fields included in reports, such as source and
destination AS, and input and output link, will be useful,
depending on the spatial granularity of demand characterization.
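As an illustration, a source/destination AS traffic matrix could be
assembled from reports collected on several ingress links, each with
its own sampling probability. The report fields below are
hypothetical, the AS numbers are taken from the range reserved for
documentation, and the sketch assumes each report carries the
sampling probability that was in effect.

```python
from collections import defaultdict


def traffic_matrix(reports):
    """Estimate bytes per (source AS, destination AS) pair.

    Each report is a dict with 'p' (sampling probability in effect),
    'src_as', 'dst_as' and 'bytes'; every sample is scaled by 1/p.
    """
    matrix = defaultdict(float)
    for r in reports:
        matrix[(r["src_as"], r["dst_as"])] += r["bytes"] / r["p"]
    return dict(matrix)


reports = [
    {"p": 0.5, "src_as": 64496, "dst_as": 64497, "bytes": 1000},
    {"p": 0.25, "src_as": 64496, "dst_as": 64497, "bytes": 1000},
    {"p": 0.5, "src_as": 64498, "dst_as": 64497, "bytes": 400},
]
for pair, volume in sorted(traffic_matrix(reports).items()):
    print(pair, volume)
```

Keying the same accumulation on input and output link instead of AS
would yield a matrix at a finer spatial granularity.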
Example 3: Direct Observation of Network Behavior
Direct observation of the spatial flow of traffic through the
domain can be achieved through a method called trajectory sampling,
which relies on the hash function to make sampling decisions
[DG01]. Specifically, the hash function is computed over a
predefined set of fields of the IP packet header and payload. If
the hash function for a packet falls within a configurable interval
[a,b], then the packet should be sampled; otherwise, it should not
be sampled. This feature yields the full paths followed by sampled
packets, by ensuring that a packet is sampled on every router it
traverses, or no router at all. This requires that the hash
function and the set of packet fields over which it is computed are
the same everywhere.
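The consistency property of hash-based selection can be sketched as
follows. This is not the hash function of [DG01]: CRC-32 is an
assumed stand-in, and the packet representation, field choice, and
function names are hypothetical. The only point illustrated is that
identical hash input and an identical interval [a,b] at every router
give an identical sampling decision.

```python
import zlib

HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit value


def invariant_bytes(packet):
    """Concatenate only fields that do not change hop by hop (the TTL is left out)."""
    return b"|".join([packet["src"].encode(), packet["dst"].encode(),
                      str(packet["ip_id"]).encode(), packet["payload"]])


def sample_decision(packet, a, b):
    """Sample the packet if its hash falls within the interval [a, b]."""
    return a <= zlib.crc32(invariant_bytes(packet)) <= b


def observe_path(packet, path, a, b):
    """Return the routers on `path` that sample the packet; the TTL drops per hop."""
    seen = []
    ttl = packet["ttl"]
    for router in path:
        hop_view = dict(packet, ttl=ttl)   # what this router sees
        if sample_decision(hop_view, a, b):
            seen.append(router)
        ttl -= 1
    return seen


path = ["r1", "r2", "r3"]
packet = {"src": "10.1.2.3", "dst": "192.0.2.1", "ip_id": 7,
          "payload": b"example", "ttl": 64}
print(observe_path(packet, path, 0, HASH_SPACE // 10))
```

For any packet the result is either the empty list or the complete
path, since the TTL, which differs at each hop, is not part of the
hash input. The width of the interval relative to the hash space
sets the approximate sampling fraction.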
A similar use of hash functions has also been considered for
hash-based IP traceback of distributed denial-of-service (DDoS)
attacks [SPSJTKS01].
5 References
[B88] R. T. Braden, A Pseudo-Machine for Packet Monitoring and
Statistics, Proc. ACM SIGCOMM 1988.
[DG01] N. G. Duffield and M. Grossglauser, Trajectory Sampling for
Direct Traffic Observation, IEEE/ACM Trans. on Networking, 9(3), pp.
280-292, June 2001.
[SPSJTKS01] A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones,
F. Tchakountio, S. T. Kent, W. T. Strayer, Hash-Based IP Traceback,
Proc. ACM SIGCOMM 2001, San Diego, CA, September 2001.
6 Authors' Addresses
Nicholas G. Duffield
AT&T Labs - Research
Room B-139
180 Park Ave
Florham Park NJ 07932, USA
Phone: +1 973-360-8726
Email: duffield@research.att.com
Albert Greenberg
AT&T Labs - Research
Room A-161
180 Park Ave
Florham Park NJ 07932, USA
Phone: +1 973-360-8730
Email: albert@research.att.com
Matthias Grossglauser
AT&T Labs - Research
Room A-167
180 Park Ave
Florham Park NJ 07932, USA
Phone: +1 973-360-7172
Email: mgross@research.att.com
Jennifer Rexford
AT&T Labs - Research
Room A-169
180 Park Ave
Florham Park NJ 07932, USA
Phone: +1 973-360-8728
Email: jrex@research.att.com
7 Intellectual Property Statement
AT&T Corp. may own intellectual property applicable to this
contribution. AT&T is currently reviewing its licensing intent
relative to the Intellectual Property and will notify the IETF when
AT&T has made a determination of that intent.
8 Full Copyright Statement
Copyright (C) The Internet Society (1999). All Rights Reserved.
This document and translations of it may be copied and furnished to others,
and derivative works that comment on or otherwise explain it or assist in
its implementation may be prepared, copied, published and distributed, in
whole or in part, without restriction of any kind, provided that the above
copyright notice and this paragraph are included on all such copies and
derivative works. However, this document itself may not be modified in any
way, such as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for copyrights
defined in the Internet Standards process must be followed, or as required
to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked
by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS"
basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE
DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY
RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.