INTERNET-DRAFT Marc Greis
November 1998 Markus Albrecht
University of Bonn, Germany
Aggregation of Internet Integrated Services
State using Parameter-based Admission Control
draft-greis-aggregation-with-pbac-00.txt
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."
To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).
Abstract
Aggregation has been proposed as one possible solution to the
scalability problem of the Internet Integrated Services. The current
suggestions for aggregation are based on measurement-based admission
control, which allows for the omission of RSVP soft state in the
interior routers of an aggregating domain.
However, measurement-based admission control has certain flaws which
may lead to over-reservations on links in the network under certain
conditions. This can result in packet losses for reserved traffic.
Hence, we believe that it will be necessary to discuss the
possibility of using parameter-based admission control with
aggregation.
In this document, we present a technique for using parameter-based
admission control with aggregation as a basis for further
discussions, and we evaluate possible advantages and disadvantages.
Greis, Albrecht Expires 5/99 [Page 1]
INTERNET-DRAFT draft-greis-aggregation-with-pbac-00.txt November 1998
1. Introduction
It has been stated in the RSVP Applicability Statement [4] that RSVP
as defined in [3] has a scalability problem, since per-flow state has
to be maintained in each node of the network for each RSVP-supported
flow traversing the node.
One of the most promising solutions for the problem seems to be the
aggregation of RSVP-supported flows. The basic idea is to let the
ingress nodes for a flow decide if a flow can be admitted when a RESV
message arrives. In [1], messages are sent from the ingress to the
egress to determine if the interior nodes in the aggregating domain
can admit the new flow. The interior nodes do not keep per-flow state
for admission control; they simply measure the amount of reserved
traffic to determine if a new flow can be admitted.
However, this kind of admission control can fail under certain
circumstances. Admission control failure can lead
to packet losses, which makes the results of performing reservations
with RSVP less predictable. There are two kinds of traffic which may
cause measurement-based admission control (MBAC, [2]) to fail:
- Very bursty traffic, such as video or audio traffic (especially
audio traffic from audio conferences which may be idle for long
periods).
- Traffic from sessions where resources are reserved a long time
before data is actually sent.
In both cases, the future behavior of the traffic sources cannot be
predicted from past measurements. The second case seems to be less
likely, as in a real environment it would be a waste of money for a
user to reserve resources for a session a long time in advance, and
possible solutions for performing advance reservations have been
proposed. Still, a user cannot be kept from reserving resources a
long time before they are used, and in fact, it is possible
that users who want to disrupt a provider's RSVP services may exploit
this possibility.
We believe that it is necessary to consider and discuss the
possibility of using parameter-based admission control (PBAC) with
aggregation. In this document, we present a scheme based on [1] for
using PBAC with aggregation, thus enhancing the reliability of flow
aggregation at the cost of a somewhat greater overhead. However, we
will evaluate the overhead based on comparisons with 'standard' RSVP
and with MBAC-based aggregation, and we will show that the additional
overhead may be acceptable especially in smaller domains.
The rest of this document is structured as follows: In section 2, we
will present our basic idea together with an example scenario, in
section 3 we describe the necessary additions to the RSVP protocol,
in section 4 we describe our technique in more detail, and in
section 5 we evaluate the additional overhead necessary for using our
technique with aggregation.
2. The Basic Technique
The main idea behind the scheme we propose is that each router in an
aggregating domain maintains a table with the amount of aggregate
bandwidth reserved on the path to each edge router which can be
reached from this router. The information in these tables will be
gathered from the ADREQ messages that are sent by the ingress
routers of an aggregating domain to request admission control
information from the interior routers (as proposed in [1]). However,
there are several possible problems with this approach:
- A reservation request from an ADREQ message that is accepted by an
interior router may still be rejected by downstream routers.
- ADREQ messages would inform the interior routers only about
reservation requests, but not about reservation teardown.
- ADREQ messages may be lost on their way from the ingress to the
egress, in which case they will be resent later. This may cause
overreservations in interior routers which receive the same ADREQ
message twice, and use it twice to update their admission control
information.
One obvious solution to the second problem may be to let interior
routers process ResvTear messages. This would create several new
problems though, as it is possible that ResvTear messages may be
lost. It is also one of the advantages of the scheme proposed in [1]
that interior routers in an aggregating domain do not have to process
any RSVP messages except ADREQ messages.
To solve all three problems mentioned above, we propose a new message
type called ADSTAT (=ADmission control STATus) which would be sent at
regular intervals (e.g. every 30 seconds) from each router in the
aggregating domain to each adjacent router to update the admission
control information.
We also propose that admission control status information be sent
with ADREQ messages. This information would contain the amount of
aggregate bandwidth reserved on the path from the router which sent
the ADREQ message to the corresponding egress router for this
message. It would be updated by each interior router with the
aggregate admission control information for the egress router as the
message passes through the aggregating domain.
The admission control status information in the interior router can
be seen as a 'per-edge-router state' (as opposed to the 'per-flow
state' in RSVP). It grows with the number of edge routers
in an aggregating domain, which limits this scheme to 'small'
domains; possible values for the number of edge routers are
discussed in section 5. It should be noted that the admission
control state in the routers has to be a soft state. If it is not
refreshed by ADSTAT or ADREQ messages with status information, it
will expire. This is necessary to avoid the 'survival' of out-dated
status information after routing changes. It will be left to future
research and discussions to determine how long an admission control
status should be kept before it expires.
It will be necessary to avoid sending redundant status information to
save bandwidth. Status information should only be sent when it
differs from the last status information that was sent with an ADSTAT
or ADREQ message. It should be kept in mind though that the admission
control status in the interior routers needs to be refreshed
periodically. Still, sending hundreds of ADREQ messages with exactly
the same status information within a few seconds should be avoided.
This could happen when no free bandwidth is left on an interface of an
interior router: new reservations are rejected and the admission
control status does not change, but ADREQ messages may still pass
through.
It is also possible to only send status information with ADREQ
messages periodically (as with ADSTAT messages). It should be kept in
mind though that this makes the technique more 'conservative', as the
information in interior routers may be out-dated, and they may reject
new reservations based on this old information. Further research will
be necessary to determine useful values for this and other possible
parameters.
Other issues which will also have to be considered in the future are
multicast sessions where reservations from different branches of the
multicast tree can be merged, and shared reservation styles. It may
be necessary to use common RSVP in these cases or to use aggregation
with MBAC.
2.1. A Simple Example
The example in this section will be used to illustrate the technique
we propose. Figure 1 shows a sample domain with 5 routers. The
routers R1, R4 and R5 are edge routers, R2 and R3 are interior
routers.
An example of the admission control status information in the
routers is shown in figure 2. Integer numbers were chosen to
represent resources. For example, router R1 has reserved
11 'resource units' towards edge router R4 and 21 towards edge router
R5. No other entries are necessary for this router, as no other edge
routers are present in the domain, and all traffic from R1 going into
the domain will leave the domain either through R4 or R5.
    |                                   |
--[R1]--------(R2)--------(R3)--------[R4]--
    |           |                       |
                |
            --[R5]--
                |
Figure 1: A Sample Domain
                 __________            _______
                |    R2    |          |  R3   |
 ____           |-+--+--+--|          |-+--+--|           ____
| R1 |          | | a| b| c|          | | a| b|          | R4 |
|-+--|       (a)|-|--|--|--|(c)    (a)|-|--|--|(b)       |-+--|
|4|11|----------|1| /|17|31|----------|1| /|31|----------|1|31|
|5|21|          |4|11|14| /|          |4|25| /|          |5|22|
|_|__|          |5|21| /|22|          |5| /|22|          |_|__|
                |_|__|__|__|          |_|__|__|
                     |(b)
                     |
                    _|__
                   | R5 |
                   |-+--|
                   |1|17|
                   |4|14|
                   |_|__|
Figure 2: The Admission Control Tables for the Example Topology
The admission control status for R2 is more complicated. It is not
only necessary to store the amount of reserved bandwidth on the path
towards an edge router, but also through which interface this
information was received. For example, R2 received the information
from edge router R1 about the reservations for edge routers R4 and R5
through interface (a).
To determine how much bandwidth is actually reserved on an outgoing
interface, the sum of all entries in the table for all edge routers
the interface routes to has to be calculated. For example, interface
(c) on R2 routes only to one edge router, to R4. Hence, the amount of
reserved traffic on interface (c) can be calculated as the sum of all
entries for R4, in this case 25 (11 from interface (a) and 14 from
interface (b)). The method used to calculate the sum is usually
service-specific (e.g. for Controlled-Load or Guaranteed Service) and
should be described in the documents defining the service.
It will be left to the underlying routing protocol and to periodic
routing lookups to determine which interface routes to which edge
router.
When R2 sends an ADSTAT message to R3, it will only include the
information for the edge routers which interface (c) (the outgoing
interface for the ADSTAT message) routes to. In this case it is only
the 25 for R4, while R3 sends a 31 for R1 and a 22 for R5 to R2.
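The calculation in this example can be reproduced with a short Python
sketch. The numbers are taken from figure 2; the dictionary layout and
the function names are assumptions made for illustration only, and
plain integer addition stands in for the service-specific summation.

```python
# Admission control tables from figure 2, stored as
# {edge_router: {incoming_interface: reserved_units}}.
# The layout and all names below are illustrative assumptions.
R2_TABLE = {
    "R1": {"b": 17, "c": 31},
    "R4": {"a": 11, "b": 14},
    "R5": {"a": 21, "c": 22},
}
R3_TABLE = {
    "R1": {"b": 31},
    "R4": {"a": 25},
    "R5": {"b": 22},
}

# Edge routers reached through each outgoing interface. In a real
# router this would come from the routing protocol.
R2_ROUTES = {"a": ["R1"], "b": ["R5"], "c": ["R4"]}
R3_ROUTES = {"a": ["R1", "R5"], "b": ["R4"]}


def adstat_for_interface(table, routes, out_iface):
    """Per-edge-router sums an ADSTAT sent on out_iface would carry."""
    return {edge: sum(table.get(edge, {}).values())
            for edge in routes[out_iface]}


def reserved_on_interface(table, routes, out_iface):
    """Total reserved on out_iface (integer addition as a stand-in)."""
    return sum(adstat_for_interface(table, routes, out_iface).values())


print(reserved_on_interface(R2_TABLE, R2_ROUTES, "c"))  # 25
print(adstat_for_interface(R2_TABLE, R2_ROUTES, "c"))   # {'R4': 25}
print(adstat_for_interface(R3_TABLE, R3_ROUTES, "a"))   # {'R1': 31, 'R5': 22}
```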
In this example, only one service class was used. It is possible to
use the scheme proposed here with several service classes. It will
simply be necessary to keep separate admission control status
information for each service class and to send separate ADSTAT
messages.
3. RSVP Extensions
The most important question, which has to be answered before a format
for ADSTAT messages and for the necessary extensions to the ADREQ
messages can be specified, is how the bandwidth on the path to an
edge router can be represented. This information would usually be
service-specific, so it would be useful to use the FLOWSPEC object as
defined for Controlled-Load and Guaranteed Service (in [5]). It may
be argued that some of the information in the FLOWSPEC objects may
not be necessary for this purpose and that a new smaller object
should be defined to save bandwidth, but using the FLOWSPEC object
seems to allow for the highest flexibility.
The format for ADSTAT messages is as follows:
<ADSTAT> ::= <Common Header> <AC status list>

<AC status list> ::= <RSVP_HOP> <FLOWSPEC> |
                     <RSVP_HOP> <FLOWSPEC> <AC status list>
Each RSVP_HOP/FLOWSPEC pair describes the bandwidth reserved on the
path to an edge router. The RSVP_HOP object contains the address of
the edge router which the pair corresponds to. An RSVP object
RSVP_EDGE which would only contain the address of an edge router may
be defined to replace the RSVP_HOP object here, but for now this
seems like an unnecessary redundancy.
An ADREQ message carrying admission control status information would
only contain one such RSVP_HOP/FLOWSPEC pair. In fact, the RSVP_HOP
object can be omitted, since the edge router the pair would
correspond to is already known: it is the egress router the ADREQ
message is being sent to. However, adding the status information to
the ADREQ messages may cause confusion, as they already contain a
FLOWSPEC object. To avoid ambiguity, the FLOWSPEC object containing
the admission control status information should always be the second
FLOWSPEC object in an ADREQ message. This means that interior routers
can determine if an ADREQ message carries admission control status
information simply by checking if it contains a second FLOWSPEC
object.
4. Detailed Algorithms
There are three important events which can occur at an interior
router in an aggregating domain with PBAC:
- An ADSTAT message has to be sent
- An ADSTAT message is received from an adjacent router
- An ADREQ message is received from an adjacent router
In this section, we will describe the actions to be taken when these
events occur in more detail as a basis for possible implementations.
Send ADSTAT message:
- For each interface on the router sending the message:
  - Create a new ADSTAT message
  - For each edge router the interface routes to:
    - Add an RSVP_HOP object with the edge router's address to
      the ADSTAT message
    - Calculate the sum of all reservations to this edge router
    - Add a FLOWSPEC object containing the calculated
      reservation to the message
  - Send the ADSTAT message
- Set a timer to send a new ADSTAT message after a certain time
As will be discussed in the next section, it should be kept in mind
that an ADSTAT message does not have to contain admission control
information for all edge routers. It is possible to modify the above
algorithm so that information for a certain subset of the edge
routers is sent.
ADSTAT message received on the incoming interface (i):
- For each RSVP_HOP/FLOWSPEC pair in the message:
  - Is there already an entry for the edge router with the
    address stored in the RSVP_HOP object?
    - No: Create a new entry for this edge router
  - Modify the entry for this edge router: Replace the old entry
    for (i) with the data from the FLOWSPEC object
  - Reset the expiration timer for the modified entry
It can be seen from this (and the next) algorithm how the admission
control status tables are built dynamically from the received ADSTAT
and ADREQ messages. It will not be necessary to configure the
interior routers with the addresses of all edge routers in advance.
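As an illustration of how such a soft-state table might be built from
received ADSTAT messages, consider the following Python sketch. The
class name, the 90-second lifetime and the use of (address, units)
tuples in place of RSVP_HOP/FLOWSPEC pairs are assumptions; the
expiration interval itself is left open in this document.

```python
import time


class AdmissionTable:
    """Soft state: (edge_router, incoming_interface) -> (units, expiry)."""

    def __init__(self, lifetime=90.0, clock=time.monotonic):
        # lifetime is an assumed value, not one proposed by the draft.
        self.lifetime = lifetime
        self.clock = clock
        self.entries = {}

    def on_adstat(self, in_iface, pairs):
        """Process an ADSTAT received on in_iface.

        pairs is a list of (edge_router_address, units) tuples that
        stand in for the RSVP_HOP/FLOWSPEC pairs.
        """
        expiry = self.clock() + self.lifetime
        for edge, units in pairs:
            # Creates the entry if it is missing, replaces the old value
            # for this interface otherwise, and resets the timer.
            self.entries[(edge, in_iface)] = (units, expiry)

    def expire(self):
        """Drop entries that were not refreshed before their expiry."""
        now = self.clock()
        self.entries = {k: v for k, v in self.entries.items() if v[1] > now}

    def reserved_towards(self, edge):
        """Sum of all live entries for one edge router."""
        self.expire()
        return sum(units
                   for (e, _iface), (units, _exp) in self.entries.items()
                   if e == edge)


# A controllable clock keeps the example deterministic.
now = [0.0]
table = AdmissionTable(lifetime=90.0, clock=lambda: now[0])
table.on_adstat("a", [("R4", 11), ("R5", 21)])
table.on_adstat("b", [("R4", 14)])
print(table.reserved_towards("R4"))  # 25

now[0] = 60.0
table.on_adstat("a", [("R4", 11), ("R5", 21)])  # only (a) is refreshed
now[0] = 100.0
print(table.reserved_towards("R4"))  # 11, the entry from (b) has expired
```

No edge router address is configured in advance; entries appear when
the first message mentioning an edge router arrives and disappear when
they are no longer refreshed.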
ADREQ message received on an interior router on the incoming
interface (i):
- Determine which edge router the message is being sent to and
  store the address in e_addr
- Is there already an admission control status entry for e_addr?
  - No: Create a new entry for this edge router
- Reset the expiration timer for the admission control status
  entry for e_addr and (i)
- Does the ADREQ message contain admission control status
  information?
  - Yes: Replace the old admission control status information for
    e_addr and (i) with the information from the ADREQ message
    and remove the information from the message
- Calculate the sum of all reservations for e_addr and store it in
  the FLOWSPEC object old_status
- Determine the outgoing interface (o) for the ADREQ message
- Calculate the sum of all reservations for all edge routers (o)
  routes to and store it in the FLOWSPEC object res_sum
- Use res_sum to decide if the flow which corresponds to the ADREQ
  message can be admitted.
- Was admission control successful?
  - Yes: Modify the admission control status information for
    e_addr and (i) by adding the FLOWSPEC from the ADREQ message
- Determine if admission control status information should be sent
  with the forwarded ADREQ message
  - If yes: Add the FLOWSPEC object old_status to the message
- Forward the modified ADREQ message towards e_addr
In this algorithm the expiration timer for e_addr is always reset,
even if the ADREQ message does not contain admission control status
information, because the fact that an ADREQ message for e_addr was
received shows that the router is still on the route to e_addr, while
the purpose of the expiration timer is to let out-dated admission
control status expire after routing changes.
The decision if admission control status information should be sent
with an ADREQ message is based on a set of rules as mentioned in
section 2. The only case when status information HAS to be sent with
an ADREQ message is when an edge router sends an ADREQ message for
the same session twice (i.e. when the first was probably lost, since
no corresponding ADREP message was received). Otherwise, the
bandwidth for the same reservation might be added to the same entry
twice. This also means that it will probably be necessary for
interior routers to send status information with forwarded ADREQ
messages when the ADREQ message that was received contained status
information, or else important information would be lost.
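The ADREQ handling described in this section could be sketched in
Python as follows. This is only an outline under several assumptions:
ADREQ messages are modelled as dictionaries with the hypothetical keys
'egress', 'flow' and 'status', resources are integers, the admission
test is a simple capacity comparison in place of a service-specific
decision, and timers as well as the signalling of a rejection (which
is left to [1]) are omitted.

```python
class InteriorRouter:
    """Hypothetical interior router keeping per-edge-router state."""

    def __init__(self, capacity, routes, next_iface):
        self.capacity = capacity      # {out_iface: reservable units}
        self.routes = routes          # {out_iface: [edge routers]}
        self.next_iface = next_iface  # {edge router: out_iface}
        self.table = {}               # {edge: {in_iface: units}}

    def _sum_for(self, edge):
        return sum(self.table.get(edge, {}).values())

    def on_adreq(self, in_iface, adreq, send_status=False):
        e_addr = adreq["egress"]
        # Create the entry for e_addr and (i) if it does not exist yet.
        entry = self.table.setdefault(e_addr, {})
        entry.setdefault(in_iface, 0)
        # Status carried in the message replaces the stored value.
        status = adreq.get("status")
        if status is not None:
            entry[in_iface] = status
        old_status = self._sum_for(e_addr)
        out_iface = self.next_iface[e_addr]
        res_sum = sum(self._sum_for(e) for e in self.routes[out_iface])
        # Assumed admission test: everything must fit the capacity.
        admitted = res_sum + adreq["flow"] <= self.capacity[out_iface]
        if admitted:
            entry[in_iface] += adreq["flow"]
        forwarded = {"egress": e_addr, "flow": adreq["flow"]}
        # Status is passed on whenever it was received, or on request.
        if send_status or status is not None:
            forwarded["status"] = old_status
        return admitted, forwarded


r2 = InteriorRouter(
    capacity={"a": 100, "b": 100, "c": 100},  # assumed capacities
    routes={"a": ["R1"], "b": ["R5"], "c": ["R4"]},
    next_iface={"R1": "a", "R5": "b", "R4": "c"},
)
r2.table = {"R4": {"a": 11, "b": 14}}

request = {"egress": "R4", "flow": 5, "status": 11}
print(r2.on_adreq("a", request))  # (True, {'egress': 'R4', 'flow': 5, 'status': 25})
print(r2.table["R4"]["a"])        # 16
r2.on_adreq("a", request)         # the same ADREQ, sent again by the edge router
print(r2.table["R4"]["a"])        # 16
```

Because the resent message carries the status from before the new
flow, the entry is first reset to 11 and then increased by 5 again, so
the reservation is counted once. Without the status field, the second
message would raise the entry to 21.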
5. Evaluation
The overhead necessary for deploying Integrated Services in a network
can be split into three categories:
- Classifier and scheduler state
- Setup protocol state
- Setup protocol messages
The state in the classifier and scheduler is the most important
potential problem, since each packet has to pass through these two
elements. The first and foremost goal of all aggregation schemes
would be to reduce the size of the classifier and scheduler state.
The size of the setup protocol state is less important, but can still
consume a huge amount of memory with lists that have to be
maintained, and CPU time which is needed to maintain these lists.
The overhead created by setup protocol messages can be a problem in
two ways: The routers have to create and send outgoing messages and
they have to process and forward incoming messages, both of which
mean additional CPU load for the router. But the messages may also
consume a considerable amount of bandwidth if they are too big, or if
they are sent too often.
In table 1, we evaluate and compare the classifier/scheduler state
size, the setup protocol state size and the message overhead for
'classic' RSVP, for aggregation with measurement-based admission
control and for aggregation with parameter-based admission control.
It has to be kept in mind that one important factor does not appear
in table 1: The possible additional overhead for measuring the amount
of traffic for each service class when using aggregation with MBAC.
More experience with MBAC algorithms will be needed to determine the
importance of this problem, as it will depend to a large extent on
the router's capability to perform such measurements.
It can be seen from table 1 that for the classifier and scheduler
state, aggregation with PBAC has the same advantages over RSVP as
aggregation with MBAC, which means that RSVP's most central
scalability problem is still solved when aggregation with PBAC is
used.
The setup protocol state size for PBAC aggregation can be seen to
be 'between' the protocol state size for RSVP (where the protocol
state can become very large in extreme cases) and for MBAC
aggregation (where no protocol state is kept at all on interior
routers) for 'small' domains, that is, in domains where the result
of #I*#E*#S is significantly smaller than the highest expected number
of flows. The entries in the admission control status table would not
be too big, so it may be possible to maintain admission control
status for 1000 or more edge routers.
           |       RSVP       | Aggr. with MBAC  | Aggr. with PBAC  |
-----------+------------------+------------------+------------------|
Classifier/|Per-Flow          |Fixed state. Size |Fixed state. Size |
Scheduler  |Limited only by   |based on the      |based on the      |
State Size |Si/r (*)          |number of service |number of service |
           |                  |classes.          |classes.          |
-----------+------------------+------------------+------------------|
           |Per-Flow          |Full RSVP state on|Full RSVP state on|
           |                  |edge routers, no  |edge routers, no  |
           |                  |state on interior |RSVP state on     |
Protocol   |                  |routers.          |interior routers. |
State Size |                  |                  |Admission control |
           |                  |                  |state on all      |
           |                  |                  |routers. Limited  |
           |                  |                  |by #I*#E*#S (*)   |
-----------+------------------+------------------+------------------|
           |RSVP messages have|New messages:     |New messages:     |
           |to be sent and    |ADREQ and ADREP.  |ADREQ, ADREP and  |
Message    |processed by all  |RSVP messages are |ADSTAT. RSVP      |
Overhead   |routers for all   |still sent, but   |messages are still|
           |flows.            |interior routers  |sent, but interior|
           |                  |only send and     |routers only send |
           |                  |process ADREQ     |and process ADREQ |
           |                  |messages.         |and ADSTAT.       |
---------------------------------------------------------------------
Table 1: A Comparison of the Overhead for RSVP and Aggregation
(*) Explanation of the symbols used in table 1:
Si - The sum of the reservable bandwidth on all interfaces
r - The smallest possible reservation
#I - The number of interfaces on a router
#E - The number of edge routers in a domain
#S - The number of service classes
The biggest limitation for aggregation with PBAC is the message
overhead. The ADSTAT messages can grow fairly big. The FLOWSPEC
object for Controlled-Load Service is 72 bytes long, an RSVP_HOP
object for IPv4 is 12 bytes long. That means that the admission
control status information in an ADSTAT message for 100 edge routers
would be 8400 bytes long. However, ADSTAT messages could be split up.
A router does not have to send the information for all edge routers,
but it could send the information in small portions, which would
require only a small modification to the algorithm presented in the
last section. We believe that the message overhead created by
aggregation with parameter-based admission control would be
acceptable for domains with a few hundred edge routers. Future
research based on network simulations will be necessary to make more
exact statements.
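The size estimate above, and one possible way of splitting the status
information over several ADSTAT messages, can be checked with a small
Python sketch. The object sizes are the ones stated in this section;
the budget of 1400 bytes per message and the function names are
assumptions, and the common header is not counted.

```python
FLOWSPEC_BYTES = 72  # Controlled-Load FLOWSPEC size as stated above
RSVP_HOP_BYTES = 12  # IPv4 RSVP_HOP size as stated above
PAIR_BYTES = FLOWSPEC_BYTES + RSVP_HOP_BYTES


def status_bytes(num_edge_routers):
    """Bytes of status information for the given number of edge routers."""
    return num_edge_routers * PAIR_BYTES


def split_pairs(pairs, max_status_bytes):
    """Split RSVP_HOP/FLOWSPEC pairs into portions that fit a budget."""
    per_message = max(1, max_status_bytes // PAIR_BYTES)
    return [pairs[i:i + per_message]
            for i in range(0, len(pairs), per_message)]


# Placeholder pairs for 100 edge routers; addresses are hypothetical.
pairs = [(f"edge-{n}", 0) for n in range(100)]
portions = split_pairs(pairs, max_status_bytes=1400)

print(status_bytes(100))                                   # 8400
print(len(portions), len(portions[0]), len(portions[-1]))  # 7 16 4
```

With this assumed budget, the information for 100 edge routers would
be spread over seven messages, which could also be sent at different
times to smooth the load.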
6. Security Considerations
Security considerations have not been addressed in [1], and they will
also not be addressed in this draft. However, it is important to
understand that it will be necessary to develop security mechanisms
in the future to protect the network especially from corrupted or
spoofed ADSTAT messages.
7. Conclusion
We have described both the basic idea and several details of a
possible scheme for using parameter-based admission control with
aggregation for RSVP. In our opinion this adds flexibility to
aggregation, especially in smaller domains where the additional
overhead is smaller due to a small number of edge routers.
At the cost of a somewhat higher overhead (depending on the number of
edge routers) as compared to aggregation with MBAC, our scheme gives
reservations in aggregating domains the reliability which they would
otherwise only receive with RSVP and full per-flow state on all
routers.
Our technique is not meant to replace aggregation techniques with
MBAC, but to allow for greater flexibility. In fact, MBAC and PBAC
could coexist in aggregating regions. MBAC could be used for general
predictable traffic, while PBAC can be used for traffic with
characteristics which are likely to 'disturb' MBAC, like bursty audio
or video traffic.
Future research and discussions are needed to resolve open issues,
especially how multicast sessions and shared reservation styles may
fit into our scheme.
8. References
[1] Berson, S., Vincent, S., "Aggregation of Internet Integrated
Services State", (draft-berson-rsvp-aggregation-00), Internet
Draft (work in progress), August 1998
[2] Jamin, S., Shenker, S., Danzig, P., "Comparison of Measurement-
based Admission Control Algorithms for Controlled Load Service",
Infocom 1997
[3] Braden, R., Zhang, L., Berson, S., Herzog, S., Jamin, S.,
"Resource ReSerVation Protocol (RSVP) -- Version 1 Functional
Specification", RFC (Request for Comments) 2205, September 1997
[4] Mankin, A. et al, "Resource ReSerVation Protocol (RSVP) Version
1 Applicability Statement - Some Guidelines on Deployment", RFC
(Request for Comments) 2208, September 1997
[5] Wroclawski, J., "The Use of RSVP with IETF Integrated Services",
RFC (Request for Comments) 2210, September 1997
Authors' Addresses
Marc Greis
University of Bonn
Institute of Computer Science IV
Roemerstr. 164
53117 Bonn
Germany
Email: greis@cs.uni-bonn.de
Markus Albrecht
University of Bonn
Institute of Computer Science IV
Roemerstr. 164
53117 Bonn
Germany
Email: sukram@cs.uni-bonn.de