INTERNET-DRAFT                                                Marc Greis
November 1998                                            Markus Albrecht
                                             University of Bonn, Germany



             Aggregation of Internet Integrated Services
            State using Parameter-based Admission Control
              draft-greis-aggregation-with-pbac-00.txt


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups.  Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as
   "work in progress."

   To view the entire list of current Internet-Drafts, please check
   the "1id-abstracts.txt" listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
   (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
   (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
   (US West Coast).


Abstract

   Aggregation has been proposed as one possible solution to the
   scalability problem of the Internet Integrated Services. The current
   suggestions for aggregation are based on measurement-based admission
   control, which allows for the omission of RSVP soft state in the
   interior routers of an aggregating domain.

   However, measurement-based admission control has certain flaws which
   may lead to over-reservations on links in the network under certain
   conditions. This can result in packet losses for reserved traffic.
   Hence, we believe that it will be necessary to discuss the
   possibility of using parameter-based admission control with
   aggregation.

   In this document, we present a technique for using parameter-based
   admission control with aggregation as a basis for further
   discussions, and we evaluate possible advantages and disadvantages.




Greis, Albrecht             Expires 5/99                       [Page 1]

INTERNET-DRAFT   draft-greis-aggregation-with-pbac-00.txt  November 1998


1. Introduction
   
   It has been stated in the RSVP Applicability Statement [4] that RSVP
   as defined in [3] has a scalability problem, since per-flow state has
   to be maintained in each node of the network for each RSVP-supported
   flow traversing the node.

   One of the most promising solutions for the problem seems to be the
   aggregation of RSVP-supported flows. The basic idea is to let the
   ingress nodes for a flow decide if a flow can be admitted when a RESV
   message arrives. In [1], messages are sent from the ingress to the
   egress to determine if the interior nodes in the aggregating domain
   can admit the new flow. The interior nodes do not keep per-flow state
   for admission control; they simply measure the amount of reserved
   traffic to determine if a new flow can be admitted.

   However, this kind of admission control can fail under certain
   circumstances. Admission control failure can lead
   to packet losses, which makes the results of performing reservations
   with RSVP less predictable. There are two kinds of traffic which may
   cause measurement-based admission control (MBAC, [2]) to fail:
   - Very bursty traffic, such as video or audio traffic (especially
     audio traffic from audio conferences which may be idle for long
     periods).
   - Traffic from sessions where resources are reserved a long time
     before data is actually sent.
   In both cases, the future behavior of the traffic sources cannot be
   predicted from past measurements. The second case seems to be less
   likely, as in a real environment it would be a waste of money for a
   user to reserve resources for a session a long time in advance, and
   possible solutions for performing advance reservations have been
   proposed. Still, a user cannot be kept from reserving resources a
   long time before they are used, and in fact, users who want to
   disrupt a provider's RSVP services may exploit this possibility.

   We believe that it is necessary to consider and discuss the
   possibility of using parameter-based admission control (PBAC) with
   aggregation. In this document, we present a scheme based on [1] for
   using PBAC with aggregation, thus enhancing the reliability of flow
   aggregation at the cost of a somewhat greater overhead. However, we
   will evaluate the overhead based on comparisons with 'standard' RSVP
   and with MBAC-based aggregation, and we will show that the additional
   overhead may be acceptable especially in smaller domains.
   
   The rest of this document is structured as follows: In section 2, we
   will present our basic idea together with an example scenario, in
   section 3 we describe the necessary additions to the RSVP protocol,
   in section 4 we describe our technique in more detail, and in
   section 5 we evaluate the additional overhead necessary for using our
   technique with aggregation.


2. The Basic Technique

   The main idea behind the scheme we propose is that each router in an
   aggregating domain maintains a table with the amount of aggregate 
   bandwidth reserved on the path to each edge router which can be 
   reached from this router. The information in these tables will be 
   gathered from the ADREQ messages that are sent by the ingress 
   routers of an aggregating domain to request admission control 
   information from the interior routers (as proposed in [1]). However,
   there are several possible problems with this approach:
   - A reservation request from an ADREQ message that is accepted by an
     interior router may still be rejected by downstream routers.
   - ADREQ messages would inform the interior routers only about
     reservation requests, but not about reservation teardown.
   - ADREQ messages may be lost on their way from the ingress to the
     egress, in which case they will be resent later. This may cause
     overreservations in interior routers which receive the same ADREQ
     message twice, and use it twice to update their admission control
     information.
   One obvious solution to the second problem may be to let interior
   routers process ResvTear messages. This would create several new
   problems though, as it is possible that ResvTear messages may be
   lost. It is also one of the advantages of the scheme proposed in [1]
   that interior routers in an aggregating domain do not have to process
   any RSVP messages except ADREQ messages.

   To solve all three problems mentioned above, we propose a new message
   type called ADSTAT (ADmission control STATus), which would be sent at
   regular intervals (e.g. every 30 seconds) from each router in the
   aggregating domain to each adjacent router to update the admission
   control information.

   We also propose that admission control status information be sent
   with ADREQ messages. This information would contain the amount of
   aggregate bandwidth reserved on the path from the router which sent
   the ADREQ message to the corresponding egress router for this
   message. It would be updated by each interior router with the
   aggregate admission control information for the egress router as the
   message passes through the aggregating domain.

   The admission control status information in the interior router can
   be seen as a 'per-edge-router state' (as opposed to the 'per-flow
   state' in RSVP), which grows larger with the number of edge routers
   in an aggregating domain. This limits the scheme to 'small'
   domains, though possible values for the number of edge routers will
   be discussed in section 5. It should be noted that the admission
   control state in the routers has to be soft state. If it is not
   refreshed by ADSTAT or ADREQ messages with status information, it
   will expire. This is necessary to avoid the 'survival' of outdated
   status information after routing changes. It will be left to future
   research and discussions to determine how long an admission control
   status should be kept before it expires.

   It will be necessary to avoid sending redundant status information to
   save bandwidth. Status information should only be sent when it
   differs from the last status information that was sent with an ADSTAT
   or ADREQ message. It should be kept in mind, though, that the
   admission control status in the interior routers needs to be
   refreshed periodically. What should be avoided is sending hundreds
   of ADREQ messages with exactly the same status information within a
   few seconds. This could happen when an interface of an interior
   router has no free bandwidth left: new reservations are rejected
   and the admission control status does not change, but ADREQ messages
   may still pass through.
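
   The rule above can be sketched as follows. This is only an
   illustration: the StatusSender class and the 30 second refresh
   interval are assumptions made for the example and are not defined
   by this draft. Status is attached when it has changed, or when the
   refresh interval has elapsed since it was last sent.

```python
class StatusSender:
    """Decides whether status for one edge router should be sent.

    Hypothetical helper; the refresh interval is an assumed value.
    """

    def __init__(self, refresh_interval=30.0):
        self.refresh_interval = refresh_interval
        self.last_sent_value = None
        self.last_sent_time = None

    def should_send(self, value, now):
        # Send when the status differs from what was last sent ...
        changed = value != self.last_sent_value
        # ... or when the downstream soft state is due for a refresh.
        stale = (self.last_sent_time is None
                 or now - self.last_sent_time >= self.refresh_interval)
        if changed or stale:
            self.last_sent_value = value
            self.last_sent_time = now
            return True
        return False
```

   With this rule, a burst of ADREQ messages carrying an unchanged
   status results in a single status transmission per refresh
   interval.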

   It is also possible to only send status information with ADREQ
   messages periodically (as with ADSTAT messages). It should be kept in
   mind though that this makes the technique more 'conservative', as the
   information in interior routers may be out-dated, and they may reject
   new reservations based on this old information. Further research will
   be necessary to determine useful values for this and other possible
   parameters.
   
   Other issues which will also have to be considered in the future are
   multicast sessions where reservations from different branches of the
   multicast tree can be merged, and shared reservation styles. It may
   be necessary to use common RSVP in these cases or to use aggregation
   with MBAC.


2.1. A Simple Example

   The example in this section will be used to illustrate the technique
   we propose. Figure 1 shows a sample domain with 5 routers. The
   routers R1, R4 and R5 are edge routers, R2 and R3 are interior
   routers.

   An example of the admission control status information in the
   routers is shown in figure 2. Integer numbers were chosen to
   represent resources. For example, router R1 has reserved an amount of
   11 'resource units' towards edge router R4 and 21 towards edge router
   R5. No other entries are necessary for this router, as no other edge
   routers are present in the domain, and all traffic from R1 going into
   the domain will leave the domain either through R4 or R5.


               |                                   |
            --[R1]--------(R2)--------(R3)--------[R4]--
               |           |                       |
                           |
                        --[R5]--
                           |

                       Figure 1: A Sample Domain


                    __________            _______
                   |    R2    |          |   R3  |
    ____           |-+--+--+--|          |-+--+--|           ____
   | R1 |          | | a| b| c|          | | a| b|          | R4 |
   |-+--|       (a)|-|--|--|--|(c)    (a)|-|--|--|(b)       |-+--|
   |4|11|----------|1| /|17|31|----------|1| /|31|----------|1|31|
   |5|21|          |4|11|14| /|          |4|25| /|          |5|22|
   |_|__|          |5|21| /|22|          |5| /|22|          |_|__|
                   |_|__|__|__|          |_|__|__|
                         |(b)
                         |
                        _|__ 
                       | R5 |
                       |-+--|
                       |1|17|
                       |4|14|
                       |_|__|

    Figure 2: The Admission Control Tables for the Example Topology


   The admission control status for R2 is more complicated. It is not
   only necessary to store the amount of reserved bandwidth on the path
   towards an edge router, but also the interface through which this
   information was received. For example, R2 received the information
   from edge router R1 about the reservations for edge routers R4 and R5
   through interface (a).
   
   To determine how much bandwidth is actually reserved on an outgoing
   interface, the sum of all entries in the table for all edge routers
   the interface routes to has to be calculated. For example, interface
   (c) on R2 routes only to one edge router, to R4. Hence, the amount of
   reserved traffic on interface (c) can be calculated as the sum of all
   entries for R4, in this case 25 (11 from interface (a) and 14 from
   interface (b)). The method used to calculate the sum is usually 
   service-specific (e.g. for Controlled-Load or Guaranteed Service) and
   should be described in the documents defining the service.

   It will be left to the underlying routing protocol and to periodic
   routing lookups to determine which interface routes to which edge
   router.

   When R2 sends an ADSTAT message to R3, it will only include the
   information for the edge routers which interface (c) (the outgoing
   interface for the ADSTAT message) routes to. In this case it is only
   the 25 for R4, while R3 sends a 31 for R1 and a 22 for R5 to R2.
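
   The calculation for R2 can be reproduced with a short sketch. The
   table values are taken from figure 2 and the routes from figure 1.
   The dictionary layout, the function names and the use of plain
   integer addition are assumptions made for the example; a real
   implementation would use the service-specific summation.

```python
# Admission control table of R2 from figure 2:
# table[edge_router][incoming_interface] = reserved resource units.
R2_TABLE = {
    "R1": {"b": 17, "c": 31},
    "R4": {"a": 11, "b": 14},
    "R5": {"a": 21, "c": 22},
}

# Edge routers that each interface of R2 routes to (from figure 1).
R2_ROUTES = {"a": ["R1"], "b": ["R5"], "c": ["R4"]}


def reserved_towards(table, edge):
    """Sum of all entries for one edge router (plain addition assumed)."""
    return sum(table.get(edge, {}).values())


def reserved_on_interface(table, routes, interface):
    """Aggregate reservation on an outgoing interface."""
    return sum(reserved_towards(table, e) for e in routes[interface])


def adstat_payload(table, routes, interface):
    """Status entries carried by an ADSTAT sent out of `interface`."""
    return {e: reserved_towards(table, e) for e in routes[interface]}
```

   Here reserved_on_interface(R2_TABLE, R2_ROUTES, "c") evaluates to
   25, and adstat_payload(R2_TABLE, R2_ROUTES, "c") yields the single
   entry for R4 that R2 sends to R3.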

   In this example, only one service class was used. It is possible to
   use the scheme proposed here with several service classes. It will
   simply be necessary to keep separate admission control status
   information for each service class and to send separate ADSTAT
   messages.


3. RSVP Extensions

   The most important question, which has to be answered before a format
   for ADSTAT messages and for the necessary extensions to the ADREQ
   messages can be specified, is how the bandwidth on the path to an
   edge router can be represented. This information would usually be
   service-specific, so it would be useful to use the FLOWSPEC object as
   defined for Controlled-Load and Guaranteed Service (in [5]). It may
   be argued that some of the information in the FLOWSPEC objects may
   not be necessary for this purpose and that a new smaller object
   should be defined to save bandwidth, but using the FLOWSPEC object
   seems to allow for the highest flexibility.

   The format for ADSTAT messages is as follows:
   
   <ADSTAT> ::= <Common Header> <AC status list>

   <AC status list> ::= <RSVP_HOP> <FLOWSPEC> |
                        <RSVP_HOP> <FLOWSPEC> <AC status list>

   Each RSVP_HOP/FLOWSPEC pair describes the bandwidth reserved on the
   path to an edge router. The RSVP_HOP object contains the address of
   the edge router which the pair corresponds to. An RSVP object
   RSVP_EDGE which would only contain the address of an edge router may
   be defined to replace the RSVP_HOP object here, but for now this
   seems like an unnecessary redundancy.
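
   The structure of the AC status list can be illustrated with a small
   sketch. The RsvpHop and Flowspec classes below are simplified
   stand-ins invented for the example (a real FLOWSPEC carries the
   service-specific parameters of [5], not a single integer), and the
   sketch does not attempt the RSVP wire encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RsvpHop:
    edge_addr: str      # address of the edge router


@dataclass(frozen=True)
class Flowspec:
    reserved: int       # stand-in for the real FLOWSPEC contents


def build_ac_status_list(status):
    """Flatten {edge_addr: reserved} into RSVP_HOP, FLOWSPEC, ... order."""
    if not status:
        raise ValueError("an AC status list needs at least one pair")
    objects = []
    for edge_addr, reserved in status.items():
        objects.append(RsvpHop(edge_addr))
        objects.append(Flowspec(reserved))
    return objects


def parse_ac_status_list(objects):
    """Recover {edge_addr: reserved}; reject lists that break the grammar."""
    if not objects or len(objects) % 2 != 0:
        raise ValueError("expected one or more RSVP_HOP/FLOWSPEC pairs")
    status = {}
    for hop, spec in zip(objects[0::2], objects[1::2]):
        if not isinstance(hop, RsvpHop) or not isinstance(spec, Flowspec):
            raise ValueError("objects must alternate RSVP_HOP, FLOWSPEC")
        status[hop.edge_addr] = spec.reserved
    return status
```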




   An ADREQ message carrying admission control status information would
   only contain one such RSVP_HOP/FLOWSPEC pair. In fact, the RSVP_HOP
   object can be omitted, since the edge router the pair would
   correspond to is already known: it is the egress router the ADREQ
   message is being sent to. However, adding the status information to
   the ADREQ messages may cause confusion, as they already contain a
   FLOWSPEC object. The FLOWSPEC object containing the admission
   control status information should therefore always be the second
   FLOWSPEC object in an ADREQ message. This means that interior
   routers can determine if an ADREQ message carries admission control
   status information simply by checking if it contains a second
   FLOWSPEC object.
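
   A sketch of this check is given below. The message is modelled as
   a list of (object name, value) tuples, which is a simplification
   assumed for the example; the "SESSION" entry in the usage is only
   a placeholder for whatever other objects an ADREQ carries in [1].

```python
def split_adreq_flowspecs(objects):
    """Return (request, status) FLOWSPEC values from an ADREQ object list.

    `objects` is a list of (object_name, value) tuples, a simplified
    stand-in for the real RSVP object encoding. `status` is None when
    the message carries only the request FLOWSPEC.
    """
    flowspecs = [value for name, value in objects if name == "FLOWSPEC"]
    if not flowspecs or len(flowspecs) > 2:
        raise ValueError("an ADREQ carries one or two FLOWSPEC objects")
    request = flowspecs[0]
    # A second FLOWSPEC, if present, is the admission control status.
    status = flowspecs[1] if len(flowspecs) == 2 else None
    return request, status
```

   For example, split_adreq_flowspecs([("SESSION", "s1"),
   ("FLOWSPEC", 3)]) returns (3, None), while a message with a second
   FLOWSPEC of 25 returns (3, 25).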


4. Detailed Algorithms

   There are three important events which can occur at an interior
   router in an aggregating domain with PBAC:
   - An ADSTAT message has to be sent
   - An ADSTAT message is received from an adjacent router
   - An ADREQ message is received from an adjacent router
   In this section, we will describe the actions to be taken when these
   events occur in more detail as a basis for possible implementations.

   Send ADSTAT message:
      - For each interface on the router sending the message:
         - Create a new ADSTAT message
         - For each edge router the interface routes to:
            - Add an RSVP_HOP object with the edge router's address to
              the ADSTAT message
            - Calculate the sum of all reservations to this edge router
            - Add a FLOWSPEC object containing the calculated
              reservation to the message
         - Send the ADSTAT message
      - Set a timer to send a new ADSTAT message after a certain time
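
   The steps above can be sketched as follows. The table and route
   layout, the send callback and the 30 second interval are
   assumptions made for the example, and plain addition stands in
   for the service-specific summation. Instead of arming a real
   timer, the function returns the time of the next transmission.

```python
def send_adstat(table, routes, send, now, interval=30.0):
    """Send one ADSTAT per interface and return the next send time.

    table[edge][incoming_interface] -> reserved units
    routes[interface] -> edge routers reached via that interface
    send(interface, pairs) -> hypothetical transmit callback
    """
    for interface, edges in routes.items():
        # One (edge address, summed reservation) pair per edge router
        # that this interface routes to.
        pairs = [(edge, sum(table.get(edge, {}).values()))
                 for edge in edges]
        send(interface, pairs)
    return now + interval
```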

   As will be discussed in the next section, it should be kept in mind
   that an ADSTAT message does not have to contain admission control
   information for all edge routers. It is possible to modify the above
   algorithm so that information for a certain subset of the edge
   routers is sent.

   ADSTAT message received on the incoming interface (i):
      - For each RSVP_HOP/FLOWSPEC pair in the message:
         - Is there already an entry for the edge router with the
           address stored in the RSVP_HOP object?
            - No: Create a new entry for this edge router
         - Modify the entry for this edge router: Replace the old entry
           for (i) with the data from the FLOWSPEC object
         - Reset the expiration timer for the modified entry
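
   A minimal sketch of these steps, together with the expiry of soft
   state, is shown below. The data layout and the 90 second lifetime
   are assumptions made for the example; this draft leaves the
   lifetime open.

```python
def on_adstat_received(table, expiry, interface, pairs, now, lifetime=90.0):
    """Apply the pairs of a received ADSTAT to the local soft state.

    table[edge][interface] -> reserved units
    expiry[(edge, interface)] -> time at which that entry expires
    `lifetime` is an assumed value.
    """
    for edge, reserved in pairs:
        entry = table.setdefault(edge, {})   # create entry if missing
        entry[interface] = reserved          # replace, never add
        expiry[(edge, interface)] = now + lifetime


def expire_entries(table, expiry, now):
    """Drop entries whose timer ran out, e.g. after a routing change."""
    for (edge, interface), deadline in list(expiry.items()):
        if deadline <= now:
            del expiry[(edge, interface)]
            table.get(edge, {}).pop(interface, None)
            if edge in table and not table[edge]:
                del table[edge]
```

   Because the received value replaces the stored one, a duplicated
   ADSTAT message cannot inflate the table.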


   It can be seen from this (and the next) algorithm how the admission
   control status tables are built dynamically from the received ADSTAT
   and ADREQ messages. It will not be necessary to configure the
   interior routers with the addresses of all edge routers in advance.

   ADREQ message received on an interior router on the incoming
   interface (i):
      - Determine which edge router the message is being sent to and
        store the address in e_addr
      - Is there already an admission control status entry for e_addr?
         - No: Create a new entry for this edge router
      - Reset the expiration timer for the admission control status
        entry for e_addr and (i)
      - Does the ADREQ message contain admission control status
        information?
         - Yes: Replace the old admission control status information for
           e_addr and (i) with the information from the ADREQ message
           and remove the information from the message
      - Calculate the sum of all reservations for e_addr and store it in
        the FLOWSPEC object old_status
      - Determine the outgoing interface (o) for the ADREQ message
      - Calculate the sum of all reservations for all edge routers (o)
        routes to and store it in the FLOWSPEC object res_sum
      - Use res_sum to decide if the flow which corresponds to the ADREQ
        message can be admitted.
      - Was admission control successful?
         - Yes: Modify the admission control status information for
           e_addr and (i) by adding the FLOWSPEC from the ADREQ message
      - Determine if admission control status information should be sent
        with the forwarded ADREQ message
         - If yes: Add the FLOWSPEC object old_status to the message
      - Forward the modified ADREQ message towards e_addr
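
   The steps above can be sketched as follows, under several
   assumptions that are not part of this draft: the ADREQ is
   modelled as a dictionary, admission is a plain comparison of the
   summed reservations against a per-interface capacity, status is
   forwarded only when the incoming message carried status, and the
   reset of the expiration timer is omitted for brevity.

```python
def on_adreq_received(table, routes, capacity, adreq, in_if, out_if):
    """Process an ADREQ at an interior router.

    adreq: dict with 'egress', 'request' and optional 'status' keys
    capacity[out_if]: reservable units on the outgoing interface
    Returns (admitted, forwarded_adreq).
    """
    e_addr = adreq["egress"]
    entry = table.setdefault(e_addr, {})     # create entry if missing
    entry.setdefault(in_if, 0)
    status = adreq.get("status")
    if status is not None:
        entry[in_if] = status                # replace, never add
    # Status to forward: reservations towards e_addr before this flow.
    old_status = sum(entry.values())
    # Reservations for all edge routers the outgoing interface routes to.
    res_sum = sum(sum(table.get(e, {}).values()) for e in routes[out_if])
    admitted = res_sum + adreq["request"] <= capacity[out_if]
    if admitted:
        entry[in_if] += adreq["request"]
    forwarded = {"egress": e_addr, "request": adreq["request"]}
    if status is not None:                   # assumed forwarding rule
        forwarded["status"] = old_status
    return admitted, forwarded
```

   If an edge router resends an ADREQ together with its status, the
   stored value is overwritten before the request is added again, so
   the same reservation is not counted twice.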

   In this algorithm the expiration timer for e_addr is always reset,
   even if the ADREQ message does not contain admission control status
   information, because the fact that an ADREQ message for e_addr was
   received shows that the router is still on the route to e_addr, while
   the purpose of the expiration timer is to let out-dated admission
   control status expire after routing changes.

   The decision whether admission control status information should be
   sent with an ADREQ message is based on a set of rules as mentioned in
   section 2. The only case when status information HAS to be sent with
   an ADREQ message is when an edge router sends an ADREQ message for
   the same session twice (i.e. when the first was probably lost, since
   no corresponding ADREP message was received). Otherwise, the
   bandwidth for the same reservation might be added to the same entry
   twice. This also means that it will probably be necessary for
   interior routers to send status information with forwarded ADREQ
   messages when the ADREQ message that was received contained status
   information, or else important information would be lost.


5. Evaluation

   The overhead necessary for deploying Integrated Services in a network
   can be divided into three categories:
   - Classifier and scheduler state
   - Setup protocol state
   - Setup protocol messages 
   The state in the classifier and scheduler is the most important
   potential problem, since each packet has to pass through these two
   elements. The first and foremost goal of all aggregation schemes
   would be to reduce the size of the classifier and scheduler state.
   The size of the setup protocol state is less important, but can still
   consume a huge amount of memory with lists that have to be
   maintained, and CPU time which is needed to maintain these lists.
   The overhead created by setup protocol messages can be a problem in
   two ways: The routers have to create and send outgoing messages and
   they have to process and forward incoming messages, both of which
   mean additional CPU load for the router. The messages may also
   consume a considerable amount of bandwidth if they are too big, or if
   they are sent too often.

   In table 1, we evaluate and compare the classifier/scheduler state
   size, the setup protocol state size and the message overhead for
   'classic' RSVP, for aggregation with measurement-based admission
   control and for aggregation with parameter-based admission control.

   It has to be kept in mind that one important factor does not appear
   in table 1: The possible additional overhead for measuring the amount
   of traffic for each service class when using aggregation with MBAC.
   More experience with MBAC algorithms will be needed to determine the
   importance of this problem, as it will depend to a large extent on
   the router's capability to perform such measurements.

   It can be seen from table 1 that for the classifier and scheduler
   state, aggregation with PBAC has the same advantages over RSVP as
   aggregation with MBAC, which means that RSVP's most central
   scalability problem is still solved when aggregation with PBAC is
   used.

   The setup protocol state size for PBAC aggregation can be seen to
   be 'between' the protocol state size for RSVP (where the protocol
   state can become very large in extreme cases) and for MBAC
   aggregation (where no protocol state is kept at all on interior
   routers) for 'small' domains, that is, in domains where the result
   of #I*#E*#S is significantly smaller than the highest expected number
   of flows. The entries in the admission control status table would not
   be too big, so it may be possible to maintain admission control
   status for 1000 or more edge routers.


              |       RSVP       | Aggr. with MBAC  | Aggr. with PBAC  |
   -----------+------------------+------------------+------------------|
   Classifier/|Per-Flow          |Fixed state. Size |Fixed state. Size |
   Scheduler  |Limited only by   |based on the      |based on the      |
   State Size |Si/r  (*)         |number of service |number of service |
              |                  |classes.          |classes.          |
   -----------+------------------+------------------+------------------|
              |Per-Flow          |Full RSVP state on|Full RSVP state on|
              |                  |edge routers, no  |edge routers, no  |
              |                  |state on interior |RSVP state on     |
   Protocol   |                  |routers.          |interior routers. |
   State Size |                  |                  |Admission control |
              |                  |                  |state on all      |
              |                  |                  |routers. Limited  |
              |                  |                  |by #I*#E*#S (*)   |
   -----------+------------------+------------------+------------------|
              |RSVP messages have|New messages:     |New messages:     |
              |to be sent and    |ADREQ and ADREP.  |ADREQ, ADREP and  |
    Message   |processed by all  |RSVP messages are |ADSTAT. RSVP      |
    Overhead  |routers for all   |still sent, but   |messages are still|
              |flows.            |interior routers  |sent, but interior|
              |                  |only send and     |routers only send |
              |                  |process ADREQ     |and process ADREQ |
              |                  |messages.         |and ADSTAT.       |
   ---------------------------------------------------------------------

       Table 1: A Comparison of the Overhead for RSVP and Aggregation


   (*) Explanation of the symbols used in table 1:
   Si  -  The sum of the reservable bandwidth on all interfaces
   r   -  The smallest possible reservation
   #I  -  The number of interfaces on a router
   #E  -  The number of edge routers in a domain
   #S  -  The number of service classes
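
   The two bounds marked with (*) in table 1 can be compared with a
   small calculation. The function names and the example figures in
   the usage note are assumptions chosen for illustration only.

```python
def pbac_state_bound(num_interfaces, num_edge_routers, num_classes):
    """Upper bound on admission control entries: #I * #E * #S."""
    return num_interfaces * num_edge_routers * num_classes


def rsvp_flow_bound(reservable_bandwidth, smallest_reservation):
    """Upper bound on per-flow state with RSVP: Si / r."""
    return reservable_bandwidth // smallest_reservation
```

   For a hypothetical router with 8 interfaces, 100 edge routers in
   the domain and 2 service classes, pbac_state_bound(8, 100, 2)
   gives 1600 entries. If the same router had 8 * 155,000,000 bit/s
   of reservable bandwidth and a smallest reservation of 16,000
   bit/s, rsvp_flow_bound would give 77500 flows.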

   The biggest limitation for aggregation with PBAC is the message
   overhead. The ADSTAT messages can grow fairly big. The FLOWSPEC
   object for Controlled-Load Service is 72 bytes long, an RSVP_HOP
   object for IPv4 is 12 bytes long. That means that the admission
   control status information in an ADSTAT message for 100 edge routers
   would be 8400 bytes long. However, ADSTAT messages could be split up.
   A router does not have to send the information for all edge routers,
   but it could send the information in small portions, which would
   require only a small modification to the algorithm presented in the
   last section. We believe that the message overhead created by
   aggregation with parameter-based admission control would be
   acceptable for domains with a few hundred edge routers. Future
   research based on network simulations will be necessary to make more
   exact statements.
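
   The size estimate and the splitting of status information can be
   sketched as follows. The object sizes are the ones stated above;
   the function names and the 1400 byte budget used in the usage
   note are assumptions made for the example.

```python
FLOWSPEC_BYTES = 72   # Controlled-Load FLOWSPEC size as stated above
RSVP_HOP_BYTES = 12   # IPv4 RSVP_HOP size as stated above
PAIR_BYTES = FLOWSPEC_BYTES + RSVP_HOP_BYTES


def status_bytes(num_edge_routers):
    """Size of the status information for the given number of pairs."""
    return num_edge_routers * PAIR_BYTES


def split_status(pairs, max_bytes):
    """Split a list of pairs into portions that fit into max_bytes."""
    per_message = max(1, max_bytes // PAIR_BYTES)
    return [pairs[i:i + per_message]
            for i in range(0, len(pairs), per_message)]
```

   status_bytes(100) reproduces the 8400 bytes given above. With an
   assumed budget of 1400 bytes per message, split_status places 16
   pairs in each message, so the status for 100 edge routers is
   spread over 7 messages.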


6. Security Considerations

   Security considerations have not been addressed in [1], and they will
   also not be addressed in this draft. However, it is important to
   understand that it will be necessary to develop security mechanisms
   in the future to protect the network especially from corrupted or
   spoofed ADSTAT messages.


7. Conclusion

   We have described both the basic idea and several details of a
   possible scheme for using parameter-based admission control with
   aggregation for RSVP. In our opinion this adds flexibility to
   aggregation, especially in smaller domains where the additional
   overhead is smaller due to a small number of edge routers.

   At the cost of a somewhat higher overhead (depending on the number of
   edge routers) as compared to aggregation with MBAC, our scheme gives
   reservations in aggregating domains the reliability which they would
   otherwise only receive with RSVP and full per-flow state on all
   routers.

   Our technique is not meant to replace aggregation techniques with
   MBAC, but to allow for greater flexibility. In fact, MBAC and PBAC
   could coexist in aggregating regions. MBAC could be used for general
   predictable traffic, while PBAC can be used for traffic with
   characteristics which are likely to 'disturb' MBAC, like bursty audio
   or video traffic.

   Future research and discussions are needed to resolve open issues,
   especially how multicast sessions and shared reservation styles may
   fit into our scheme.


8. References

   [1]  Berson, S., Vincent, S., "Aggregation of Internet Integrated
        Services State", (draft-berson-rsvp-aggregation-00), Internet
        Draft (work in progress), August 1998

   [2]  Jamin, S., Shenker, S., Danzig, P., "Comparison of Measurement-
        based Admission Control Algorithms for Controlled Load Service",
        Infocom 1997

   [3]  Braden, R., Zhang, L., Berson, S., Herzog, S., Jamin, S.,
        "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional 
        Specification", RFC (Request for Comments) 2205, September 1997

   [4]  Mankin, A. et al, "Resource ReSerVation Protocol (RSVP) Version
        1 Applicability Statement - Some Guidelines on Deployment", RFC
        (Request for Comments) 2208, September 1997

   [5]  Wroclawski, J., "The Use of RSVP with IETF Integrated Services",
        RFC (Request for Comments) 2210, September 1997


Authors' Addresses

   Marc Greis
   University of Bonn
   Institute of Computer Science IV
   Roemerstr. 164
   53117 Bonn
   Germany
   Email: greis@cs.uni-bonn.de

   Markus Albrecht
   University of Bonn
   Institute of Computer Science IV
   Roemerstr. 164
   53117 Bonn
   Germany
   Email: sukram@cs.uni-bonn.de