Internet Draft                                       Bala Rajagopalan
draft-bala-protection-restoration-signaling-00.txt   Debanjan Saha
                                                     Tellium, Inc.
Expires on: 5/14/2002                                G. Bernstein
                                                     Ciena Corp.
                                                     Vishal Sharma
                                                     Metanoia, Inc.
                                                     Ayan Banerjee
                                                     John Drake
                                                     Jonathan Lang
                                                     Calient Networks
                                                     Jennifer Yates
                                                     Guangzhi Li
                                                     AT&T

Signaling for Protection and Restoration in Optical Mesh Networks
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 except that the right to
produce derivative works is not granted.
Internet Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
1. Abstract
Protection and restoration of switched connections under tight time
constraints is a challenging problem in optical mesh networks. This
draft describes different local and end-to-end protection modes for
connections, and the message flow required for protection and
restoration-related signaling.
Expires on 5/14/2002 Page 1
draft-bala-protection-restoration-signaling-00.txt
2. Introduction
Protection and restoration of switched connections under tight time
constraints is a challenging problem in optical mesh networks. Such
a network consists of optical or photonic cross-connects (referred
to as "nodes") connected in a general topology [1]. Restoration
typically involves the activation of an alternate (or "protection")
path for a connection when a failure is encountered in the primary
(or working) path. A path for a connection (working or protection)
is characterized by an ingress port, an egress port, and a set of
intermediate nodes and links through which the connection is routed.
The working and protection paths are typically resource-disjoint
(e.g., node or link disjoint) other than the ingress and egress
ports which remain the same.
A bi-directional link between neighboring nodes is usually realized
as a pair of unidirectional links. The end-to-end path for a bi-
directional connection therefore consists of a series of bi-
directional segments between the source and destination nodes,
traversing intermediate nodes.
The following distinction is made between the terms "protection" and
"restoration", even though these terms are often used
interchangeably [2]. Protection is defined as the paradigm whereby a
dedicated protection path is pre-established for a connection, and
the connection is merely switched at the endpoints from the working
to the protection path after a failure. The term restoration, on the
other hand, is used to denote the paradigm whereby a protection path
for a connection may be selected a priori, but its establishment
occurs only after a failure in the working path. This distinction is
subtle, and both protection and restoration require signaling.
Protection can be "local span" or "end-to-end". Local span
protection refers to the protection of the link (and hence
connection segments routed over the link) between two neighboring
switches. End-to-end protection refers to the protection of an
entire connection from the ingress to the egress port. A connection
may be subject to both local span protection (for each of its
segments) and end-to-end protection (when local protection does not
succeed or is not desired). Under local span and end-to-end
protection schemes, it may be required that when a failure affects
any one direction of the connection, both directions of the
connection are switched to a new link or path, respectively. In the
following, therefore, any reference to a "link" indicates a bi-
directional link (realized as a pair of uni-directional links),
unless noted otherwise.
2.1 Local Span Protection
Considering local-span protection, suppose a connection segment is
routed over link i between two nodes A and B. The following
protection modes may be used:
1+1 (unidirectional): A dedicated link j is pre-assigned to
protect working link i. Connection traffic is simultaneously
sent on both links and received independently by A and B from
one of the functioning links, i or j. Thus, it is possible that
nodes A and B may be receiving traffic from different links.
1+1 (bi-directional): A dedicated link j is pre-assigned to
protect link i. Connection traffic is simultaneously sent on
both links and under normal conditions, the traffic from link i
is received by nodes A and B (in the appropriate directions). A
failure affecting link i results in both A and B switching to
the traffic on link j in the respective directions.
1:N: A dedicated link j between A and B is pre-assigned to
protect a set of N links (which includes i). A failure
affecting any link in this set results in the corresponding
traffic being restored to link j. Clearly, if more than one
link in the set of N links is concurrently affected by
failures, the traffic on only one of the N links may be
restored over link j.
M:N (with pre-configured protection groups): A protection group
of M+N links consists of a set of N links protected by a set of
M other links, with M < N. (Link i must be one of the N links
in a protection group, and link j is one of the M links). A
failure in any of the N links results in traffic being switched
to one of the (available) M links. The number of protection
groups between A and B, the value of M and N, as well as the
specific links in each protection group are pre-configured.
Since M < N, it is possible that not all failed links in the
set of N links may be protected from the same failure event.
M:N (pooled protection links): Under this mode, a total of M
links are assigned to protect a total of N other links between
A and B, where M+N is the *total* number of links between A and
B. (Link i must be one of the set of N links, and link j is one
of the set of M links). This mode thus differs from the
previous one, in which there can be multiple hard-configured
protection groups and working links in one group cannot be
protected by protection links in another group. Furthermore,
under this mode, the number of protection links need not be
pre-configured and may vary depending on the demand from
working traffic. Indeed, any available link can be used for
protection purposes during a failure event. Thus, this is a
more general and flexible protection mode than the previous
one. For administrative purposes, however, the values of M,
N, and the specific links in the set of M protection links may
be pre-determined. As before, since M < N, it is possible that
not all failed links in the set of N links may be protected
from the same failure event.
Since M:N (with pre-configured protection groups) and 1:N are
special cases of M:N (pooled protection links), we focus on the
signaling requirements for M:N protection with pooled protection
links.
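The difference between the two M:N modes can be made concrete with a
small counting sketch. It is illustrative only and not part of the
signaling procedures: the function names, the link names and the rule
of restoring as many failed links as there are usable protection
links are assumptions made for this example.

```python
def restored_with_groups(groups, failed):
    """Count failed working links that can be restored.

    groups: list of (working_links, protection_links) pairs of sets.
    failed: set of link names hit by the failure event.
    A protection link only serves working links in its own group.
    """
    total = 0
    for working, protection in groups:
        failed_working = len(working & failed)
        usable_protection = len(protection - failed)
        total += min(failed_working, usable_protection)
    return total


def restored_pooled(working, protection, failed):
    """Pooled mode: one group spanning all links between A and B."""
    return restored_with_groups([(working, protection)], failed)


# Hypothetical example: four working links and two protection links.
groups = [({"w1", "w2"}, {"p1"}), ({"w3", "w4"}, {"p2"})]
failed = {"w1", "w2"}
print(restored_with_groups(groups, failed))
print(restored_pooled({"w1", "w2", "w3", "w4"}, {"p1", "p2"}, failed))
```

With two pre-configured 1:2 groups, only one of the two failed links
is restored. With the same links pooled, both are.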
2.2 End-to-End Protection and Restoration
Considering end-to-end protection, suppose a connection's primary
path is from an ingress port in node A to an egress port in node B
over a set of intermediate nodes. The following protection and
restoration modes may be used:
1+1 (unidirectional) protection: A dedicated, resource-disjoint
alternate path is pre-established to protect the connection.
Connection traffic is simultaneously sent on both paths and
received from one of the functional paths by the end nodes, A
and B.
1+1 (bi-directional) protection: A dedicated, resource-disjoint
alternate path is pre-established to protect the connection.
Connection traffic is simultaneously sent on both paths; under
normal conditions, the traffic from the primary path is
received by nodes A and B (in the appropriate directions). A
failure affecting the primary path results in both A and B
switching to the traffic on the back-up path in the respective
directions.
Shared mesh restoration: An alternate path is pre-assigned to
protect the connection, but the resources along the alternate
path may be shared among multiple connections being protected
(based on criteria described later). In this case, the
resources are allocated in real-time for one of the protected
connections whose primary path is affected by a failure. If
more than one connection sharing a resource is concurrently
affected by a failure, only one of them will be allocated the
shared resource.
New protocol mechanisms are required to realize both 1+1 (bi-
directional) protection and shared mesh restoration in optical
networks. Specifically, 1+1 (bi-directional) protection requires
coordination between the end nodes to switch to the protection path,
and shared mesh restoration additionally involves the intermediate
nodes in the protection path.
The aim of this draft is to define the message flows for M:N pooled
local span protection, 1+1 (bi-directional) protection and shared
mesh restoration. The main requirements on these protocols are
simplicity and speed. The latency requirement on switching to
protection paths is typically specified in tens to hundreds of
milliseconds, the performance depending on the number of hops
involved [2].
Sections 3 and 4 describe local span and end-to-end protection
protocols in detail. Section 5 describes certain administrative
procedures related to restoration. Section 6 presents some
discussion items and Section 7 presents the conclusions.
3. Local Span Protection
Local span protection is described with respect to two neighboring
nodes A and B. The scenario considered for local span protection
(M:N with pooled protection links) is as follows:
o At any point in time, there are two sets of links between A and
B, i.e., a working set of N (bi-directional) links carrying
traffic subject to protection and a protection set of M (bi-
directional) links. A protection link may have no traffic on
it, or it may be carrying traffic that could be preempted.
There is no a priori relationship between the two sets of
links, but the value of M and N may be pre-configured. The
specific links in the protection set MAY be pre-configured to
be physically diverse to avoid the possibility that failure
events affect a large proportion of protection links (along
with working links).
o When a link in the working set is affected by a failure, the
traffic on it is diverted to a link in the protection set, if
such a link is available. Note that such a link might carry
more than one connection, e.g., an OC-192 link carrying four
OC-48 connections.
o More than one link in the working set may be affected by the
same failure event. In this case, there may not be an adequate
number of protection links to accommodate all of the affected
traffic carried by failed working links. The set of affected
working links that are actually restored over available
protection links is then subject to policies (e.g., based on
relative priority of working traffic). These policies are not
specified in this draft.
o Each node is assumed to have an identifier, called the Node ID.
Each node is also assumed to have the mapping of its local link
(or port) ID to the corresponding ID at the neighbor. This
mapping could be configured, or obtained automatically using a
neighbor discovery procedure (e.g., LMP [3]).
o When traffic must be diverted from a failed link in the working
set to a protection link, the decision as to which protection
link is chosen is always made by one of the nodes, A or B. As
per this draft, the node with the numerically higher Node ID is
considered the "master" and it is required to both apply any
policies and select specific protection links to divert working
traffic. The other node is considered the "slave". The
determination of the master and the slave may be based on
configured information, or as a result of running a neighbor
discovery procedure.
o Failure events themselves are assumed to be detected by lower
layer mechanisms (e.g., SONET). Since the bi-directional links
are formed by a pair of unidirectional links, a failure in the
link from A to B is typically detected by B and a failure in
the opposite direction is detected by A. It is possible that a
failure simultaneously affects both directions of the bi-
directional link. In this case, A and B will concurrently
detect failures, in the B-to-A direction and in the A-to-B
direction, respectively.
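The master/slave rule in the list above can be sketched as follows.
This draft does not fix the format of the Node ID; the sketch assumes,
purely for illustration, that Node IDs are IP addresses and compares
them by numeric value. The function name select_master is
hypothetical.

```python
import ipaddress


def select_master(node_id_a, node_id_b):
    """Return the Node ID of the master: the numerically higher one.

    Assumption: Node IDs are written as IP address strings.
    """
    a = int(ipaddress.ip_address(node_id_a))
    b = int(ipaddress.ip_address(node_id_b))
    if a == b:
        raise ValueError("Node IDs must be distinct")
    return node_id_a if a > b else node_id_b


print(select_master("10.0.0.9", "10.0.0.10"))
```

Comparing the numeric values rather than the strings matters here: as
text, "10.0.0.9" sorts after "10.0.0.10", which would pick the wrong
node.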
The basic steps in local span protection are as follows:
1. If the master detects a failure of a working link, it
autonomously invokes a process to allocate a protection link to
the affected traffic.
2. If the slave detects a failure of a working link, it must
inform the master of the failure. The master then invokes the
same procedure as above to allocate a protection link. (It is
possible that the master has itself detected the same failure,
for example, a failure simultaneously affecting both directions
of a link).
3. Once the master has determined the identity of the protection
link, it indicates this to the slave and requests the
switchover of the traffic. Prior to this, if the protection
link is carrying traffic that could be preempted, the master
stops using the link for this traffic (i.e., the traffic is
dropped by the master and not forwarded into or out of the
protection link).
4. The slave sends an acknowledgement to the master. Prior to
this, if the selected protection link is carrying traffic that
could be preempted, the slave stops using the link for this
traffic (i.e., the traffic is dropped by the slave and not
forwarded into or out of the protection link). It then starts
sending the (failed) working link traffic on the selected
protection link.
5. When the master receives the acknowledgement, it starts sending
and receiving the (failed) working link traffic over the new
link.
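The five steps above can be sketched as an ordered trace of messages
and local actions. This is a simplified, non-normative illustration:
the function and message names are hypothetical, the master simply
takes the first free protection link (the selection policy is not
specified in this draft), and message loss is ignored.

```python
def local_span_switchover(detected_by, failed_link, free_links,
                          preemptable=()):
    """Return the ordered trace of steps 1-5 for one failed working link.

    detected_by: "master" or "slave".
    free_links: protection links the master may choose from, in order.
    preemptable: protection links carrying traffic that may be preempted.
    Each entry is (actor, kind, detail); kind is "msg" or "action".
    """
    if detected_by not in ("master", "slave"):
        raise ValueError("detected_by must be 'master' or 'slave'")
    trace = []
    if detected_by == "slave":  # step 2: slave informs the master
        trace.append(("slave", "msg", f"FailureIndication({failed_link})"))
    if not free_links:
        trace.append(("master", "action", "no protection link available"))
        return trace
    chosen = free_links[0]  # steps 1 and 3: master selects the link
    if chosen in preemptable:
        trace.append(("master", "action", f"drop preemptable on {chosen}"))
    trace.append(
        ("master", "msg", f"SwitchoverRequest({failed_link}->{chosen})"))
    if chosen in preemptable:  # step 4: slave clears the link first
        trace.append(("slave", "action", f"drop preemptable on {chosen}"))
    trace.append(("slave", "msg", "SwitchoverResponse(ok)"))
    trace.append(("slave", "action", f"send working traffic on {chosen}"))
    # step 5: master moves the traffic once the response arrives
    trace.append(("master", "action", f"send and receive on {chosen}"))
    return trace


def message_count(trace):
    """Number of signaling messages (not local actions) in a trace."""
    return sum(1 for _, kind, _ in trace if kind == "msg")


for entry in local_span_switchover("slave", "w1", ["p1"], {"p1"}):
    print(entry)
```

A failure detected by the slave yields three messages in this sketch,
while a failure detected by the master yields two, since no failure
indication is needed.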
From the description above, it is clear that local span protection
may require up to three messages for each working link being
switched: a failure indication message, a switchover request message
and a switchover response message. The following identifiers are
also needed:
3.1 Identifiers
Node ID: An identifier that uniquely identifies each node in the
network.
Link ID: An identifier that uniquely identifies a bi-directional
link at the sending and the receiving node.
The messages are as follows. All these messages must be transmitted
reliably from the message source to the message destination (master
or slave).
3.2 Failure Indication Message
This message is sent from the slave to the master to indicate the
failure of one or more working links. (This message may not be
necessary when the underlying link technology itself provides for
such a notification).
The number of links included in the message would depend on the
number of failures detected within a window of time by the sending
node. A node may choose to send separate failure indication messages
in the interest of completing the restoration for a given link
within an implementation-dependent time constraint.
The ID of the failed link is the identification used at the slave
node. The master must convert this to the corresponding ID at its
side.
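As a minimal sketch of the ID conversion described above, the master
can translate the link IDs in a received message through its
neighbor-to-local mapping. The message layout and all names below are
assumptions for illustration; this draft does not define an encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureIndication:
    """Hypothetical message layout, not defined by this draft."""
    sender_node_id: str
    failed_link_ids: tuple  # link IDs as numbered at the sender (slave)


def to_local_link_ids(message, neighbor_to_local):
    """Translate the sender's link IDs into the receiver's link IDs."""
    local_ids = []
    for remote_id in message.failed_link_ids:
        if remote_id not in neighbor_to_local:
            raise ValueError(f"no local mapping for neighbor link {remote_id}")
        local_ids.append(neighbor_to_local[remote_id])
    return local_ids


# Mapping as configured or learned through neighbor discovery.
mapping = {7: 3, 8: 4}
msg = FailureIndication(sender_node_id="10.0.0.9", failed_link_ids=(7, 8))
print(to_local_link_ids(msg, mapping))
```

The same translation applies in the opposite direction when the slave
receives link IDs numbered at the master.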
3.3 Switchover Request Message
This message is sent from the master to the slave (reliably) to
indicate whether the traffic on the failed working link can be
switched to a free link. If so, the ID of the free link must be
indicated.
The link IDs are based on the identification used at the master. The
slave must convert them to the corresponding local IDs. The message
ID uniquely identifies the message at the master.
A link being protected may carry multiple connections. Since the
entire working link is switched to a protection link, it may be
possible for the connections on the working link to be mapped to the
protection link by the master and slave without coordination (e.g.,
if the channel assignments (i.e., "labels") are the same on the
working and protect links). Optionally, if it is necessary, the
channel assignments (labels) may be explicitly coordinated between
the master and the slave (e.g., when a smaller capacity link is
protected by a larger capacity link). In this case, the Switchover
Request message should carry the new label mappings selected by the
master.
The master may not be able to find protection links to accommodate
all failed working links. Thus, if this message is generated in
response to a Failure Indication message from the slave, the set
of failed links in the message may be a subset of the links
received in the Failure Indication message. Depending on time
constraints, the master may switch the set of failed links in
smaller batches. Thus, a failure event may result in the master
sending more than one Switchover Request message to the same slave
node.
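One possible way for the master to choose which failed links to
protect and to split them into batches is sketched below. The
allocation policy is not specified in this draft; the priority
ordering (lower number means more important), the batch size
parameter and the function name are assumptions for illustration.

```python
def plan_switchovers(failed_links, free_protection, priority, batch_size):
    """Pair failed working links with free protection links.

    Returns (batches, unprotected). Each batch is a list of
    (failed_link, protection_link) pairs that would be carried in one
    Switchover Request message. Links without a priority entry are
    served last.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(failed_links,
                     key=lambda link: priority.get(link, float("inf")))
    pairs = list(zip(ordered, free_protection))
    unprotected = ordered[len(pairs):]
    batches = [pairs[i:i + batch_size]
               for i in range(0, len(pairs), batch_size)]
    return batches, unprotected


batches, unprotected = plan_switchovers(
    failed_links=[1, 2, 3],
    free_protection=[10, 11],
    priority={1: 2, 2: 0, 3: 1},
    batch_size=1,
)
print(batches, unprotected)
```

In this example there are three failed links but only two free
protection links, so the lowest-priority link (link 1) is left
unprotected and two Switchover Request messages are produced.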
3.4 Switchover Response Message
This message is sent from the slave to the master (reliably) to
indicate the completion (or failure) of switchover at the slave.
In this message, the slave may indicate that it cannot switch over
to the corresponding free link for some reason. The action to be
taken by the master in this case is undefined (for example, the
master may abort the switchover of the traffic on the failed working
link, and perhaps trigger end-to-end protection).
3.5 Preventing Unintended Connections
An unintended connection occurs when traffic from the wrong source
is delivered to a receiver. These should be prevented during
protection switching. This is a concern only when the protection
link is being used to carry (unprotected) traffic that could be
preempted. In this case, it must be ensured that the traffic being
switched from the failed working link to the protection link is not
delivered to the receiver of the traffic preempted. Thus, in the
message flow described above, the master should disconnect (any)
preempted traffic on the selected protection link before sending the
Switchover Request. The slave should also disconnect preempted
traffic before sending the Switchover Response. In addition, the
slave should start receiving traffic for the protected connection
from the protection link. Finally, the master should start sending
protected traffic on the protection link upon receipt of the
Switchover Response.
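The ordering constraints above can be checked mechanically against a
recorded sequence of events. The sketch below is one hedged way to do
so for the case where the protection link carries preemptable
traffic; the event names are hypothetical and the check only verifies
relative order, not timing.

```python
# Order required by this section when preemptable traffic is present.
REQUIRED_ORDER = [
    ("master", "disconnect_preempted"),
    ("master", "send_switchover_request"),
    ("slave", "disconnect_preempted"),
    ("slave", "send_switchover_response"),
    ("master", "start_sending_protected"),
]


def respects_order(trace):
    """True if REQUIRED_ORDER appears in trace as an ordered subsequence."""
    remaining = iter(trace)
    # "step in remaining" advances the iterator past the match, so each
    # later step must be found after the previous one.
    return all(step in remaining for step in REQUIRED_ORDER)


good = list(REQUIRED_ORDER)
bad = [REQUIRED_ORDER[1], REQUIRED_ORDER[0]] + REQUIRED_ORDER[2:]
print(respects_order(good), respects_order(bad))
```

The second trace sends the Switchover Request before the master has
disconnected the preempted traffic, which is the situation this
section is meant to rule out.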
4. End-to-End Protection
One of the significant differences between end-to-end protection and
local span protection (as considered in this draft) is that the
former is on a per-connection basis while the latter is on a per-
link basis. In other words, span protection switches over the entire
traffic on a link which may consist of multiple connections. End-to-
end protection, on the other hand, switches over individual
connections. In this case, there is a working connection path and a
protection path.
Another difference between end-to-end and local protection is that
signaling messages may have to be transmitted over multiple hops to
effect restoration. The signaling messages are transmitted to the
source of the connection. The messages are typically forwarded along
the connection path, working or protection, where it is assumed that
there is a control channel between each pair of intermediate nodes.
If the optical network has routing intelligence, some of these
messages can also be routed over other paths.
There are two cases to be considered: signaling for bi-directional
1+1 protection and for shared mesh restoration. The description
below is in the context of an end-to-end connection between a source
node A and a destination node B.
4.1 Bi-directional 1+1 Protection
Under bi-directional 1+1 protection, the connection traffic is being
sent on both working and protection paths by A and B, but received
only from the working path. After a failure event, signaling between
A and B is required to ensure that both A and B start receiving from
the protection path.
A node in the working path detects a failure event. Such a node must
send a failure indication signal towards the source of the
connection. This message may be forwarded along the working path, or
routed over a different path if the network has general routing
intelligence. Mechanisms provided by the lower layer may also be
used for this, if available.
The action when the source node is notified of a failure is as
follows:
o Start receiving from the protection path. At the same time,
send a message to the destination node to enable switching at
the destination.
The action when the destination node receives the above message is
as follows:
o Start receiving from the protection path. At the same time,
send an acknowledgement to the source node.
(These two messages may be forwarded along the protection path if no
other routing intelligence is available in the network.)
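The endpoint behavior described above can be sketched as follows.
Both endpoints transmit on both paths at all times, so only the
receive selector changes. The class, attribute and message names are
hypothetical, and message forwarding and loss are not modeled.

```python
class Endpoint:
    """One end of a bi-directional 1+1 protected connection."""

    def __init__(self, name):
        self.name = name
        self.receive_from = "working"  # traffic is sent on both paths


def on_failure_notified(source, destination):
    """Apply the source and destination actions; return the messages."""
    messages = []
    # Source: start receiving from protection, then ask the destination.
    source.receive_from = "protection"
    messages.append((source.name, destination.name, "SwitchoverRequest"))
    # Destination: start receiving from protection, then acknowledge.
    destination.receive_from = "protection"
    messages.append((destination.name, source.name, "SwitchoverResponse"))
    return messages


a, b = Endpoint("A"), Endpoint("B")
print(on_failure_notified(a, b))
print(a.receive_from, b.receive_from)
```

After the exchange both endpoints receive from the protection path,
which is the condition bi-directional 1+1 protection is meant to
guarantee.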
4.1.1 Identifiers
Connection ID: A unique ID for each connection.
Source ID: ID of the source (e.g., IP address).
Destination ID: ID of the destination (e.g., IP address).
4.1.2 Nodal Information
Each node that is on the working or protection path of a connection
must at least have knowledge of the connection identifier, the
previous and next nodes in the connection path and the type of
protection being afforded to the connection (i.e., 1+1 or shared).
This is so that restoration-related messages may be forwarded
properly. The optical network may also have additional routing
intelligence. In this case, messages may be forwarded along paths
different than the connection path.
The nodal information may be assembled when the working and
protection paths of the connections are provisioned using signaling,
or may be configured in the case of NMS-based provisioning. The
information must remain until the connection is explicitly de-
provisioned.
4.1.3 End-to-End Failure Indication Message
This message is sent (reliably) by an intermediate node towards the
source of a connection. For instance, such a node might have
attempted local span protection and failed. This message may not be
necessary if the lower layer provides mechanisms for detection of
connection failure by the endpoints.
Consider a node detecting a link failure. The node must determine
the identities of all connections that are affected by the failure
of the link, and send an end-to-end failure indication message to
the source of each connection. Each intermediate node receiving such
a message must determine the appropriate next node to forward the
message such that the message would reach the connection source.
Furthermore, if an intermediate node is itself generating a failure
indication message, there should be a mechanism to suppress all but
one source of failure indication messages. Finally, the failure
indication message must be sent reliably from the node detecting the
failure to the connection source. Reliability may be achieved, for
example, by re-transmitting the message until an acknowledgement is
received.
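The two tasks described above, finding the connections affected by a
link failure and sending the indication reliably, can be sketched as
follows. The data layout, the function names and the bounded retry
count are assumptions for illustration; a real implementation would
use retransmission timers rather than immediate retries.

```python
def affected_sources(connections, failed_link):
    """Map each connection routed over failed_link to its source node.

    connections: dict of connection ID to a dict with "source" (node)
    and "links" (set of links the connection is routed over).
    """
    return {cid: conn["source"]
            for cid, conn in connections.items()
            if failed_link in conn["links"]}


def send_reliably(send, max_attempts=5):
    """Call send() until it reports an acknowledgement.

    send: callable returning True once the message was acknowledged.
    Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        if send():
            return attempt
    raise TimeoutError("no acknowledgement received")


connections = {
    "c1": {"source": "A", "links": {"l1", "l2"}},
    "c2": {"source": "C", "links": {"l3"}},
}
print(affected_sources(connections, "l1"))

replies = iter([False, False, True])  # first two attempts are lost
print(send_reliably(lambda: next(replies)))
```

Only connection c1 is routed over link l1, so a single failure
indication is sent, towards node A; in the example it is acknowledged
on the third attempt.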
4.1.4 End-to-End Failure Acknowledge Message
This message is sent by the source node in response to an End-to-End
failure indication message. This message is sent to the originator
of the failure indication message. The acknowledge message should be
sent for each failure indication message received.
Each intermediate node receiving the acknowledge message must
forward it towards the destination of the message.
4.1.5 End-to-End Switchover Request Message
This message is generated by the source node receiving an indication
of failure in a connection. It is sent to the connection
destination, and it carries the Connection ID of the connection
being restored. This message must indicate whether the source is
able to switch over to the protection path or not. If the source is
not able to switch over, the destination should not switch over
either. The End-to-End Switchover Request message must be sent
reliably from the source to the destination of the connection.
4.1.6 End-to-End Switchover Response Message
This message is sent by the destination node receiving an End-to-End
Switchover Request message towards the source of the connection.
This message should indicate the Connection ID of the connection
being switched over.
This message must be transmitted in response to each End-to-End
Switchover Request message received.
4.2 Shared Mesh Restoration
Shared mesh restoration requires prior soft-reservation of capacity
along the protection path [4]. Furthermore, after a failure event,
the protection path must be explicitly activated. This requires
actions at each intermediate node along the protection path. It is
possible that a protection path may not be successfully activated
when multiple, concurrent failure events occur. In this case, shared
mesh restoration capacity may be claimed for more than one failed
connection and the protection path can be activated only for one of
them (at most).
For implementing shared mesh restoration, the identifier and nodal
information related to signaling along the control path are as
defined for 1+1 protection in Sections 4.1.1 and 4.1.2. In addition,
each node must also keep information needed to establish the data
plane of the protection path. This information could be fine-
grained, indicating the cross-connect that must be established to
activate the protection path for each connection, as follows:
{ Connection ID, <Incoming Port, Channel, etc.>, <Outgoing Port,
Channel, etc.> }
The precise nature of the Port, Channel, etc. information would
depend on the type of node and connection (the Generalized MPLS
signaling draft describes different types of switches [5]).
On the other hand, this information could be coarse-grained,
indicating
{ Connection ID, <Incoming TE link>, <Outgoing TE link> }
In this case, a specific component link and channel on the TE link
is allocated only when the protection path is activated. While the
coarser specification allows some flexibility in selection of the
precise resource to activate, it also brings in more complexity in
decision making and signaling during the time-critical restoration
phase. Furthermore, the procedures for the assignment of bandwidth
to protection paths must take into account the total resources in a
TE link so that single-failure survivability requirements are
satisfied.
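The two granularities of nodal information can be sketched as two
record types, with the coarse-grained entry resolved to a concrete
cross-connect only at activation time. The field names and the rule
of taking the first free channel on each TE link are assumptions for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineEntry:
    connection_id: str
    incoming: tuple  # (port, channel)
    outgoing: tuple  # (port, channel)


@dataclass(frozen=True)
class CoarseEntry:
    connection_id: str
    incoming_te_link: str
    outgoing_te_link: str


def activate(entry, free_channels):
    """Return the cross-connect to establish, or None if none is free.

    free_channels: dict of TE link name to a list of free
    (port, channel) pairs. Only consulted for coarse-grained entries.
    """
    if isinstance(entry, FineEntry):
        return entry  # the cross-connect was fixed at provisioning time
    incoming = free_channels.get(entry.incoming_te_link, [])
    outgoing = free_channels.get(entry.outgoing_te_link, [])
    if not incoming or not outgoing:
        return None
    return FineEntry(entry.connection_id, incoming[0], outgoing[0])


coarse = CoarseEntry("c1", "te-in", "te-out")
free = {"te-in": [(1, 4)], "te-out": [(2, 7), (2, 8)]}
print(activate(coarse, free))
```

The extra lookup in the coarse-grained case is the additional
decision making during restoration that the text above refers to.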
4.2.1 End-to-End Failure Indication and Acknowledgement
The End-to-End failure indication and acknowledgement procedures and
messages are as defined in Sections 4.1.3 and 4.1.4.
4.2.2 End-to-End Switchover Request
This message is generated by the source node receiving an indication
of failure in a connection. It is sent to the connection destination
along the protection path, and it carries the Connection ID of the
connection being restored. This message must allow intermediate
nodes to record whether they are able to activate the (shared)
protection path. If any intermediate node is not able to establish
cross-connects for the protection path then it is desirable that no
other node in the path establishes cross-connects for the path. This
would allow shared mesh restoration paths to be efficiently
utilized. This requirement implies that switchover to the protection
path occurs in two phases: in the forward phase, the Switchover
Request message indicates the switching over action to intermediate
nodes in the protection path and collects information as to their
ability to switch over. In the reverse phase, the actual switchover
occurs if all nodes in the path indicate their ability to switch
over.
The End-to-End Switchover Request message must be sent reliably from
the source to the destination of the connection along the protection
path.
4.2.3 End-to-End Switchover Response
This message is sent by the destination node receiving an End-to-End
Switchover Request message towards the source of the connection,
along the protection path. This message should indicate the ID of
the connection being switched over, and whether all intermediate
nodes have agreed to switch over (as determined in the forward
phase using the Switchover Request message).
This message must be transmitted in response to each End-to-End
Switchover Request message received.
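The two-phase switchover of Sections 4.2.2 and 4.2.3 can be sketched
as follows. This is a simplified illustration under stated
assumptions: the function name and the can_activate callback are
hypothetical, message loss is ignored, and contention between
concurrent requests for the same shared resource is not modeled.

```python
def two_phase_switchover(path_nodes, can_activate):
    """Activate a shared protection path only if every node agrees.

    path_nodes: intermediate nodes in order from source to destination.
    can_activate: callable(node) -> bool, the node's ability to set up
    its cross-connect for this connection.
    Returns (ok, cross_connected), where cross_connected lists nodes in
    the order they establish cross-connects.
    """
    # Forward phase: the Switchover Request collects each node's answer.
    answers = [can_activate(node) for node in path_nodes]
    all_ok = all(answers)
    # Reverse phase: the Switchover Response travels back towards the
    # source, and nodes cross-connect only if every answer was positive.
    cross_connected = list(reversed(path_nodes)) if all_ok else []
    return all_ok, cross_connected


print(two_phase_switchover(["X", "Y", "Z"], lambda node: True))
print(two_phase_switchover(["X", "Y", "Z"], lambda node: node != "Y"))
```

In the second call node Y cannot activate the path, so no node
establishes a cross-connect and the shared capacity at X and Z stays
available for other connections.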
5. Reversion and other Administrative Procedures
Reversion refers to the process of moving a connection back to the
original working path from its protection path after the former is
restored after a failure. Reversion applies both to local span and
end-to-end path protected connections. Reversion is desired for the
following reasons. First, the routing of the protection path often
may not be as efficient as the routing of the working path. Second,
moving a connection to its working path allows the protection
resources to be used to protect other connections.
Reversion implies that a working path remains allocated to the
connection that was originally routed over it even after a failure.
It is important to have mechanisms that allow reversion to be
performed without disrupting service to the customer. This can be
achieved if reversion is implemented using a "bridge-and-switch"
approach (often referred to as make-before-break).
The basic steps involved in bridge-and-switch are:
1. The source node commences the process by "bridging" the signal
onto both the working and the protection paths (or links in the
case of span protection).
2. Once the bridging process is complete, the source node sends a
Bridge and Switch Request message to the destination, identifying
the connection and other information necessary to perform
reversion. Upon receipt of this message, the destination selects
the signal from the working path. At the same time, it bridges the
transmitted signal onto both the working and protection paths.
3. The destination then sends a Bridge and Switch Response message to
the source confirming the completion of the operation.
4. When the source receives this message, it switches to receive from
the working path, and stops transmitting traffic on the protection
path. The source then sends a Bridge and Switch Completed message
to the destination confirming that the connection has been
reverted.
5. Upon receipt of this message, the destination stops transmitting
along the protection path and de-activates the connection along
this path. The de-activation procedure should remove the cross-
connections along the protection path and free the resources to
be used for restoring other failures.
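The five bridge-and-switch steps above can be walked through in a
small state sketch that also checks the make-before-break property:
at every intermediate state, each end receives from a path on which
the far end is transmitting. The sketch assumes that the working path
has been repaired and that, before reversion, both ends transmit and
receive only on the protection path. All names are hypothetical.

```python
def bridge_and_switch():
    """Walk through the five reversion steps for one connection.

    Each endpoint is a dict with "tx" (set of paths it transmits on)
    and "rx" (the path it receives from). Returns the final states,
    the messages exchanged, and whether every state was hitless.
    """
    src = {"tx": {"protection"}, "rx": "protection"}
    dst = {"tx": {"protection"}, "rx": "protection"}
    messages, hitless = [], []

    def check():
        # Each end must receive from a path the far end transmits on.
        hitless.append(src["rx"] in dst["tx"] and dst["rx"] in src["tx"])

    src["tx"] = {"working", "protection"}  # step 1: source bridges
    check()
    messages.append("BridgeAndSwitchRequest")  # step 2
    dst["rx"] = "working"
    check()
    dst["tx"] = {"working", "protection"}
    check()
    messages.append("BridgeAndSwitchResponse")  # step 3
    src["rx"] = "working"  # step 4
    check()
    src["tx"] = {"working"}
    check()
    messages.append("BridgeAndSwitchCompleted")
    dst["tx"] = {"working"}  # step 5: protection path is released
    check()
    return src, dst, messages, all(hitless)


src, dst, messages, ok = bridge_and_switch()
print(messages, ok)
```

Three messages are exchanged, and the check holds after every state
change, which is what allows reversion without disrupting service.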
Administrative procedures other than reversion include the ability
to force a switchover (from working to protect or vice versa), and
locking out switchover, i.e., preventing a connection from moving
from working to protect or vice versa administratively. These
administrative conditions have to be supported by signaling.
6. Discussion
6.1 Relationship between Local and End-to-End Protection Procedures
In general, local protection may be attempted before invoking end-
to-end protection. The exception to this is when end-to-end 1+1
protection is used for a connection. In this case, it is better to
directly invoke end-to-end protection since alternate path resources
are already active for the connection.
Thus, a general guideline is to record the protection type of
connections at intermediate nodes during provisioning, and to invoke
local span protection only for working links carrying connections
that are not 1+1 protected end-to-end. This implies that when a
working link carries more than one connection, all the connections
must have the same end-to-end protection type. The provisioning
process must ensure this. If this is not possible, then local span
protection may be invoked for working links that have at least one
connection that is not end-to-end 1+1 protected.
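The guideline above, including the fallback for links that carry
connections of mixed protection types, reduces to a simple per-link
check. The sketch below assumes that each node keeps, for every working
link, a list of the end-to-end protection types recorded at
provisioning time; the function name and the "1+1" and "shared" labels
are illustrative only.

```python
def invoke_local_span_protection(protection_types):
    """Decide whether to invoke local span protection for a failed link.

    protection_types: end-to-end protection type of each connection on
    the working link, as recorded during provisioning (assumed labels).
    Returns True if at least one connection is not 1+1 protected
    end-to-end.
    """
    return any(ptype != "1+1" for ptype in protection_types)
```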
6.2 Connection Priorities During Protection
The local protection procedure described in this draft switches all
the connections on a failed working link onto a protection link. The
advantage of this approach is that the signaling between nodes is at
the level of links and not at the level of connections. This is
beneficial if a link could potentially carry a number of
connections. On the other hand, it limits flexibility, since a
working link must carry connections of similar priority. Otherwise,
it is not possible to ensure that higher priority connections are
favored over lower priority connections when a failure event affects
more than one working link and there are fewer protection links than
the number of failed working links.
Also, under the above failure scenario, a decision must be made as
to which working links (and therefore connections) are chosen to be
protected and in what priority order. In general, a node might
detect failures sequentially rather than simultaneously. In this
case, as per the proposed signaling procedures, connections on a
working link may be switched over to a given protection link, but
another failure (of a working link carrying higher priority
connections) may be detected soon afterwards. The connections on
the newly failed link may then bump the ones previously switched
over to the protection link.
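The bumping behavior described above can be illustrated with a small
model of a pool of protection links. This is a sketch under stated
assumptions, not a procedure defined by this draft: each working link
is assumed to carry connections of a single priority, a smaller number
is assumed to mean a higher priority, and the function and parameter
names are hypothetical.

```python
def handle_link_failure(assigned, free_protection, working_link, priority):
    """Try to protect a newly detected failed working link.

    assigned: dict mapping a protected working link to a tuple
              (priority, protection link in use)
    free_protection: list of currently unused protection links
    priority: smaller number = higher priority (an assumption)
    Returns (protected, bumped_link).
    """
    if free_protection:
        # A protection link is still available; no bumping is needed.
        assigned[working_link] = (priority, free_protection.pop())
        return True, None
    if assigned:
        # Find the protected link carrying the lowest-priority connections.
        victim = max(assigned, key=lambda link: assigned[link][0])
        victim_priority, protection_link = assigned[victim]
        if priority < victim_priority:
            # The new failure is more important: bump the earlier one.
            del assigned[victim]
            assigned[working_link] = (priority, protection_link)
            return True, victim
    return False, None
```

For example, with a single protection link P1, a failure of W1
(priority 2) followed by a failure of W2 (priority 1) leaves W2
protected on P1 and W1 bumped.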
In the case of end-to-end shared mesh restoration, priorities may be
implemented for allocating shared link resources under multiple
failure scenarios. Note that shared mesh restoration works under the
assumption that the primary paths of connections whose backups share
resources are SRLG-disjoint [1]. Under single-failure scenarios,
this would ensure that at most one connection will "claim" the
allocated (shared) resource. But under multiple failure scenarios,
more than one connection can claim shared resources. If such
resources are allocated to a lower priority connection, they may
have to be reclaimed and allocated to a higher priority connection.
Furthermore, the lower priority connection must be de-provisioned
along the protection path (this can be done using the signaling
mechanisms developed for provisioning, rather than restoration
signaling). The proposed signaling mechanisms can support
connection-priority based allocation of shared resources during
restoration signaling (specifically, during the Switchover Response
step).
A way to simplify end-to-end shared mesh restoration is to allocate
shared resources to connections of the same priority. This way, a
connection will not be first allocated shared resources and then
bumped from the protection path.
6.3 Routing Aspects
To compute end-to-end protection paths, it is necessary to know
which network resources can be used. For end-to-end 1+1 protection,
any free resource in the network can be used. In this regard, the
computation of the working and the protection paths is similar. For
shared mesh restoration, however, it is necessary to know the
availability of shareable as well as free resources. Generally,
protection paths may share resources if the corresponding working
paths will not be affected by the same failure. Thus, to determine
shareable resources for a given protection path optimally, it is
necessary to know full information about other working paths.
Maintaining this sort of information may be suitable in a
centralized routing implementation, but it may not be scalable
under distributed routing. Under distributed routing, heuristics are
often used to provision shared protection paths [12]. The specific
routing information to be propagated and the signaling for the
provisioning of shared protection paths are topics to be dealt with
in separate drafts.
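As an illustration of the sharing rule stated above, the check below
treats two working paths as "affected by the same failure" when they
have at least one SRLG in common. It assumes that the SRLG sets of all
working paths whose protection paths already use a resource are known
to the entity doing the computation, which is the full-information
case; the function names and the representation of SRLGs as integers
are illustrative only.

```python
def may_share(working_srlgs_a, working_srlgs_b):
    """Two protection paths may share a resource only if their working
    paths have no SRLG in common."""
    return set(working_srlgs_a).isdisjoint(working_srlgs_b)


def resource_is_shareable(new_working_srlgs, sharing_working_srlgs):
    """Check a resource already reserved for other protection paths.

    new_working_srlgs: SRLGs traversed by the new working path
    sharing_working_srlgs: one SRLG collection per working path whose
                           protection path already uses the resource
    """
    return all(may_share(new_working_srlgs, other)
               for other in sharing_working_srlgs)
```

A resource not yet used by any protection path is trivially shareable
under this check, which is consistent with treating it as free.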
6.4 Multi-Domain Restoration
When an end-to-end connection follows a path through multiple
routing or administrative domains, it may be required to consider an
intermediate form of restoration, called "intra-domain end-to-end
restoration". With this approach, a failure within a domain would
result in end-to-end restoration between the connection ingress and
egress points within the domain (perhaps after local span
restoration is attempted). When this fails, or if a failure occurs
in an inter-domain link, full end-to-end restoration could be
attempted (inter-domain links could also be subject to local span
protection).
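The resulting order of escalation can be sketched as follows. The level
names and the attempt callback are hypothetical and stand in for
whatever local, intra-domain, and end-to-end procedures are deployed;
local span restoration is shown as always being tried first, although
the text above treats it as optional.

```python
def restoration_levels(failure_on_inter_domain_link):
    """Return the restoration levels to try, in order."""
    levels = ["local span"]
    if not failure_on_inter_domain_link:
        # Only failures inside a domain can be repaired between the
        # domain's own ingress and egress points.
        levels.append("intra-domain end-to-end")
    levels.append("end-to-end")
    return levels


def restore(failure_on_inter_domain_link, attempt):
    """Try each level until one succeeds.

    attempt: callable taking a level name and returning True on success
             (a placeholder for the actual restoration procedure).
    Returns the level that succeeded, or None if all failed.
    """
    for level in restoration_levels(failure_on_inter_domain_link):
        if attempt(level):
            return level
    return None
```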
This type of a structured approach for restoration is particularly
useful in the near term when an optical network may be constructed
by interconnecting multi-vendor optical subnetworks [1]. In this
case, intra-domain restoration may be proprietary, with standard
restoration signaling implemented between border nodes. But this
type of restoration also requires some hardware support at the
border nodes.
6.5 Optical Mesh Restoration and MPLS-Based Recovery
Over the past year or so, there has been considerable work on
MPLS-based recovery under the auspices of the MPLS WG (see, for
example, [6-11]), with a framework document [6] being adopted as a
WG document.
The terminology outlined at the start of this document is also
explained in the MPLS-recovery framework document [6], in the
context of MPLS LSP-based recovery.
The failure indication message of Section 4 is quite similar to
the failure indication signal (FIS) defined in [7], and elaborated
on in [10] and [11]. A difference between the schemes and message
formats discussed in this document and those presented in [7],
[10], and [11] is that the latter focus primarily on MPLS
LSP restoration. As such, the messages defined therein contain
explicit label information for packet LSPs, which is not required
in optical networks. Further, [7] does not specifically cover the
case of the coordinated signaling required for local span
protection and for M:N protection with pooled protection links,
which are central to this proposal.
6.6 Implementation Considerations
As described in this draft, restoration signaling does not require
any central actions (such as admission control or centralized
resource allocation) within a node for end-to-end protection. Local
span protection may require the consideration of all available
protection link resources at the master. End-to-end protection,
which is more difficult from a latency perspective, can be
controlled by distributing multiple, independent protocol instances
in a node such that each instance covers a subset of connections
passing through the node. Such optimizations would depend on the
architecture of the systems implementing the proposed protocol.
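One possible way to divide connections among independent protocol
instances is a deterministic hash of the connection identifier. This is
only one design choice among many and is not prescribed by this draft;
the identifier format and the function names are assumptions made for
the example.

```python
import zlib


def instance_for(connection_id, num_instances):
    """Map a connection identifier to a protocol instance index.

    CRC-32 is used only because it is deterministic across processes;
    any stable hash would serve the same purpose.
    """
    return zlib.crc32(connection_id.encode("utf-8")) % num_instances


def partition(connection_ids, num_instances):
    """Group connections by the instance responsible for them."""
    groups = [[] for _ in range(num_instances)]
    for cid in connection_ids:
        groups[instance_for(cid, num_instances)].append(cid)
    return groups
```

Because the mapping depends only on the connection identifier, every
restoration message for a given connection is handled by the same
instance, and instances need not coordinate with each other for
end-to-end protection.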
7. Conclusion
In this draft, the signaling message flows for protection and
restoration in optical mesh networks were described. The types of
protection modes considered were local span protection and end-to-
end protection (1+1 and shared). Specific protocol realizations of
the message flows will be described in other drafts.
8. References
1. B. Rajagopalan, et al., "IP over Optical Networks: A Framework,"
draft-ietf-ipo-framework-00.txt.
2. W. S. Lai, et al., "Network Hierarchy and Multilayer Survivability,"
Internet Draft, draft-team-tewg-restore-hierarchy-00.txt, July 2001.
3. J. P. Lang, et al., "Link Management Protocol,"
draft-ietf-mpls-lmp-02.txt.
4. G. Li, et al., "RSVP-TE Extensions For Shared-Mesh Restoration in
Transport Networks," draft-li-shared-mesh-restoration-00.txt.
5. P. Ashwood-Smith, et al., "Generalized MPLS: Signaling Functional
Specification," draft-ietf-mpls-generalized-signaling-06.txt.
6. Makam, et al., "Framework for MPLS-based Recovery,"
draft-ietf-mpls-recovery-frmwrk-03.txt.
7. K. Owens, et al., "A Path Protection/Restoration Mechanism for MPLS
Networks," draft-chang-mpls-path-protection-03.txt.
8. S. Kini, et al., "Shared Backup Label Switched Path Restoration,"
draft-kini-restoration-shared-backup-01.txt.
9. F. Hellstrand and L. Andersson, "Extensions to CR-LDP and RSVP-TE
for setup of pre-established recovery tunnels,"
draft-hellstrand-recovery-merge-01.txt.
10. K. Owens, et al., "Extensions to RSVP-TE for MPLS Path
Protection," draft-chang-mpls-rsvpte-path-protection-ext-01.txt.
11. K. Owens, et al., "Extensions to CR-LDP for MPLS Path
Protection," draft-owens-mpls-crldp-path-protection-ext-01.txt.
12. S. Sengupta and R. Ramamurthy, "Capacity Efficient Distributed
Routing of Mesh-Restored Lightpaths in Optical Networks," Proc.
IEEE Globecom 2001, November 2001.
9. Author Information
Bala Rajagopalan
Debanjan Saha
Tellium, Inc.
2 Crescent Pl.
Ocean Port, NJ 07757
Email: {braja, dsaha}@tellium.com

Greg Bernstein
Ciena Corp.
10480 Ridgeview Ct.
Cupertino, CA 94014
Email: Greg@ciena.com

Vishal Sharma
Metanoia, Inc.
305 Elan Village Lane, Unit 121
San Jose, CA 95134
Email: V.Sharma@ieee.org

Ayan Banerjee
John Drake
Jonathan Lang
Calient Networks
5853 Rue Ferrari
San Jose, CA 95138
Email: {abanerjee, Jdrake, jplang}@calient.net

Jennifer Yates
Guangzhi Li
AT&T
180 Park Ave.
Florham Park, NJ 07932
Email: {jyates, gli}@research.att.com