Network Working Group Robert E. Gilligan
Internet Draft Rajkumar Velpuri
draft-gilligan-iscsi-fault-tolerance-00.txt Intransa, Inc.
Expires: October 2003
Lakshmi Ramasubramanian
Alan Warwick
Microsoft Corp.
Matthew W. Baker
Intel Corp.
April 2003
iSCSI Implementation Guidelines
for Fault Tolerance and Load Balancing
using Temporary Redirection
Status of this Memo
This document is an Internet-Draft and is subject to all provisions
of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Abstract
This paper outlines an approach for achieving fault tolerance and
load balancing in iSCSI that combines target discovery, through either
the iSNS discovery mechanism or an iSCSI discovery session, with the
iSCSI temporary redirection mechanism. The approach requires no change
to iSCSI or to any other protocol that initiators and targets
implement. However, the way an initiator performs target discovery,
supports temporary redirection, and recovers from failed iSCSI
sessions affects its ability to support this approach. This paper
provides implementation guidelines that iSCSI initiators can follow to
support this form of fault tolerance and load balancing.
draft-gilligan-iscsi-fault-tolerance-00.txt April 2003
1. Introduction
iSCSI can be used in a variety of configurations, including those in
which the target is implemented as a distributed collection of nodes,
or as a single node with multiple network interfaces. While fault
tolerance and load balancing are not directly addressed by the
protocol, the protocol does have two features that can be used to
help build a fault tolerant solution: the target discovery process
and the temporary redirection function. The iSCSI protocol's login
response process includes temporary redirection as a required
feature. However, the recovery behavior after a redirected session
fails or is explicitly terminated by an asynchronous event is not
specified.
This paper details the behavior that iSCSI initiators may implement to
enable a fault-tolerant solution based on target discovery (via iSNS
or a discovery session) combined with temporary redirection. The same
approach can also be used to provide load balancing. The initiator
behavior outlined here is fully compatible and compliant with the
iSCSI specification.
This paper first gives an overview of the approach, then walks through
example sequences of events that occur when the approach is employed,
and finally summarises the features initiators should implement to
support fault-tolerant and load-balancing systems based on temporary
redirection.
2. Overview of the Approach
The solution relies on the initiator to establish a full feature
phase session in a procedure that may take up to three steps, then to
back up and repeat those steps in reverse order to recover the
session if the underlying TCP connection fails or is terminated.
The initiator discovers the address or addresses of a target either by
querying an iSNS server or by performing a discovery session with the
"portal" configured for the target system. A portal, in iSCSI
terminology, refers to an IP address and TCP port number pair. We
call the portal that an initiator connects to in order to perform a
discovery session the "discovery target portal." Both of these
processes (iSNS or discovery session) return a set of portals for
the target that we term the "initial target portals." Next, the
initiator opens a session to the target by trying to open a TCP
connection to each of the initial target portals in sequence until
one succeeds. Once connected, the initiator logs in. In its login
response, the target may direct the initiator to a new portal via the
iSCSI temporary redirection function. We term this new portal the
"redirect portal" for the target. The initiator then closes the
initial connection and attempts to establish a session with the
redirect portal. The negotiation of an iSCSI session to the redirect
portal is the final step leading to a normal, full feature session
between the initiator and the target, allowing data to flow.
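For the discovery-session path described above, the initiator must
extract the initial target portals for one target from the SendTargets
response. The Python sketch below assumes the response has already been
received and reassembled into a single byte string of null-separated
key=value pairs, where each TargetName is followed by one or more
TargetAddress entries of the form address[:port],portal-group-tag, as
described in the iSCSI specification [ISCSI]. The function names and
the default port constant are illustrative assumptions, and the iSNS
query path is not shown.

```python
from typing import Dict, List, Optional, Tuple

Portal = Tuple[str, int]  # an iSCSI portal: (IP address or host name, TCP port)

ISCSI_DEFAULT_PORT = 3260  # well-known iSCSI port, used when no port is given


def parse_target_address(value: str) -> Portal:
    """Parse 'addr[:port],tag' into (addr, port); the portal group tag is dropped."""
    addr_port = value.rsplit(",", 1)[0]
    if addr_port.startswith("["):  # bracketed IPv6 literal, e.g. [fe80::1]:3260
        host, _, rest = addr_port[1:].partition("]")
        port = int(rest[1:]) if rest.startswith(":") else ISCSI_DEFAULT_PORT
        return host, port
    host, sep, port_text = addr_port.partition(":")
    return host, (int(port_text) if sep else ISCSI_DEFAULT_PORT)


def parse_send_targets(response: bytes) -> Dict[str, List[Portal]]:
    """Map each TargetName in a SendTargets response to its advertised portals."""
    targets: Dict[str, List[Portal]] = {}
    current: Optional[str] = None
    for pair in response.split(b"\x00"):
        if not pair:
            continue  # skip empty fields such as the trailing null
        key, _, value = pair.decode("utf-8").partition("=")
        if key == "TargetName":
            current = value
            targets.setdefault(current, [])
        elif key == "TargetAddress" and current is not None:
            targets[current].append(parse_target_address(value))
    return targets
```

The list returned for the chosen target name is what the initiator
would remember as its initial target portals.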
If the connection fails, or is terminated by the target by an
asynchronous logout message, the initiator performs a sequence of
actions to attempt to recover the session. Since this is session
recovery, the initiator performs these recovery actions no matter
what recovery level has been negotiated. The initiator first
attempts to re-connect to one of the initial target portals learned
from the iSNS query or during the discovery session, again trying
each in sequence until one succeeds. If this fails, the initiator
repeats the discovery phase, re-connecting to the discovery target
portal and re-running the discovery session, or re-running the iSNS
discovery procedure. After repeating the discovery process, the
initiator follows the same procedure it used for the initial
connection, eventually connecting to a new redirect portal for the
target and recovering the full feature session.
Fault tolerance in this scheme is achieved by allowing the target
system to direct the initiator to a specific network interface within
a multi-homed iSCSI target system, or a specific network node in a
clustered or distributed iSCSI target system for the duration of the
session. In the event of failure, the target system can direct the
initiator to a different, healthy interface or node, allowing the
session to be recovered. The same mechanism provides load balancing
by allowing the target to intelligently instruct the initiator to
establish a new session with a less heavily loaded node.
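This paper leaves the choice of redirect portal to the target system.
As one hypothetical policy, a target could redirect each login to the
healthy node that currently carries the fewest sessions. In this
Python sketch, `TargetNode`, its `healthy` and `active_sessions`
fields, and `choose_redirect_portal` are all illustrative assumptions,
not part of iSCSI or iSNS.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Portal = Tuple[str, int]  # (IP address, TCP port)


@dataclass
class TargetNode:
    portal: Portal        # portal this node would be advertised as in a redirect
    healthy: bool         # hypothetical health flag maintained by the target system
    active_sessions: int  # hypothetical load metric


def choose_redirect_portal(nodes: List[TargetNode]) -> Optional[Portal]:
    """Return the portal of the least-loaded healthy node, or None if none is healthy."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.active_sessions).portal
```

Because the initiator returns to the initial target portals whenever a
session is lost, a target running a policy like this one gets a fresh
chance to pick a node on every recovery.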
This solution relies on initiator support of the iSCSI temporary
redirect function. Additionally, the initiator must return to one of
the initial target portals in the event that a connection fails or is
terminated via an asynchronous logout message, and repeat the
discovery process if that fails. The iSCSI specification does not
dictate the precise recovery behavior for sessions established
following a temporary redirection by a target. Some initiator
implementations reconnect to the same redirect portal after a
connection to that portal fails. This behavior defeats the
fault-tolerant and load-balancing solution outlined here, because the
target never gets a chance to choose a different portal, and it
violates the spirit of the temporary redirection function: by
returning to the redirect portal, the initiator treats the
redirection as more than temporary, but less than permanent.
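The difference between the recommended behavior and reconnecting to
the redirect portal comes down to which portals the initiator retries.
The Python sketch below shows one way to keep that state. The
`TargetPortals` class and its method names are hypothetical; the point
it illustrates is that the redirect portal is valid only for the
current session and is discarded when the session is lost.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Portal = Tuple[str, int]  # (IP address, TCP port)


@dataclass
class TargetPortals:
    """Portals an initiator remembers for one target (hypothetical structure)."""
    initial: List[Portal]              # learned from iSNS or a discovery session
    redirect: Optional[Portal] = None  # valid only for the current session

    def on_redirect(self, portal: Portal) -> None:
        # The login response said "target moved temporarily": use this portal now.
        self.redirect = portal

    def on_session_lost(self) -> List[Portal]:
        # Connection failure or asynchronous logout: the redirection was only
        # temporary, so forget it and retry the initial target portals instead.
        self.redirect = None
        return list(self.initial)
```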
3. Example Scenarios
To further illustrate how this mechanism works, this section presents
the typical sequence of events that occur when a session is begun,
when a target node or interface serving the session fails, and when a
target decides to move an initiator to a different target node for
load balancing.
Initial connection sequence of events:
1. Initiator performs the discovery procedure by using iSNS or
executing a discovery session.
2. If iSNS is used, the initiator queries the iSNS server, which
returns a set of portals for the target.
3. If the discovery session is used, the initiator opens a TCP
connection to the discovery target portal, logs in, and issues
a "SendTargets" text request. The target responds with a list
of target names and their associated portals. The initiator or
user selects the portals associated with the specific target
with which the initiator intends to establish a session. The
initiator terminates the discovery session and closes the
associated TCP connection.
4. Whichever discovery procedure is used, the initiator remembers
the portals for this target as the "initial target portals".
5. The initiator iterates through the initial target portals list
until it succeeds in opening a TCP connection to one of them.
6. The initiator then logs into the target, which may respond with
a "target moved temporarily" redirect response, listing the
redirect portal for the target. The initiator remembers this
as the "redirect portal." The initiator then closes the TCP
connection.
7. The initiator then opens a TCP connection to the redirect
portal and logs in. The target accepts this login and the
session proceeds to full feature phase.
8. Data flow begins.
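Steps 5 through 7 can be sketched in Python as follows. The sketch
hides TCP connection setup and the iSCSI login exchange behind two
caller-supplied functions, `can_connect` and `login`; those names and
the `("accepted" | "redirect", portal)` result shape are hypothetical.
It follows at most one redirection and, as an assumption beyond the
steps listed, treats a login that is accepted without redirection as a
full feature session on the initial portal.

```python
from typing import Callable, Iterable, Optional, Tuple

Portal = Tuple[str, int]  # (IP address, TCP port)
# Hypothetical login outcome: ("accepted", None) or ("redirect", redirect_portal).
LoginResult = Tuple[str, Optional[Portal]]


def first_reachable(portals: Iterable[Portal],
                    can_connect: Callable[[Portal], bool]) -> Optional[Portal]:
    """Step 5: try each initial target portal in order until a TCP connect succeeds."""
    for portal in portals:
        if can_connect(portal):
            return portal
    return None


def establish_session(initial_portals: Iterable[Portal],
                      can_connect: Callable[[Portal], bool],
                      login: Callable[[Portal], LoginResult]) -> Optional[Portal]:
    """Steps 5-7: return the portal that carries the full feature session, or None."""
    portal = first_reachable(initial_portals, can_connect)
    if portal is None:
        return None                    # caller should re-run discovery
    status, redirect = login(portal)   # step 6: log in at the initial portal
    if status == "accepted":
        return portal                  # target chose not to redirect
    # Step 6 (cont.): "target moved temporarily"; the initial connection is
    # closed and the redirect portal is used for this session only.
    if status == "redirect" and redirect is not None and can_connect(redirect):
        status, _ = login(redirect)    # step 7: log in at the redirect portal
        if status == "accepted":
            return redirect
    return None
```

For example, if only ("10.0.0.2", 3260) answers among the initial
portals and its login redirects to ("10.0.1.3", 3260), then
`establish_session` returns ("10.0.1.3", 3260).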
Target node or interface failure sequence of events:
1. The initiator has an iSCSI session established and TCP
connection open to the redirect portal. Full feature session
in progress. Data is flowing.
2. The target node or interface serving the session fails.
3. The initiator detects the failure of the TCP connection with
the target.
4. The initiator iterates through the list of initial target
portals learned in the discovery process until it succeeds in
opening a TCP connection to one of them.
5. If the initiator succeeds in connecting to one of the initial
target portals, it executes steps 6 and 7 in the "Initial
connection sequence of events" section.
6. If the initiator fails to connect to any of the initial target
portals, it repeats steps 1 through 7 in the "Initial
connection sequence of events" section.
7. Data flow resumes.
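The failure sequence above can be expressed as a small wrapper around
the initial connection procedure. In this hypothetical Python sketch,
`establish` stands for steps 5 through 7 of the initial connection
sequence and `rediscover` for steps 1 through 4 (an iSNS query or a
discovery session); both are assumptions supplied by the caller. The
failed redirect portal is deliberately never an input.

```python
from typing import Callable, List, Optional, Tuple

Portal = Tuple[str, int]  # (IP address, TCP port)


def recover_session(
    initial_portals: List[Portal],
    establish: Callable[[List[Portal]], Optional[Portal]],
    rediscover: Callable[[], List[Portal]],
) -> Tuple[Optional[Portal], List[Portal]]:
    """Steps 4-7: retry the initial portals, then re-run discovery if none work.

    Returns (portal carrying the recovered session or None,
             initial target portals to remember from now on).
    """
    # Steps 4-5: start from the initial target portals learned in discovery,
    # not from the redirect portal whose connection just failed.
    portal = establish(initial_portals)
    if portal is not None:
        return portal, initial_portals
    # Step 6: none of the initial portals worked, so repeat discovery and run
    # the initial connection sequence again with the fresh portal list.
    fresh = rediscover()
    return establish(fresh), fresh
```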
Overloaded target node sequence of events:
1. The initiator has a full feature iSCSI session established, and
associated TCP connection open, to the redirect portal. Data
is flowing.
2. The target terminates the session with an asynchronous logout
message, and the TCP connection is closed.
3. If the asynchronous logout message PDU is type 1 (target
requests logout), the initiator logs out, closes the TCP
connection, and proceeds to step 4 in the "Target node or
interface failure sequence of events" section.
4. If the asynchronous logout PDU is type 2 (target will drop
connection), then Parameter2 (Time2Wait) specifies the time in
seconds that the initiator should wait before attempting to
re-login. The initiator should wait this time, then proceed to
step 4 in the "Target node or interface failure sequence of
events" section. The distinguished value of 0xFFFF may be used
as an indication that the initiator should not re-login without
the intervention of the administrator on the initiator. (The
protocol provides no other way for the target to signal to the
initiator that it does not wish it to re-connect.)
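The initiator's handling of the two asynchronous message types in the
steps above reduces to computing a delay before it returns to the
initial target portals. In this minimal Python sketch the constant and
function names are hypothetical; the event codes 1 and 2 are the ones
named in steps 3 and 4, and the 0xFFFF convention is the one suggested
in step 4 rather than a value defined elsewhere.

```python
from typing import Optional

ASYNC_EVENT_REQUEST_LOGOUT = 1   # type 1: target requests logout
ASYNC_EVENT_DROP_CONNECTION = 2  # type 2: target will drop connection
NO_RELOGIN = 0xFFFF              # Time2Wait value suggested in step 4


def relogin_delay(async_event: int, time2wait: int) -> Optional[int]:
    """Seconds to wait before returning to the initial target portals.

    Returns None when the initiator should not re-login without
    administrator intervention.
    """
    if async_event == ASYNC_EVENT_REQUEST_LOGOUT:
        return 0          # log out, close the connection, recover immediately
    if async_event == ASYNC_EVENT_DROP_CONNECTION:
        if time2wait == NO_RELOGIN:
            return None   # wait for the administrator
        return time2wait  # honor Parameter2 (Time2Wait) before re-login
    raise ValueError(f"asynchronous event {async_event} is not handled by this sketch")
```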
In these scenarios, the initiator is called on to determine when a
TCP connection with a target has failed, and also when an attempt to
open a new TCP connection to a target has failed. For both of these
determinations, the iSCSI layer in the initiator can simply rely on
the underlying TCP layer's retransmission abort timeout mechanism, or
it could implement timeouts of its own. The approach used, and the
timeout values that the initiator selects, are highly implementation
dependent. For example, some implementations allow applications to
select the TCP abort timeout, while others do not. No matter what
approach is taken, implementations may wish to make these two
failure-determining timeout values configurable so that
administrators may tune the system for operation in different
environments.
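As one example of a configurable failure-determining timeout, the
attempt to open a TCP connection to a portal can be bounded with the
`timeout` argument of Python's standard `socket.create_connection`.
The function name `open_portal` and its `connect_timeout` parameter are
hypothetical, and detecting failure of an already-established
connection is not shown.

```python
import socket
from typing import Optional, Tuple

Portal = Tuple[str, int]  # (IP address, TCP port)


def open_portal(portal: Portal, connect_timeout: float) -> Optional[socket.socket]:
    """Try to open a TCP connection to a portal within connect_timeout seconds.

    Returns the connected socket, or None if the attempt fails or times out,
    so the caller can move on to the next portal in its list.
    """
    try:
        sock = socket.create_connection(portal, timeout=connect_timeout)
    except OSError:  # connection refused, host unreachable, or timed out
        return None
    # Return the socket to blocking mode; failure detection on the established
    # connection is left to a separate, independently configured mechanism.
    sock.settimeout(None)
    return sock
```

On Linux, the TCP_USER_TIMEOUT socket option is one way to bound how
long transmitted data may remain unacknowledged before TCP aborts an
established connection; such options vary by platform, consistent with
the observation above that the approach is highly implementation
dependent.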
4. Summary of guidelines for initiators
To summarize, the mechanisms that initiators should implement to
support this approach for fault tolerance and load balancing are:
- Provide a target discovery mechanism by implementing either
iSNS or the iSCSI target discovery session.
- Accept and act upon the iSCSI temporary redirect login
response.
- If a session TCP connection to a redirect portal fails, try to
re-connect to the initial target portals.
- If a session is terminated by the target with an asynchronous
logout message, try to re-connect to the initial target
portals.
- If attempts to connect to the initial target portals fail,
re-run the discovery mechanism.
5. Acknowledgements
Thanks to Bill Nowicki of Intransa, and Hari Mudaliar of Adaptec for
their helpful comments on early revisions of this paper.
6. Security considerations
The initiator must perform authentication on every login.
7. References
[ISCSI] J. Satran, K. Meth, C. Sapuntzakis, M. Chadalapaka, E. Zeidner,
"iSCSI", draft-ietf-ips-iscsi-20.txt, Work in progress.
[ISNS] J. Tseng, K. Gibbons, F. Travostino, C. Du Laney, J. Souza,
"Internet Storage Name Service (iSNS)", draft-ietf-ips-isns-17.txt,
Work in progress.
Authors' Addresses
Robert E. Gilligan
Intransa, Inc.
2870 Zanker Road
San Jose, CA 95134
Email: gilligan@intransa.com
Phone: 408-678-8647
Rajkumar Velpuri
Intransa, Inc.
2870 Zanker Road
San Jose, CA 95134
Email: Rajkumar.velpuri@intransa.com
Phone: 408-678-8641
Lakshmi Ramasubramanian
Microsoft Corp.
One Microsoft Way
Redmond, WA 98052
Email: nramas@microsoft.com
Phone: 425-703-7559
Alan Warwick
Microsoft Corp.
One Microsoft Way
Redmond, WA 98052
Email: alanwar@microsoft.com
Phone: 425-706-0230
Matthew W. Baker
Intel Corporation
1501 South Mopac Expressway, #400
Austin, TX 78746
Email: matt.w.baker@intel.com
Phone: 512-732-1306