Internet DRAFT - draft-chadalapaka-command-ordering
Command Ordering 21-February-03
IPS Mallikarjun Chadalapaka
Internet Draft Rob Elliott
draft-chadalapaka-command-ordering-00.txt Hewlett-Packard Co.
Category: Informational-track
SCSI Command Ordering Considerations with iSCSI
Mallikarjun Chadalapaka Expires August 2003 1
Status of this Memo
This document is an Internet-Draft and fully conforms to all provi-
sions of Section 10 of [RFC2026].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for at most six months and
may be updated, replaced, or made obsolete by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them except as "work in progress."
The list of Internet-Drafts can be accessed at http://www.ietf.org/
ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
iSCSI is a SCSI transport protocol designed to run on top of TCP. The
iSCSI session abstraction is equivalent to the SCSI I_T nexus, and
the iSCSI session provides an ordered command delivery from the SCSI
initiator to the SCSI target. This document goes into the design
considerations that led to the iSCSI session model as it is defined
today, relates the SCSI command ordering features defined in T10
specifications to the iSCSI concepts, and finally provides guidance
to system designers on how true command ordering solutions can be
built based on iSCSI.
Acknowledgements
We are grateful to the IPS working group whose work defined the iSCSI
protocol. Thanks also to David Black (EMC) who encouraged the publi-
cation of this document. Special thanks are also in order for Randy
Haagens (HP) for his insightful review comments.
Status of this Memo
Abstract
Acknowledgements
1. Definitions and Acronyms
   1.1 Definitions
   1.2 Acronyms
2. Introduction
3. Overview of the iSCSI Protocol
   3.1 Protocol mapping description
   3.2 The I_T nexus model
   3.3 Ordered command delivery
      3.3.1 Issues
      3.3.2 The session guarantee
      3.3.3 Ordering onus
      3.3.4 Final intent
4. The Command Ordering Scenario
   4.1 SCSI layer
      4.1.1 Command Reference Number (CRN)
      4.1.2 Task Attributes
      4.1.3 Auto Contingent Allegiance (ACA)
      4.1.4 UA interlock
   4.2 iSCSI layer
5. Connection failure considerations
6. Implementation considerations
7. IANA Considerations
8. Security Considerations
9. References and Bibliography
   9.1 Normative References
   9.2 Informative References
10. Authors' Addresses
Full Copyright Statement
1. Definitions and Acronyms
1.1 Definitions
- I_T nexus: As per [SAM2], the I_T nexus is a relationship between a
SCSI Initiator Port and a SCSI Target Port. For iSCSI, this relation-
ship is an iSCSI session, defined as a relationship between an iSCSI
Initiator's end of the session (SCSI Initiator Port) and the iSCSI
Target's Portal Group (SCSI Target Port). The I_T nexus can be iden-
tified by the conjunction of the SCSI port names; that is, the I_T
nexus identifier for iSCSI is the tuple (iSCSI Initiator Port Name,
iSCSI Target Port Name).
- PDU (Protocol Data Unit): The initiator and target divide their
communications into messages. The term "iSCSI protocol data unit"
(iSCSI PDU) is used for these messages.
- SCSI Device: This is the SAM-2 term for an entity that contains one
or more SCSI ports that are connected to a service delivery sub-
system and supports a SCSI application protocol. For iSCSI, the SCSI
Device is the component within an iSCSI Node that provides the SCSI
functionality. The SCSI Device Name is defined to be the iSCSI Name
of the node.
- Session: The group of TCP connections that link an initiator with a
target forms a session (equivalent to a SCSI I_T nexus). A session may
consist of multiple connections, and TCP connections can be added to
and removed from a session dynamically. The multiplicity of connections
at the iSCSI level is completely hidden from the initiator SCSI layer.
Across all connections within a session, a SCSI initiator port sees
one and the same SCSI target port.
1.2 Acronyms
Acronym     Definition
--------------------------------------------------------------
ACA         Auto Contingent Allegiance
ASC         Additional Sense Code
ASCQ        Additional Sense Code Qualifier
CRN         Command Reference Number
IETF        Internet Engineering Task Force
ITT         Initiator Task Tag
LU          Logical Unit
LUN         Logical Unit Number
NIC         Network Interface Card
PDU         Protocol Data Unit
SAM-2       SCSI Architecture Model - 2
SAN         Storage Area Network
SCSI        Small Computer Systems Interface
TCP         Transmission Control Protocol
TMF         Task Management Function
UA          Unit Attention
WG          Working Group
2. Introduction
iSCSI is a SCSI transport protocol designed to enable running SCSI
application protocols on the Internet. Given the size and scope of
the Internet, iSCSI thus enables some exciting new SCSI applications.
Potential application areas for exploiting iSCSI's value include:
a) Larger (diameter) Storage Area Networks (SANs) than had
been possible until now.
b) Asynchronous remote mirroring
c) Remote tape vaulting
Each of these applications takes advantage of the practically unlim-
ited distance possible between a SCSI initiator and a SCSI target
that iSCSI allows. In each of these cases, because of the long
delays involved, there is a very high incentive for the initiator to
stream SCSI commands back-to-back without waiting for the SCSI sta-
tus of previous commands. Command streaming may be employed prima-
rily by two classes of applications - while one class may not
particularly care about ordered command execution, the other class
does rely on ordered command execution (i.e. there is an application-
level dependency on the ordering among SCSI commands). As an exam-
ple, cases b) and c) listed earlier clearly require ordered command
execution - a mirroring application may not want the writes to be
committed out of order on the remote SCSI target, so as to preserve
the transactional integrity of the data on that target. To summa-
rize, SCSI command streaming is extremely valuable for a critical
class of applications in long-latency networks when coupled with the
guarantee of ordered command execution on the SCSI target.
This document reviews the various protocol considerations in design-
ing storage solutions that employ SCSI command ordering. This docu-
ment also analyzes and explains the design intent of [iSCSI] with
respect to command ordering.
3. Overview of the iSCSI Protocol
3.1 Protocol mapping description
The iSCSI protocol is a mapping of the SCSI remote procedure invoca-
tion model (see [SAM2]) over the TCP protocol.
SCSI's notion of a task maps to an iSCSI task. Each iSCSI task is
uniquely identified within its I_T nexus by a 32-bit identifier
called the Initiator Task Tag (ITT). The ITT is both an iSCSI
identifier of the task and a classic SCSI task tag.
SCSI commands from the initiator to the target are carried in iSCSI
requests called SCSI Command PDUs. SCSI status back to the initiator
is carried in iSCSI responses called SCSI Response PDUs. SCSI
Data-Out from the initiator to the target is carried in SCSI Data-Out
PDUs, and SCSI Data-In back to the initiator is carried in SCSI
Data-In PDUs.
3.2 The I_T nexus model
In iSCSI, the SCSI I_T nexus model is a virtual abstraction, span-
ning one or more TCP connections. The iSCSI protocol defines the
semantics in order to realize one logical flow of bidirectional com-
munication across multiple TCP connections (as many as 2^16). The
iSCSI connection multiplicity is thus completely contained at the
iSCSI layer, while the SCSI layer is presented with a single I_T
nexus in a multi-connection session. A session between a pair of
given iSCSI nodes is identified by the session identifier (SSID) and
each connection within a given session is uniquely identified by a
connection identifier (CID) in iSCSI.
There are four crucial functional facets of iSCSI that together
present this single logical flow abstraction to the SCSI layer across
multiple iSCSI connections.
a) Ordered command delivery: SCSI commands that are striped
across all the connections in the session get "reassembled"
by the target iSCSI layer based on a Command Sequence Num-
ber (CmdSN) that is unique across the session, so as to make
it appear as if all the commands had travelled in one flow.
b) Connection allegiance: All the PDU exchanges for a SCSI
Command are required to flow on the same iSCSI connection,
up to and including the SCSI Response PDU for the command.
This again hides the multi-connection nature of a session,
because the initiator SCSI layer will never see the PDU
contents out of order (for example, status cannot bypass data).
c) Task set management function handling: When all active
tasks in a session are aborted (ABORT TASK SET) or cleared
(CLEAR TASK SET) using SCSI task management functions (TMF),
[iSCSI] defines an ordered sequence of steps for the target
handling the TMF which guarantees that the TMF Response
arrives after the SCSI Response PDUs of all unaffected tasks
are received on all the connections of the iSCSI session.
This is again intended to preserve the single flow abstrac-
tion to the SCSI layer.
d) Immediate task management function handling: Even when a task
management function is marked as "immediate" (i.e. it only has
a position in the command stream, but does not consume a
CmdSN), [iSCSI] still defines semantics that require the
target iSCSI layer to ensure that the TMF request is executed
as if the commands and the TMF request were all flowing
on a single logical channel. This ensures that the TMF
request acts on the tasks that it was meant to manage.
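As an illustration of facet a), the following is a minimal sketch of
how a target iSCSI layer might "reassemble" commands that arrive on
different connections into one CmdSN-ordered flow. It is not taken
from [iSCSI]: the class and method names are hypothetical, CmdSN
wrap-around is ignored, and commands are represented as plain strings.

```python
# Illustrative sketch only: names are hypothetical, and CmdSN
# wrap-around (serial number arithmetic) is deliberately ignored.
class SessionReorderQueue:
    """Hands commands to the SCSI layer in CmdSN order for one session."""

    def __init__(self, exp_cmd_sn):
        self.exp_cmd_sn = exp_cmd_sn  # next CmdSN the target expects
        self.pending = {}             # CmdSN -> command held back by a hole

    def receive(self, cmd_sn, command):
        """Accept a command from any connection; return those now deliverable."""
        self.pending[cmd_sn] = command
        deliverable = []
        # Release the longest gap-free run that starts at the expected CmdSN.
        while self.exp_cmd_sn in self.pending:
            deliverable.append(self.pending.pop(self.exp_cmd_sn))
            self.exp_cmd_sn += 1
        return deliverable


queue = SessionReorderQueue(exp_cmd_sn=1)
delivered = []
# Commands striped over two connections arrive interleaved.
arrivals = [(2, "WRITE B"), (1, "WRITE A"), (4, "WRITE D"), (3, "WRITE C")]
for cmd_sn, command in arrivals:
    delivered.extend(queue.receive(cmd_sn, command))
print(delivered)
```

In this sketch a command is held back for as long as a lower CmdSN is
still missing, so the SCSI layer sees the commands as if they had
travelled in one flow.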
The following sections will analyze the "Ordered command delivery"
aspect in more detail, since command ordering is the focus of this
document.
3.3 Ordered command delivery
3.3.1 Issues
This particular aspect was debated at length in the IPS WG. Most of
the debate centered on two specific questions:
a) What command ordering behavior should be required of iSCSI
implementations when there are transport errors (such as TCP
checksum failures)?
b) Should [iSCSI] require initiators and targets to enforce
command ordering?
3.3.2 The session guarantee
The final disposition of question a) in section 3.3.1 is reflected
in [RFC3347]: "iSCSI MUST specify strictly ordered delivery of SCSI
commands over an iSCSI session between an initiator/target pair, even
in the presence of transport errors." Stated differently, an iSCSI
digest failure or an iSCSI connection termination must not cause the
iSCSI layer on a target to allow the commands to be executed in an
order different from that intended by the initiator (as indicated by
the CmdSN order). This design choice is enormously helpful in building
storage systems and solutions, which can now always assume command
ordering to be a service characteristic of an iSCSI substrate.
Note that by taking the position that an iSCSI session always
guarantees command ordering, [iSCSI] indirectly implies that the
principal reason for the multi-connection iSCSI session abstraction is
to allow ordered bandwidth aggregation for an I_T nexus. In deployment
models where the cross-connection ordering mandated by [iSCSI]
is deemed expensive, serious consideration should be given to
deploying multiple single-connection sessions instead.
3.3.3 Ordering onus
The final resolution of question b) in section 3.3.1 by the iSCSI
protocol designers was in favor of not requiring initiators to always
use command ordering. This resolution is reflected in dropping the
ACA requirement on the initiators, and in allowing the ABORT TASK TMF
to plug command holes, among other things. The net result can be
discerned by a careful reader of [iSCSI]: the onus of command ordering
is on the iSCSI targets, while the initiators may or may not use
command ordering. iSCSI targets, being the servers in the client-server
model, do not really have a way to establish whether or not the client
intends to take advantage of the command ordering service, so the iSCSI
targets simply always provide the guaranteed service. Besides this
rationale, building a command ordered solution involves inherent SCSI
dependencies, as we shall see, whose usage is beyond the scope of
[iSCSI] to mandate one way or the other.
3.3.4 Final intent
To summarize the design intent of [iSCSI]:
The service delivery subsystem (see [SAM2]) abstraction pro-
vided by an iSCSI session can be assumed to have the intrinsic
property of ordered delivery of commands under all condi-
tions. This command ordering is across the entire I_T nexus
spanning all the LUs that the nexus is authorized to access. It
is the initiator's discretion to make use of this property.
4. The Command Ordering Scenario
A storage systems designer working with SCSI and iSCSI has to con-
sider the following protocol features in SCSI and iSCSI layers, each
of which has a role to play in realizing the command ordering goal.
4.1 SCSI layer
The SCSI application layer has several tools to enforce ordering.
4.1.1 Command Reference Number (CRN)
CRN is an ordered sequence number which, when enabled for a device
server, increments by one for each successive command on an I_T_L
nexus (see [SAM2]). The one notable drawback with CRN is that there
is no SCSI-generic way (such as through mode pages) to enable or
disable the CRN feature. [SAM2] also leaves the usage semantics of
CRN for the SCSI transport protocol, such as iSCSI, to specify.
[iSCSI] chose not to support the CRN feature for various reasons.
4.1.2 Task Attributes
[SAM2] defines the following four task attributes: SIMPLE, ORDERED,
HEAD OF QUEUE, and ACA. Each task to an LU may be assigned an
attribute. [SAM2] defines the ordering constraints that each of
these attributes conveys to the device server that is servicing the
task. In particular, judicious use of ORDERED and SIMPLE attributes
applied to a stream of pipelined commands could convey the precise
execution schema for the commands that the initiator issues,
provided the commands are received in the same order on the target.
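To make the notion of an execution schema more concrete, the sketch
below shows one simplified reading of how the ORDERED and SIMPLE
attributes constrain a device server. It is an assumption-laden
illustration rather than the normative [SAM2] rules: the function name
and the tuple layout are hypothetical, and the HEAD OF QUEUE and ACA
attributes are not modelled.

```python
# Simplified sketch; names are hypothetical and only the SIMPLE and
# ORDERED attributes are modelled (HEAD OF QUEUE and ACA are omitted).
def may_start(task_set, index):
    """Return True if the task at `index` may begin executing.

    `task_set` is a list of (attribute, finished) tuples in the order
    in which the commands were received by the target.
    """
    attribute, _ = task_set[index]
    older = task_set[:index]
    if attribute == "ORDERED":
        # An ORDERED task waits for every older task to finish.
        return all(finished for _, finished in older)
    # A SIMPLE task waits only for older ORDERED tasks to finish.
    return all(finished for attr, finished in older if attr == "ORDERED")


# Two independent writes, then a barrier, then one more write.
tasks = [("SIMPLE", False), ("SIMPLE", False),
         ("ORDERED", False), ("SIMPLE", False)]
startable = [i for i in range(len(tasks)) if may_start(tasks, i)]
print(startable)
```

Here the ORDERED task acts as a barrier: the two older SIMPLE tasks
may run in any order, while the trailing SIMPLE task cannot start
until the ORDERED task has finished. This only works as intended if
the target sees the commands in the order the initiator issued them.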
4.1.3 Auto Contingent Allegiance (ACA)
ACA is an LU-level condition that is triggered when a command (with
the NACA bit set to 1) completes with CHECK CONDITION. While the
condition is active, all the other tasks in the task set are blocked,
and only commands with the ACA attribute may execute, until the
CLEAR ACA task management function is executed. See [SAM2] for
the detailed semantics of ACA. Since ACA is closely tied to the
notion of a task set, one would ideally have to select (by setting
the TST bit to 1 in the control mode page of the LU) the scope of the
task set to be per-initiator in order to prevent command failures in
one I_T_L nexus from impacting other I_T_L nexuses through ACA.
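The ACA behavior described above can be pictured as a small state
machine. The following sketch models a single task set only; the class
and method names are hypothetical, and details that [SAM2] specifies
(such as the handling of a task with the ACA attribute when no ACA
condition exists) are intentionally left out.

```python
# Illustrative sketch of the ACA condition for a single task set; the
# class and method names are hypothetical, not taken from [SAM2].
class LogicalUnit:
    def __init__(self):
        self.aca_active = False

    def may_execute(self, task_attribute):
        """While ACA is active, only tasks with the ACA attribute run."""
        return (not self.aca_active) or task_attribute == "ACA"

    def complete(self, status, naca):
        """Establish ACA when a NACA=1 command ends in CHECK CONDITION."""
        if status == "CHECK CONDITION" and naca == 1:
            self.aca_active = True

    def clear_aca(self):
        """Model the CLEAR ACA task management function."""
        self.aca_active = False


lu = LogicalUnit()
lu.complete("CHECK CONDITION", naca=1)   # a streamed command fails
blocked = lu.may_execute("SIMPLE")       # later commands are held back
cleanup = lu.may_execute("ACA")          # "cleanup" tasks may still run
lu.clear_aca()
resumed = lu.may_execute("SIMPLE")
print(blocked, cleanup, resumed)
```

The point of the sketch is that a failure stops the commands queued
behind it from executing, so the initiator can recover before any
later command runs out of the intended order.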
4.1.4 UA interlock
When UA interlock is enabled, the logical unit does not clear any
standard unit attention condition reported with autosense and in
addition, establishes a unit attention condition when a task is ter-
minated with one of BUSY, TASK SET FULL, or RESERVATION CONFLICT sta-
tuses. This so-called "interlocked UA" is cleared only when the
device server executes an explicit REQUEST SENSE ([SPC3]) command
from the same initiator. From a functionality perspective, the scope
of UA interlock today is slightly different from ACA's because it
enforces ordering behavior for completion statuses other than CHECK
CONDITION, but otherwise conceptually has the same design intent as
ACA. On the other hand, ACA is somewhat more sophisticated because
it allows special "cleanup" tasks (ones with ACA attribute) to exe-
cute when ACA is active. One of the principal reasons UA interlock
came into being was that SCSI designers wanted a command ordering
feature without the side effects of using the aforementioned TST bit
in the control mode page.
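A rough sketch of the interlock behavior for one I_T_L nexus follows.
It rests on assumptions beyond the text above: the names are
hypothetical, a pending unit attention is assumed to cause subsequent
commands to end with CHECK CONDITION, and the commands that SCSI
exempts from unit attention reporting are not modelled.

```python
# Hypothetical sketch of UA interlock for a single I_T_L nexus.
INTERLOCK_STATUSES = {"BUSY", "TASK SET FULL", "RESERVATION CONFLICT"}


class InterlockedNexus:
    def __init__(self):
        self.ua_pending = False

    def task_terminated(self, status):
        """With UA interlock enabled, these statuses establish a UA."""
        if status in INTERLOCK_STATUSES:
            self.ua_pending = True

    def submit(self, command):
        """Return the status with which `command` completes."""
        if command == "REQUEST SENSE":
            # Only an explicit REQUEST SENSE clears the interlocked UA.
            self.ua_pending = False
            return "GOOD"
        if self.ua_pending:
            # The UA is reported but deliberately not cleared.
            return "CHECK CONDITION"
        return "GOOD"


nexus = InterlockedNexus()
nexus.task_terminated("TASK SET FULL")
commands = ("WRITE", "WRITE", "REQUEST SENSE", "WRITE")
statuses = [nexus.submit(command) for command in commands]
print(statuses)
```

Under these assumptions, every command streamed behind the failed one
is turned away until the initiator explicitly acknowledges the
condition, which is what preserves the ordering.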
4.2 iSCSI layer
As noted in section 3.2 and section 3.3, the command ordering that
iSCSI enforces per iSCSI session using the CmdSN is an attribute of
the SCSI transport layer. Note that any command ordering solution
that seeks to realize ordering from the initiator SCSI layer to the
target SCSI layer would be of practical value only when the command
ordering is guaranteed by the SCSI transport layer. In other words,
the related SCSI application layer protocol features such as ACA etc.
are based on the premise of an ordered SCSI transport. Thus iSCSI's
command ordering is the last piece in completing the puzzle of build-
ing solutions that rely on ordered command execution, by providing
the crucial guarantee that all the commands handed to the initiator
iSCSI layer will be transported and handed to the target SCSI layer
in the same order.
5. Connection failure considerations
[iSCSI] mandates that when an iSCSI connection fails, the active
tasks on that connection must be terminated if not recovered within a
certain negotiated time limit. When an iSCSI target does terminate
some subset of tasks, there is a danger that the SCSI layer would
simply move on to the next tasks waiting to be processed and execute
them out-of-order unbeknownst to iSCSI. To preclude this danger,
[iSCSI] further mandates the following:
a) The tasks terminated due to the connection failure must be
internally terminated by the iSCSI target "as if" due to a CHECK
CONDITION. The "as if" is meaningful because this particular com-
pletion status is never communicated back to the initiator, but is
required because if the initiator were using ACA as the command
ordering mechanism of choice, a SCSI-level ACA will be triggered
due to this mandatory CHECK CONDITION. This addresses the afore-
mentioned danger.
b) After the tasks are terminated due to the connection failure,
the iSCSI target must report a unit attention condition on the
next command processed on any connection for each affected I_T_L
nexus of that session. This is required because if the initiator
were using UA interlock as the command ordering mechanism of
choice, a SCSI-level UA will trigger a UA-interlock. This again
addresses the aforementioned danger. iSCSI targets must report
this UA with the status of CHECK CONDITION, and the ASC/ASCQ value
of 47h/7Fh ("SOME COMMANDS CLEARED BY ISCSI PROTOCOL EVENT").
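The two steps above can be sketched from the target's point of view as
follows. This is a minimal illustration under stated assumptions, not
the procedure defined in [iSCSI]: the class, its methods, and the
bookkeeping by LUN are hypothetical, recovery timers are not modelled,
and the unit attention is reported once without modelling UA interlock
or ACA themselves.

```python
# Hypothetical sketch of the two target-side steps described above.
SOME_COMMANDS_CLEARED = (0x47, 0x7F)  # ASC/ASCQ quoted in the text


class TargetSession:
    def __init__(self):
        self.active = {}         # ITT -> (connection id, LUN)
        self.pending_ua = set()  # LUNs (I_T_L nexuses) owing a UA

    def add_task(self, itt, cid, lun):
        self.active[itt] = (cid, lun)

    def connection_failed(self, cid):
        """Terminate the unrecovered tasks of a failed connection."""
        terminated = [itt for itt, (c, _) in self.active.items() if c == cid]
        for itt in terminated:
            _, lun = self.active.pop(itt)
            # Step a): the task ends internally "as if" with CHECK
            # CONDITION; nothing is sent to the initiator for it.
            # Step b): remember that this I_T_L nexus owes a UA.
            self.pending_ua.add(lun)
        return terminated

    def next_command_status(self, lun):
        """Status of the next command on this nexus, on any connection."""
        if lun in self.pending_ua:
            self.pending_ua.discard(lun)
            return ("CHECK CONDITION",) + SOME_COMMANDS_CLEARED
        return ("GOOD", None, None)


session = TargetSession()
session.add_task(itt=1, cid=1, lun=0)
session.add_task(itt=2, cid=2, lun=0)
session.add_task(itt=3, cid=1, lun=5)
lost = session.connection_failed(cid=1)
first = session.next_command_status(lun=0)
second = session.next_command_status(lun=0)
print(lost, first, second)
```

Note that the unit attention is owed per affected I_T_L nexus and is
reported on whichever connection carries the next command, not only on
a replacement for the failed connection.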
6. Implementation considerations
In general, command ordering is automatically enforced if targets and
initiators comply with the iSCSI specification. However, here are
certain things for the iSCSI initiators and targets to take note of.
a) iSCSI initiators may proactively seek to preclude scenarios
that would normally lead to out-of-order command execution even
when they have designed their systems never to execute commands
out of intended order. This is simply because the SCSI command
ordering features such as UA interlock are likely to be costlier
in performance when they are allowed to be triggered. [iSCSI] pro-
vides enough guidance on how to implement this proactive detec-
tion of transport errors.
b) The whole notion of command streaming does of course assume
that the target in question supports command queuing. An iSCSI
target desirous of supporting command ordering solutions should
ensure that the SCSI layer on the target supports command queuing.
In particular, the remote backup (tape vaulting) applications
that iSCSI enables make a compelling case that tape devices must
also start supporting command queuing.
c) An iSCSI target desirous of supporting high-performance com-
mand ordering solutions that involve specifying a description of
execution schema should ensure that the SCSI layer on the target
in fact does support the ORDERED and SIMPLE task attributes.
d) There is some consideration of expanding the scope of UA
interlock to encompass CHECK CONDITION status, which would make it
the only command ordering feature that implementations need in order
to build command ordering solutions. Until this is resolved in T10,
the currently defined semantics of UA interlock and ACA warrant
implementing both features by iSCSI targets desirous of support-
ing command ordering solutions.
7. IANA Considerations
This document does not have any IANA considerations.
8. Security Considerations
This document does not have any security considerations.
9. References and Bibliography
9.1 Normative References
[iSCSI] Satran, J., et al., draft-ietf-ips-iscsi-20.txt (work in
progress).
[RFC790] Postel, J., "Assigned Numbers", RFC 790, September 1981.
[RFC793] "Transmission Control Protocol, DARPA Internet Program
Protocol Specification", RFC 793, September 1981.
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision
3", RFC 2026, October 1996.
[RFC2119] Bradner, S., "Key Words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2434] Narten, T., and H. Alvestrand, "Guidelines for Writing
an IANA Considerations Section in RFCs", RFC 2434, October
1998.
[SAM] ANSI X3.270-1998, SCSI-3 Architecture Model (SAM).
[SAM2] T10/1157D, SCSI Architecture Model - 2 (SAM-2).
[SBC] NCITS.306-1998, SCSI-3 Block Commands (SBC).
[SPC3] T10/1416-D, SCSI Primary Commands-3.
9.2 Informative References
[RFC3347] Krueger, M., et al., "iSCSI Requirements and Design
Considerations", RFC 3347.
10. Authors' Addresses
Mallikarjun Chadalapaka
Hewlett-Packard Company
8000 Foothills Blvd.
Roseville, CA 95747-5668, USA
Phone: +1.916.785.5621
E-mail: cbm@rose.hp.com
Rob Elliott
Hewlett-Packard Company
MC 150801
PO Box 692000
Houston, TX 77269-2000 USA
Phone: +1.281.518.5037
E-mail: elliott@hp.com
Comments may be sent to Mallikarjun Chadalapaka.
Full Copyright Statement
"Copyright (C) The Internet Society (date). All Rights Reserved. This
document and translations of it may be copied and furnished to oth-
ers, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this docu-
ment itself may not be modified in any way, such as by removing the
copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of develop-
ing Internet standards in which case the procedures for copyrights
defined in the Internet Standards process must be followed, or as
required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."
The IETF has been notified of intellectual property rights claimed in
regard to some or all of the specification contained in this docu-
ment. For more information consult the online list of claimed rights.