Internet DRAFT - draft-cavanna-iscsi-crc-vs-cksum
draft-cavanna-iscsi-crc-vs-cksum
Internet Draft V. Cavanna
<draft-cavanna-iscsi-crc-vs-cksum-01.txt> Agilent Technologies
Expires September 2001 March 2, 2001
iSCSI Digests û CRC or Checksum?
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet- Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) Agilent Technologies (2001). All Rights Reserved.
1. Abstract
The iSCSI working group is currently considering the method of
protecting iSCSI messages from errors. This I-D presents a proposal
to use the Fletcher-32 checksum algorithm or some variant of it
rather than Adler-32; if we must use CRC, to make it CRC32 (or
CRC32Q) rather than CRC64.
2. Introduction
The iSCSI I-D proposes using a 64 bit CRC to protect the header and
payload of the iSCSI message. The overhead and complexity of a 64
bit CRC seems inappropriate for the type of errors from which the
iSCSI message needs protection. In fact a 32-bit checksum seems
adequate for our purposes.
This document proposes using a 32 bit Fletcher-32 checksum rather
than a 64 bit CRC and offers some justification.
Cavanna [Page 1]
Internet-Draft iSCSI û CRC or Checksum March 2001
3. Brief description of the algorithms CRC-32, Fletcher-32, Adler-32
3.1. CRC-32 (of Ethernet and FDDI fame)
CRC-32 treats the data as binary coefficients of a polynomial in x
and divides, using mod2 division, by the CRC polynomial P(x) (after
multiplication by x^32 to make room for adding CRC bits to the low
order bits of the shifted data without affecting the data). The
coefficients of the remainder polynomial are then used as the CRC
bits. The CRC register is initialized to a value other than zero to
detect extra or missing leading zeroes. The CRC field is actually
complemented before being transmitted in order to detect extra
trailing zeroes in the case where the transmitted CRC would
otherwise be zero.
3.2. Fletcher-32 checksum
Fletcher-32 checksum has two sums, S1 and S2 which are initially
zero. The sums can be ones complement (modulo 2^16-1) or twos
complement (modulo 2^16). S1 is a 16 bit sum of the data taken 2
bytes at a time. S2 is a 16 bit sum of the data multiplied by the
position of the data from the end of the packet. No multiplication
is actually required in the algorithm. The multiplication effect
results from the way the sum is accumulated. S2 accumulates values
of S1 after S1 is updated; so a given data word appears multiple
times in S2. The number of times a word appears in S2 depends on its
position from the end of the packet.
3.3. Adler-32 checksum
Adler-32 checksum has two sums S1 and S2. S1 is a 16 bit sum of the
data taken one byte at a time and modulo 65521. S2 is a 16 bit sum,
modulo 65521, of the data taken one byte at a time multiplied by its
position from the end of the packet, and the sequence length. The
sequence length enters the picture because of the initial value of 1
for S1. No multiplication is actually required in the algorithm. The
multiplicative effect results from the way the sum is accumulated as
described above in section 3.2. Adler-32 is essentially the same as
Fletcher-32 except S1 is initialized to 1 instead of 0 and the sum
is modulo 65521. Another difference which has implications on the
burst detection capability of Adler-32, as I explain later, is that
Adler-32 is defined to sum data one byte at a time whereas Fletcher-
32 is defined to handle data 2 bytes at a time.
Cavanna [Page 2]
Internet-Draft iSCSI û CRC or Checksum March 2001
4. What a CRC-n polynomial, P(x), can protect from?
Assuming n check bits, we can claim the following error detection
properties without placing much restrictions on the form of P(x).
4.1. Individual bit errors
o All single bit errors provided p(x) has two or more terms and
regardless of record length.
o All double bit errors provided p(x) has a factor with 3 or more
terms and record length less than the period of p(x).
o The period of P(x) depends on its form (specifically the cycle
length of the primitive polynomial that is usually one of the
factors) and for most standard polynomials is within a factor of 4
of 2^n-1. Even for n=32 this number is much larger than the record
length (4.3 billion bits).
o All triple bit errors and odd number of errors provided p(x)
contains the factor x^c+1 where c is an integer and regardless of
record length. Most CRC polynomials (but not CRC-32) contain the
factor x+1. This makes the hamming distance 4!
o Most occurrences of more than three single bit errors regardless
of record length.
Thus most CRCs, with few restrictions on their form, will detect all
possible 3 bit errors (Hamming distance is 4) in practical record
lengths, and most occurrences of more than three single bit errors.
CRC-32 used in Ethernet and FDDI and that I refer to in this
document is prime and does not have x+1 as a factor. The implication
is that it does not detect ALL odd number of errors (in particular
3) regardless of record length.
Cavanna [Page 3]
Internet-Draft iSCSI û CRC or Checksum March 2001
4.2. Burst errors
o All single bursts of length up to and including n, regardless of
record length, provided P(x) has an x^0 term. A burst is a range
of bits in which at least the first and last bit are in error. The
size of the burst is the number of bits including the first and
last bits.
o 1-1/2^(n-1) * 100 % of single bursts of length n+1 if all errors
patterns are equally probable. The only pattern that is undetected
is that which corresponds to the coefficients of P(x).
o 1-1/2^(n) * 100 % of single bursts of length greater than n+1 if
all errors patterns are equally probable. For n=32 this is
99.9999999767% of all single bursts.
o If P(x) has (x^c+1) as a factor it will also detect double bursts
such that the length of the shorter burst does not exceed the
degree of p(x) and the sum of the lengths of the two bursts does
not exceed c+1, provided the record length including check bits
does not exceed the period of p(x). A double burst consists of two
bursts separated by an arbitrary number of bits.
o Most of remaining bursts.
By picking a polynomial with a factor x+1 we guarantee a hamming
distance of 4 (without this term hamming distance is at most 3 as
long as P(x) has two or more terms and a factor with 3 or more
terms). By picking the other factor to be a primitive poly we
maximize the cycle length and thus increase our coverage of errors
for which the coverage depends on cycle length, such as for double
bit errors and for burst detection. Burst detection allows more bits
to change than the hamming distance would imply but the bits must be
close together.
If all error patterns are equally probable the form of the
polynomial is unimportant and only the number of check bits matters.
Cavanna [Page 4]
Internet-Draft iSCSI û CRC or Checksum March 2001
5. CRC-64 is overkill
Protection length increases to 2^64 or 1.8 x10^19 bits. This
increased protection length is not needed for iSCSI. The fraction of
errors that are detected when all errors are equally probable is 1-
1/2^64. When all errors are equally probable the fraction detected
by CRC-32, 1-1/2^32 or 99.9999999767%, is large enough. The size of
single burst that is always detected is 64 whereas for CRC-32 it is
32 that is, CRC-64 will catch ALL single bursts of size 64 whereas
CRC32 will only catch MOST such bursts. Many more errors are, of
course, detected by CRC64 than by CRC32. Note that a checksum with
64 bits will detect the same fraction as CRC-64 ,1-1/2^64 û when all
errors are equally probable [ref3].
Since we want to detect errors in buses that are wider than 64 bits
the burst protection of CRC-64 is not sufficient to guarantee 100%
detection.
6. What Fletcher-32/Adler-32 Checksum can protect from?
Assume 2s complement sum. We can claim the following protection [ref
1]:
o All single bit errors.
o For Fletcher-32 and for Adler-32 modified to handle 2 bytes at
once, all single bursts of less than 31 bits (possibly 32 for
Adler-32).
o Most other errors.
o 1-1/2^32*100 % of all errors when all errors are equiprobable.
This is the same as CRC-32 [ref 3]
o Adler-32 checks the sequence length so a separate check is not
necessary.
Cavanna [Page 5]
Internet-Draft iSCSI û CRC or Checksum March 2001
7. Adler vs. Fletcher for iSCSI.
The only comparison (RFC-1950) I have seen between Fletcher and
Adler involved Fletcher-16 (ones complement version) vs. Adler-32
and Adler-32 seemed better but, for the type of errors we are trying
to protect from, is primarily due to the increase in the number of
check bits rather than the change in algorithm û my opinion. If we
compare Adler-32 to 32 bit Fletcher there are only two differences,
both of which are minor in the context of error protection for
iSCSI. Adler-32 differs in:
o Initial value for S1 is 1 to make the length of the sequence part
of S2 so extra or missing bytes with a value of zero will produce
a different checksum. Since the length of the data record is
easily verifiable by other means this is not a significant
benefit.
o Modulo is 65521 (a prime number) to avoid (according to RFC 1950)
a possible large class of two-byte errors from going undetected.
RFC 1950 was referring to ones complement Fletcher-16 which has
the blind spot for 16 bit bursts due to its two representations
for zero which produce the same checksum. Ref 1 also claims that
Fletcher-16 guarantees detection of only 15 bits. My belief is
that Fletcher-32 detects all bursts of 31 bits and may have a
similar blind spot as Fletcher-16, but for 4-byte errors.
Presumably Adler-32, which handles one data byte at a time,
guarantees detection of all 2-byte bursts but I am not sure.
Presumably Adler-32 if modified to handle two bytes at a time
would catch all bursts of 32 bits. Again I am not positive of this
but will confirm and report my conclusion.
Cavanna [Page 6]
Internet-Draft iSCSI û CRC or Checksum March 2001
8. Example of errors that are not detected by Adler-32, Fletcher-32,
CRC-32.
8.1. [Ref2] shows some patterns that Fletcher and Adler will not detect
but that CRC-32 will detect by virtue of its single burst detection
capability. Adler-32 will not detect a change in 3 consecutive bytes
from x,y,z to xÆ,yÆ,zÆ when the bytes are related as follows:
zÆ-z= -(xÆ-x)-(yÆ-y) and 2xÆ+yÆ=2x+y
For example the 3 byte sequence consisting of 4,2,1 is changed to
5,0,2 and the checksum is the same.
This is only a 3 byte burst. Why is it not detected by Adler-32 when
Fletcher-32 detects all bursts of 31? Because Adler-32 has been
defined to handle one byte of data at a time. In this respect Adler-
32 is inferior to Fletcher-32 which is defined to handle 2-bytes of
data at a time. I suppose Adler-32 can easily be modified to handle
2 bytes of data at a time.
Note that this change involves 5 bits and requires invocation of the
burst detection (since the bits are close together) properties of
the Adler-32 checksum as it would of CRC-32 which has a hamming
distance as low as 4.
8.2. CRC-32 will not detect any error pattern consisting of its
coefficients nor any multiple of it (sum of shifted versions). The
shortest such pattern is 33 bits.
I should point out that the patterns that get by Adler-32 may be
easily expressed in a short equation but are not necessarily more
likely to occur than the patterns that get through the CRC-32.
Cavanna [Page 7]
Internet-Draft iSCSI û CRC or Checksum March 2001
9. Classes of errors to protect against
Note that iSCSI is not the primary level of protection. In
particular we do not need to protect from link type of errors.
o Errors introduced at layers below iSCSI by sending or receiving
node.
o Software errors (buffer mismanagement).
o DMA hardware that fails to DMA all the data or DMAs more data than
it should, or corrupts the data.
o Errors introduced by middle boxes that recomputed checksums for
layers below iSCSI after introducing intentional and perhaps
unintentional changes.
o Unintentional changes in middle boxes and end nodes
o Memory errors (random or pattern dependent)
o Bus errors (widths: 16,32,64,128)
o Fifo overruns and underruns
o Summarizing probable error patterns
o Missing data in chunks of 16, 32, 64, 128 bits
o Extra data in chunks of 16, 32, 64, 128 bits
o Changed data in chunks of 1, 16, 32, 64, 128 bits
o Periodic single bit errors every 16 or 32 or 64 or 128 bits.
Cavanna [Page 8]
Internet-Draft iSCSI û CRC or Checksum March 2001
10. How good is our coverage of such errors?
o When a single chunk is modified a 32 bit checksum and a 32 bit CRC
both provide complete coverage for chunk sizes less than 32 bits.
CRC-32 provides complete coverage for chunk sizes of 32.
o CRC-64 provides complete coverage up to chunk size of 64 bits.
o None provide complete coverage for chunk sizes of greater than 64
bits.
o For single bit errors all provide complete coverage.
o CRC-32 provides complete coverage for 2 and good coverage for 3
single bit errors.
11. Comparing CRC with Checksum
o CRC algorithm uses division rather than addition.
o A small change in input affects many bits of the CRC.
o The number of bits is most important.
o The form is not so important for the types of errors that we need
to detect
o CRCs are more sensitive to the bursty nature of communication line
noise and will detect ALL bursts shorter than the size of the CRC.
This is not the job of the iSCSI CRC but of the frame level CRC.
The ones complement sum is not insensitive to these error
patterns, it is just not especially sensitive to them. The extra
sensitivity of a CRC to burst errors comes at the expense of
lessened sensitivity to other bit pattern errors since the total
fraction of errors detected remains the same. See Ref [3]
o CRC is more likely to detect error patterns displaying regularity
such as periodic single bit errors but this comes at the price of
decreased sensitivity to other types of errors since the total
fraction of error patterns detected, when all are equiprobable,
for a given number of check bits is the same for CRC or a
checksum.
o For small enough number of errors CRC is guaranteed to catch ALL
patterns as explained in a previous section and thus, if error
patterns with small number of errors are more common, then CRC
provides better error coverage. Again, this comes at the expense
of decreased sensitivity to other error patterns although such
patterns tend to be irregular and presumably less common.
Cavanna [Page 9]
Internet-Draft iSCSI û CRC or Checksum March 2001
o CHECKSUM, by contrast has more holes for error patterns with small
number of errors, but CHECKSUM compensates by catching errors that
CRC will miss. Again, the total fraction of error patterns
detected only depends on the number of check bits [ref3].
o In a physical link where the primary source of errors is random
noise it is very attractive to use an error protection scheme
which guarantees complete protection for the most common errors
which are single, double and triple individual bits or bursts
(because of group encoding). For example if the bit error rate is
low enough that the probability of 3 or more bits in error within
a data record is so low that, at the link bit rate, the
probability of 3 or more bits is negligible, then the CRC offers
practically complete protection. However, to reiterate, it is the
function of the link CRC to provide this protection, not the iSCSI
message check sequence.
o For group-encoded data, detecting all occurrences of 2 bits or
less implies detecting all occurrences of two bursts of size that
depends on the group encoding. For example for 8B:10B encoding,
the bursts are 8 bits wide. CRCs can be easily designed to
guarantee 100% protection from 2 bursts of 8 bits or less
occurring anywhere within a record (of practical length). In fact
CRC-32 will catch all double bursts of size 9 for code length up
to 13000 [ref4]. This is not as easy to implement with checksums.
o We need to characterize the types of errors that we need to
protect from and confirm that CHECKSUM provides sufficient
protection.
12. Checksum partials easier to handle
When an iSCSI message is composed of multiple blocks that arrive at
different time and perhaps out of sequence, the receiver needs to
accumulate partial digests (CRC or checksum) from the blocks.
Combining the partial digests appears to be significantly cheaper
and faster for checksums than for CRCs.
Cavanna [Page 10]
Internet-Draft iSCSI û CRC or Checksum March 2001
13. Recommendation
Fletcher-32 is my first choice. It is cheaper to implement than CRC-
32 and probably provides adequate protection from the expected
errors. Unless Adler-32 is modified to handle 2 bytes at a time
Fletcher-32 has superior single burst detection performance. There
is not sufficient benefit in Adler-32 to justify the additional
complexity of the mod 65521 arithmetic.
CRC-32 is my second choice. There is not sufficient benefit in CRC-
64 to justify the additional complexity and overhead.
Cavanna [Page 11]
Internet-Draft iSCSI û CRC or Checksum March 2001
14. References
[1] Performance of Checksums and CRCÆs over Real Data by Jonathan
Stone, Michael Greenwald, Craig Partridge. IEEE/ACM Transactions on
Networking, Vol. 6, no. 5, October 1998
[2] On Fletcher and Adler codes, and classic CRCs. email from Dr.
Dafna Sheinwald from the IBM Haifa Research Lab to iSCSI group via
Julian Satran of IBM circa 1/25/01.
[3] FITS Checksum Proposal by R.L. Seaman and W.D. Pence 24 August
1995.
http://rosat.gsfc.nasa.gov/docs/heasarc/ofwg/docs/general/checksum/n
ode18.html
[4] Toru Fujiwara, Tadao Kasami, Shu Lin. Error Detecting
Capabilities of the Shortened Hamming codes adapted for error
detection in IEEE standard 802.3. IEEE Transactions on
Communications, vol. 37, no.9, 1989, pp. 986-989
15. Authors' Addresses
Vicente Cavanna
1101 Creekside Ridge Drive Suite 100
Roseville, CA 95661-9051
USA
Phone: 916-788-5797
Email: vince_cavanna@agilent.com
Matt Wakeley
1101 Creekside Ridge Drive Suite 100
Roseville, CA 95661-9051
USA
Phone: 916-788-5670
Email: matt_wakeley@agilent.com
Expires September 2001
Cavanna [Page 12]