INTERNET-DRAFT Maurizio Codogno
draft-codogno-mime-nntp8bit-00.txt CSELT
Expires: February 11, 1999 Date: August 06, 1998
The MIME application/nntp8bit Content-type
Status of this Memo
This document is an Internet Draft; Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months. They may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet Drafts
as reference material or to cite them other than as a "working draft"
or "work in progress".
Please check the abstract listing in each Internet Draft directory
for the current status of this or any other Internet Draft.
To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).
Abstract
The application/nntp8bit content-type is proposed and defined as an
efficient and simple way to transmit raw ("binary") data over an NNTP
connection, taking into account the foreseeable limitations of that
standard.
1. Introduction
Usenet News [NNTP, NEWS] is a very popular medium for data
transmission: at the time of writing, there are tens of thousands of
different discussion groups, and the traffic generated per site can be
as much as 10 GB/day.
The vast majority of the data consists of binary files (images,
audio or video clips, software programs...), which make up as much as
90% of the global traffic. Unfortunately, the two main ways used to
encode binary data, namely UUENCODE and MIME application/octet-stream
with Content-Transfer-Encoding base64, add a 33% overhead to the size
of the file sent.
The new NNTP specification currently being developed [NEWNNTP]
requires an 8-bit-wide channel, and the companion new definition of
the Usenet message format [USEFOR] does not object to the presence of
8-bit data. There is, however, a problem which does not
allow raw data to be sent directly: the body of an article cannot
contain an ASCII NUL (0x00) character, and ASCII CR and LF (0x0d,
0x0a) may only appear together, as a CR-LF pair. Moreover, each line
in the body must be at most 998 octets long and must end with the
CR-LF sequence (which is not counted in the 998-octet limit).
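The three constraints can be made concrete with a small check. What
follows is only a sketch in C: the function name body_is_transmittable
is hypothetical and does not appear in [NEWNNTP] or [USEFOR], and the
sketch assumes that the last line of the body must also end with CR-LF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when the len octets in body could
 * be sent unmodified as an article body, i.e. there is no NUL, CR and
 * LF only occur as CR-LF pairs, and no line exceeds 998 octets. */
bool body_is_transmittable(const unsigned char *body, size_t len)
{
    size_t line_len = 0;    /* octets on the current line, CR-LF excluded */

    assert(body != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = body[i];

        if (c == 0x00)
            return false;               /* NUL is forbidden */
        if (c == 0x0a)
            return false;               /* LF not preceded by CR */
        if (c == 0x0d) {
            if (i + 1 >= len || body[i + 1] != 0x0a)
                return false;           /* CR not followed by LF */
            i++;                        /* skip the LF of the pair */
            line_len = 0;
            continue;
        }
        if (++line_len > 998)
            return false;               /* line too long */
    }
    return line_len == 0;   /* assumption: last line ends with CR-LF */
}
```

A typical binary file fails this check almost immediately, which is
why some form of coding is needed before transmission.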
A rather simple way to cope with these limitations is to define a
MIME Content-Type which encodes the data so that it complies with
them. This solution has been preferred to the definition of a new
Content-Transfer-Encoding because it is simpler to deploy: if a
newsreader does not understand the format, it is still possible to
save the article and process it with an external filter.
2. application/nntp8bit Registration Information
The following form follows the template in RFC 1590, Appendix A; the
new media type will be duly registered.
To: IANA@isi.edu
Subject: Registration of new Media Type application/nntp8bit
Media Type name: application
Media subtype name: nntp8bit
Required parameters: Type, a media type/subtype
Optional parameters: Name, the name of the file
Encoding considerations: it must be encoded "8bit" or "binary".
Security considerations: NONE
Published specification: RFC-REL (this document).
Person & email address to contact for further information:
Maurizio Codogno
CSELT CF/IM Dept.
Via G. Reiss Romoli, 274
I-10148 Torino TO
Italy
+39 011 228 6132
<mau@beatles.cselt.it>
3. Definition of the coding
Since it is expected that, at least in the beginning, the MIME type
application/nntp8bit will not be commonly deployed, the specification
of the coding has deliberately been kept simple. Moreover, it can be
assumed that most binary files sent over Usenet News are already
compressed: therefore, it was thought simplest just to escape the
offending characters. A single exception has been
made: since someone may send uncompressed files, which tend to
contain a large number of NUL characters, NUL is coded with a single
octet.
Since no chunk of data between CRLF pairs can be longer than 998
octets, it is also necessary to add CRLF pairs in suitable places.
The coding algorithm, sketched in C, runs as follows:
----------------- cut ----------------------
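#include <stdio.h>

enum { NUL = 0x00, LF = 0x0a, CR = 0x0d,
       X80 = 0x80, X81 = 0x81, X8A = 0x8a, X8D = 0x8d };

int main(void)
{
    int c;            /* int rather than char, so that EOF is distinct */
    int nchar = 0;    /* octets already written on the current line */

    while ((c = getchar()) != EOF) {
        if (c == NUL)                       /* NUL: single octet */
            { putchar(X80); nchar++; }
        else if (c == CR)
            { putchar(X81); putchar(X8D); nchar += 2; }
        else if (c == LF)
            { putchar(X81); putchar(X8A); nchar += 2; }
        else if (c == X80)
            { putchar(X81); putchar(X80); nchar += 2; }
        else if (c == X81)                  /* 0x81 0x81, not 0x81 0x80 */
            { putchar(X81); putchar(X81); nchar += 2; }
        else
            { putchar(c); nchar++; }
        if (nchar >= 997)                   /* keep lines <= 998 octets */
            { putchar(CR); putchar(LF); nchar = 0; }
    }
    return 0;
}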
----------------- cut ----------------------
while the decoding algorithm is the following:
----------------- cut ----------------------
#include <stdio.h>

enum { NUL = 0x00, LF = 0x0a, CR = 0x0d,
       X80 = 0x80, X81 = 0x81, X8A = 0x8a, X8D = 0x8d };

int main(void)
{
    int c;    /* int rather than char, so that EOF is distinct */

    while ((c = getchar()) != EOF) {
        if (c == CR)
            c = getchar();              /* eat the LF of the CRLF pair */
        else if (c == X80)
            putchar(NUL);
        else if (c == X81) {
            c = getchar();              /* get escaped char */
            if (c == X80) putchar(X80);
            else if (c == X81) putchar(X81);
            else if (c == X8A) putchar(LF);
            else if (c == X8D) putchar(CR);
        }
        else
            putchar(c);
    }
    return 0;
}
----------------- cut ----------------------
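As a worked illustration of the escaping rules, the following sketch
applies them to an in-memory buffer rather than to standard input. The
names nntp8bit_encode and NNTP8BIT_MAX_ENCODED and the buffer-based
interface are assumptions made for this example only. The sketch also
assumes that 0x81 is escaped as 0x81 0x81, which is the sequence the
decoding algorithm expects.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on the coded size of len octets: every octet may double,
 * and a CRLF pair is added at most once per 997 coded octets. */
#define NNTP8BIT_MAX_ENCODED(len) (2 * (len) + 2 * ((2 * (len)) / 997))

/* Hypothetical helper: codes len octets from in into out, which must
 * have room for NNTP8BIT_MAX_ENCODED(len) octets.  Returns the number
 * of octets written. */
size_t nntp8bit_encode(const unsigned char *in, size_t len,
                       unsigned char *out)
{
    size_t o = 0;
    int nchar = 0;    /* octets written on the current line */

    assert(out != NULL && (in != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c == 0x00) {
            out[o++] = 0x80;            /* NUL: single octet */
            nchar++;
        } else if (c == 0x0d || c == 0x0a || c == 0x80 || c == 0x81) {
            out[o++] = 0x81;            /* escape octet */
            /* CR -> 0x8d, LF -> 0x8a; 0x80 and 0x81 are unchanged */
            out[o++] = (unsigned char)(c | 0x80);
            nchar += 2;
        } else {
            out[o++] = c;
            nchar++;
        }
        if (nchar >= 997) {             /* keep lines <= 998 octets */
            out[o++] = 0x0d;
            out[o++] = 0x0a;
            nchar = 0;
        }
    }
    return o;
}
```

For instance, the six input octets 41 00 0d 0a 80 81 are coded as the
ten octets 41 80 81 8d 81 8a 81 80 81 81.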
Note that a real implementation should of course check for malformed
input data, and return an appropriate error message.
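One way such checking might look is sketched below. The function name
nntp8bit_decode, its buffer-based interface and its choice of error
conditions (bare CR or LF, raw NUL, truncated or unknown escape
sequence) are assumptions of this example, not requirements of this
document. The sketch does not check the 998-octet line limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decodes len octets from in into out, which must
 * have room for len octets.  Returns 0 and stores the decoded length
 * in *out_len on success, or -1 if the input is malformed. */
int nntp8bit_decode(const unsigned char *in, size_t len,
                    unsigned char *out, size_t *out_len)
{
    size_t o = 0;

    assert(out != NULL && out_len != NULL && (in != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c == 0x0d) {
            /* CR is only legal as the first half of a CRLF pair */
            if (i + 1 >= len || in[i + 1] != 0x0a)
                return -1;
            i++;                        /* eat the LF */
        } else if (c == 0x0a || c == 0x00) {
            return -1;                  /* bare LF or raw NUL */
        } else if (c == 0x80) {
            out[o++] = 0x00;            /* coded NUL */
        } else if (c == 0x81) {
            if (i + 1 >= len)
                return -1;              /* truncated escape sequence */
            switch (in[++i]) {
            case 0x80: out[o++] = 0x80; break;
            case 0x81: out[o++] = 0x81; break;
            case 0x8a: out[o++] = 0x0a; break;
            case 0x8d: out[o++] = 0x0d; break;
            default:   return -1;       /* unknown escape sequence */
            }
        } else {
            out[o++] = c;
        }
    }
    *out_len = o;
    return 0;
}
```

With this sketch, the coded octets 41 80 81 8d 81 8a 81 80 81 81
decode to 41 00 0d 0a 80 81, while an input such as 81 42 is rejected.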
The overhead induced by this coding can be roughly estimated as
follows, assuming that all octet values are equally likely:
- four octet values out of 256 are coded with two octets, increasing
  the total size by 1.6% on average;
- two extra octets are added every 997 or 998 octets, adding a
  further 0.2%;
- there is the MIME header overhead, which is negligible for large
  files.
It is therefore possible to code a typical article with about 2%
overhead, rather than the 33% of UUENCODE or base64 encoding.
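The estimate can be checked numerically. The sketch below computes the
coded size of a buffer and, for comparison, the size of its base64
form. The names nntp8bit_size and base64_size are hypothetical, and
the base64 figure ignores line breaks and headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in octets of the nntp8bit coding of buf,
 * including the CRLF pairs inserted by the coding algorithm. */
size_t nntp8bit_size(const unsigned char *buf, size_t len)
{
    size_t total = 0;
    int nchar = 0;    /* octets on the current line */

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        /* CR, LF, 0x80 and 0x81 take two octets; all others take one */
        int n = (c == 0x0d || c == 0x0a || c == 0x80 || c == 0x81) ? 2 : 1;

        total += (size_t)n;
        nchar += n;
        if (nchar >= 997) {
            total += 2;                 /* CRLF pair */
            nchar = 0;
        }
    }
    return total;
}

/* Size of the base64 coding of len octets, ignoring line breaks:
 * every group of up to 3 input octets becomes 4 output characters. */
size_t base64_size(size_t len)
{
    return 4 * ((len + 2) / 3);
}
```

For a 3072-octet buffer in which every octet value occurs equally
often, nntp8bit_size gives 3126 octets (about 1.8% overhead), while
base64_size gives 4096 (about 33% overhead).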
4. User Agent Requirements
User agents that do not recognize application/nntp8bit shall, in
accordance with [MIME], treat the entire entity as
application/octet-stream. This is acceptable, since the data may then
be saved as an external file which can be processed offline.
MIME User Agents that recognize application/nntp8bit will decode the
stream of data and present it to the user as a file whose content
type is given by the Type parameter.
4.1 Recursion
MIME is a recursive structure. Hence one must expect an
application/nntp8bit entity to contain other application/nntp8bit
entities. When an application/nntp8bit entity is being processed for
display or storage, any enclosed application/nntp8bit entities shall
be processed as though they were being stored.
5. Further work
It might be possible to define a way to process articles that are
split before transmission because of their large size. Two possible
ways to do this are:
- add an optional MIME parameter which says which part of the file is
  being sent;
- use an escape sequence "0x81 0xnn", with nn going from 01 to 79, at
  the beginning of the data stream to indicate which part is being
  sent.
The latter system limits the size of the complete file being sent,
but it is more compact.
6. Security considerations
It may be possible to prepare a coded stream which can execute
malicious programs, if a newsreader cannot understand this MIME Media
Type. It should be noted, however, that the specifications for Usenet
messages would allow such a message anyway, so no new security issue
should be introduced.
7. Acknowledgments
[I hope someone in the USEFOR IETF group will help me!]
The author, however, takes full responsibility for all errors
contained in this document.
8. References
[MIME] Borenstein, N. and Freed, N., "MIME (Multipurpose Internet
Mail Extensions): Mechanisms for Specifying and Describing
the Format of Internet Message Bodies", June 1992, RFC 1341.
[NEWS] Horton, M., Adams, R., "Standard for Interchange of USENET
Messages", December 1987, AT&T Bell Labs and Center for
Seismic Studies, RFC 1036.
[NEWNNTP] Barber, S., "Network News Transport Protocol", work in
progress,
ftp://ds.internic.net/internet-drafts/draft-ietf-nntpext-base-04.txt
[NNTP] Kantor, B., Lapsley, P., "Network News Transfer Protocol",
February 1986, U.C. San Diego and U.C. Berkeley, RFC 977.
[USEFOR] Ritter, D., N., "User Article Format", work in progress,
ftp://ds.internic.net/internet-drafts/draft-ietf-usefor-article-01.txt
9. Author's address
Maurizio Codogno
CSELT CF/IM Dept.
Via G. Reiss Romoli, 274
I-10148 Torino TO
Italy
+39 011 228 6132
<mau@beatles.cselt.it>