Internet Draft                                                   C. Burri
                                                       Synecta Informatik
                                                         Expires Jan 2002


                  Handling IRC continuation message lines
              draft-burri-irc-continuation-message-lines-00.txt


Status of this Memo

   This document is an Internet-Draft and is subject to
   all provisions of Section 10 of RFC2026.
 
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html 


Abstract

   Because of the way the IRC protocol is implemented, a server may
   send a message to a client in two parts: a truncated line followed
   by a so-called continuation message line.

   There seems to be confusion about how to handle continuation
   message lines. Many implementations are broken and do not handle
   them at all. Others rely on timers to complete continuation lines,
   which is not recommended due to the asynchronous nature of IRC
   communications.

   This Memo proposes an algorithm that handles continuation lines
   received from an IRC connection without the need for timers. It is
   intended as a supplement to RFC 1459, which describes the Internet
   Relay Chat protocol.


Copyright Notice

   Copyright (C) The Internet Society (2001). All Rights Reserved.




Burri                                                            [page 1]

Internet Draft   Handling IRC continuation message lines         Jul 2001


Table of Contents
										
   1. Introduction
   2. Handling continuation message lines
      2.1. IRC Message Format
      2.2. Discovery
      2.3. Reassembly
   3. Expiration Notice
   4. Credits and Authors' Address



1. Introduction
   
   Current IRC implementations use input and output buffers for
   asynchronous network I/O, and the input buffers are always processed
   first. All output is queued in the send queue and is not sent to the
   client until processing of the input buffer has completed. This helps
   TCP build larger packets, since multiple messages may be bundled
   into one network transmission (TCP segment). For more information,
   consult RFC 1459, Sections 8.2 and 8.3.

   The same process can, however, lead to incomplete messages that are
   cut off due to TCP limitations, namely the TCP window size. Such
   lines appear incomplete to the client, which does not normally cause
   any problems by itself. However, the line that follows the truncated
   line will be incomplete too, containing only the data that did not
   fit within the previous TCP segment. This line is special: if it is
   treated like a normal line, arbitrary data might be parsed as a
   complete message from the server.

   This circumstance has been observed to cause problems in various IRC
   clients. Most of them seem to ignore the problem completely. Since
   it is not clearly defined what happens when a continuation message
   line is received, this may result in severe brain damage, or even
   loss of chanop status in case the broken implementation is used in
   a bot that maintains an IRC channel. The undefined behaviour could
   also be exploited by making the implementation believe that it
   received a message from some origin, when in fact the true origin
   is spoofed.
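
   The failure mode described above can be illustrated with a minimal
   Python sketch. The function name naive_lines, the nicknames and the
   server name irc.example.org are hypothetical and are not taken from
   any real implementation; the sketch only assumes a client that
   splits each received segment on CRLF and parses every piece.

```python
from typing import List


def naive_lines(segment: bytes) -> List[bytes]:
    """Split one received segment on CRLF and treat every piece as a message.

    This is the broken behaviour: the unterminated tail of a segment and
    the head of the next segment are both returned as if they were
    complete lines.
    """
    return [piece for piece in segment.split(b"\r\n") if piece]


# Hypothetical traffic: a single PRIVMSG from "bob" is cut in two.
segment_1 = b":alice!a@host PRIVMSG #chan :hi\r\n:bob!b@host PRIVMSG #chan :x "
segment_2 = b":irc.example.org MODE #chan +o bob\r\n"

parsed = naive_lines(segment_1) + naive_lines(segment_2)
for line in parsed:
    print(line)
```

   The third element of parsed looks like a MODE message from a server,
   although it is only text that "bob" sent to the channel.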

   The algorithm proposed in this Memo has been designed to reassemble
   continuation message lines before they are processed by the message
   parser. It does this without the use of any timers or delays. A
   reassembly timeout that is set too low could lead to loss of data;
   one that is set too high leads to undesirably high delays in
   reading from the network.

   The proposed algorithm relies on the facts that:

            - for any incomplete message line, there will be a 
              resulting completion message line received in the next
              TCP segment that is read from the network.

            - the transmission of the completion message line following
              the incomplete line will always occur before any other
              network transmission occurs, or in other words, the
              completion message line will always be the first line of
              the next delivered TCP segment.
      
   The proposed algorithm does not depend on any particular programming
   language. Instead, it is designed to work with every programming
   language that provides buffers (variables) and comparison tests.

   It can be implemented as a preprocessor that is located in front of
   the IRC message parser in the data stream.

   The author further notes that the proposed algorithm is Public
   Domain property and may be freely used and implemented without
   paying any fee whatsoever to anyone.


2. Handling continuation message lines

   In order to reassemble continuation message lines, they must be
   detected in a reliable way. Once detected, they need to be marked as
   incomplete and stored in a temporary buffer for later reassembly.


2.1. IRC Message Format

   RFC 1459 defines the message format as follows:

   <message>  ::= [':' <prefix> <SPACE> ] <command> <params> <crlf>
   <prefix>   ::= <servername> | <nick> [ '!' <user> ] [ '@' <host> ]
   <command>  ::= <letter> { <letter> } | <number> <number> <number>
   <SPACE>    ::= ' ' { ' ' }
   <params>   ::= <SPACE> [ ':' <trailing> | <middle> <params> ]
   <middle>   ::= <Any *non-empty* sequence of octets not including
                   SPACE or NUL or CR or LF, the first of which may
                   not be ':'>
   <trailing> ::= <Any, possibly *empty*, sequence of octets not
                   including NUL or CR or LF>

   <crlf>     ::= CR LF

   As we can see from the above representation, every complete IRC
   message must end in the sequence <crlf>. Section 8 of RFC 1459 also
   reports the usage of either CR *or* LF as a message delimiter.

   It might be a good idea for any implementation to accept all three
   variants; for the sake of simplicity we will however refer to CRLF
   as the message delimiter in this document.
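
   As an illustration of the grammar above, the following Python sketch
   splits one complete line into prefix, command and parameters. The
   function name parse_message is an assumption made for this example.
   The sketch accepts CRLF, CR or LF as the terminator and does not
   validate individual octets.

```python
from typing import List, Optional, Tuple


def parse_message(line: bytes) -> Tuple[Optional[bytes], bytes, List[bytes]]:
    """Split one complete IRC line into (prefix, command, params).

    A trailing CRLF, CR or LF is stripped. Octet-level validation (NUL,
    letters versus digits in <command>) is left out of this sketch.
    """
    line = line.rstrip(b"\r\n")
    prefix = None
    if line.startswith(b":"):
        # [':' <prefix> <SPACE>]
        prefix, _, line = line[1:].partition(b" ")
        line = line.lstrip(b" ")
    # A <middle> may not start with ':', so the first " :" introduces
    # the <trailing> parameter, which may itself contain spaces.
    head, sep, trailing = line.partition(b" :")
    words = [w for w in head.split(b" ") if w]   # <SPACE> is one or more ' '
    if not words:
        raise ValueError("missing command")
    command, params = words[0], words[1:]
    if sep:
        params.append(trailing)
    return prefix, command, params


print(parse_message(b":bob!b@host PRIVMSG #chan :hello there\r\n"))
```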


2.2. Discovery

   Once we know how IRC messages are delimited, we can check any IRC
   message line for completeness. Any complete line must end in the line
   delimiter sequence. If a given IRC message line does not end in that
   sequence, then it must have been truncated.

   To improve performance, the proposed discovery algorithm performs
   the end delimiter test once per received TCP segment instead of once
   per received line. This is sufficient because a received TCP segment
   must itself end in a CRLF sequence if its last line has not been
   truncated by the sending TCP.
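
   A minimal sketch of this test in Python, assuming the received
   segment is available as a bytes object. The names CRLF and
   ends_truncated are chosen for illustration only.

```python
CRLF = b"\r\n"


def ends_truncated(segment: bytes) -> bool:
    """Return True if the last line in a non-empty segment lacks its CRLF."""
    return bool(segment) and not segment.endswith(CRLF)


print(ends_truncated(b"PING :a\r\nPRIVMSG #chan :he"))
print(ends_truncated(b"PING :a\r\n"))
```

   Only the last two octets of the segment are inspected, so the cost
   of the test does not depend on how many lines the segment contains.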

   Note that we are not looking for continuation lines, since the
   algorithm cannot recognize a continuation line by itself. That is
   impossible because of the arbitrary structure of such a line. The
   line might in fact be composed of data that has been sent to either
   a channel or to the client via PRIVMSG or other means.

   The solution to this problem is to recognize the lines that have
   been truncated, and store them for reassembly, instead of parsing
   them. The discovery of a truncated line also identifies the line
   that follows it as the continuation message line.

   Thus, the discovery algorithm sets a flag whenever a TCP segment
   that contains a truncated line is received.


2.3. Reassembly
	
   After receiving a TCP segment and testing it for an incomplete last
   line, the preprocessor checks a buffer for the presence of any data.
   This buffer holds data only if the previous segment contained a
   truncated line (described below). If there is data in the buffer,
   the preprocessor appends the received data to the data in the
   buffer (this effectively reassembles the truncated line and the
   continuation line), empties the buffer so that it is clear after
   reassembly, and passes the resulting data to the next step.

   The next step parses the received data segment into IRC messages by
   splitting the data on each occurrence of the delimiter sequence CRLF.

   If the received, tested, possibly reassembled and then split
   data did not contain an incomplete last line (the discovery flag
   was not set), then the preprocessor calls the message parser normally
   for each received line.

   If the received and split data has been marked as incomplete
   (discovery flag set), then the preprocessor calls the IRC message
   parser for each received message (line) except the last one, which
   lacks the message delimiter because it is incomplete. Instead of
   passing the incomplete line to the message parser, the preprocessor
   resets the flag set by the end-delimiter test and temporarily stores
   the incomplete line until the next TCP segment arrives.
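
   The discovery and reassembly steps can be sketched together as a
   small Python class. This is one possible reading of the algorithm,
   not a reference implementation. The class name
   ContinuationPreprocessor, the method feed and the parse callback are
   assumptions made for this example, and CRLF is used as the only
   delimiter.

```python
from typing import Callable, List

CRLF = b"\r\n"


class ContinuationPreprocessor:
    """Sits between the network read and the IRC message parser."""

    def __init__(self, parse: Callable[[bytes], None]) -> None:
        self._parse = parse      # the real message parser (assumed callback)
        self._pending = b""      # truncated line kept from the previous segment

    def feed(self, segment: bytes) -> None:
        # Discovery: one end-delimiter test per segment, not per line.
        incomplete = not segment.endswith(CRLF)

        # Reassembly: prepend the stored data, then empty the buffer.
        data = self._pending + segment
        self._pending = b""

        lines = data.split(CRLF)
        # The last element is b"" if the segment ended in CRLF, otherwise
        # it is the truncated line that must wait for its continuation.
        tail = lines.pop()
        if incomplete:
            # Store instead of parsing. The flag is a local variable, so
            # it is reset automatically before the next segment.
            self._pending = tail

        for line in lines:
            if line:
                self._parse(line)


# Usage with hypothetical traffic: the second message is cut in two.
received: List[bytes] = []
pre = ContinuationPreprocessor(received.append)
pre.feed(b":alice!a@host PRIVMSG #chan :hi\r\n:bob!b@host PRIVMSG #chan :x ")
pre.feed(b":irc.example.org MODE #chan +o bob\r\n")
for message in received:
    print(message)
```

   With this input the parser callback is invoked twice, and the text
   that resembles a server MODE message stays inside the trailing
   parameter of the second PRIVMSG.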


3. Expiration Notice

   This document expires in Jan 2002.


4. Credits and Authors' Address

   Christian Burri
   jun. System & Network Engineer
   Synecta Informatik AG
   Zwinglistrasse 3
   9000 St. Gallen
   SWITZERLAND

   Email: christian.burri@synecta.ch

   Special credits to Jarkko Oikarinen and all other contributors for
   creating such a cool thing as IRC :)