INTERNET-DRAFT                                            Claus Faerber
draft-faerber-i18n-email-netnews-names-00                   August 2002

               Internationalisation of Email Addresses, 
               Newsgroup Names and similar Identifiers

Status of this memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.
 
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   This document describes a possible architecture for the
   implementation of internationalised email addresses, newsgroup names,
   and similar identifiers on top of the standards set by the
   Internationalised Domain Names [IDN] working group.

1 Introduction

1.1 Overview

   The advent of internationalised domain names raises the question of
   how other identifiers, such as email addresses and newsgroup names,
   should be internationalised.
  
   As identifiers of one type are often embedded in identifiers of
   another type (for example, domain names within email addresses), a
   common architecture for all of them is needed.

   This draft proposes a solution derived directly from the
   internationalisation of domain names and from the requirements
   described in section 1.2.

1.2 Requirements

   The author of this draft believes that a specification must meet the
   following requirements:

   - Legacy mail and news user agents, MTAs (including injection
     agents) and news servers must be able to handle the

Faerber                  Expires: March 2003                     Page 1

INTERNET-DRAFT   Int. Email Addresses and Newsgroup Names    August 2002

     internationalised addresses without problems.
   - Therefore, the encoding of domain names should be identical to that
     of internationalised domain names [IDN].
   - Further, the encoding of domain names included within the
     local-part (left-hand side) of email addresses should be identical
     to that of internationalised domain names.
   - As delimiters are often replaced by other delimiters (for example,
     "@" by "%" in the "percent-hack"), the result should be identical
     regardless of whether the delimiters are replaced before or after
     the internationalised parts are encoded.
   - A single encoding/decoding function should be able to handle both
     internationalised domain names and other internationalised
     identifiers.

2 Encoding of Internationalised Names

   The requirements set forth in section 1.2 lead directly to the
   following architecture:
  
   - Names are split into individual parts at the following delimiters:
    
     SP / %x00-1F / "." / "@" / "+" / "%" / "=" / "/" / "," / ";" / ":"
     / "!" / "(" / ")" / "[" / "]" / "<" / ">"

     [[RATIONALE: As many delimiters as possible are used, to increase
     the chance that individual parts of an identifier are encoded the
     same way when they are included in other identifiers:
     "@" - used to separate the local-part and the domain name.
     "+" - used by some mailers for subaddressing.
     "%" - used by some MTAs to embed domains within the local-part of
           email addresses ("percent-hack").
     "=" - used within MIXER (RFC 2156).
     "/" - used within MIXER (RFC 2156); used as a newsgroup component
           separator in some legacy non-RFC BBS networks.
     ",", ";" - used to separate identifiers in many positions.
     ":" - used to separate (obsolete) source routes from the
           destination address.
     " " - used to separate source routes from each other.
     "!" - used as a separator within the Path header in RFC 1036;
           used as an address separator within (obsolete) UUCP bang
           addresses.
     "(", ")" - used for comments; used within the replacement for some
           separators according to MIXER (e.g. "(a)" instead of "@").
     "[", "]", "<", ">" - as a precaution.]]

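   - Each part is then prepared according to [NAMEPREP] and encoded
     using [PUNYCODE], following the ToASCII operation of [IDN]: parts
     that consist only of ASCII characters are left unchanged, and the
     ACE prefix of [IDN] (written as "zq--" in the examples in this
     document) is prepended to every other encoded part. The mixed-case
     annotation described in appendix B of [PUNYCODE] is used.

   - The parts are then reassembled, with the original delimiters
     between them, to build the encoded name.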

   [[NOTE: As all delimiters other than "." are characters that are not
   allowed in domain names, this method produces the same results as
   [IDN] for all valid domain names (except for the case of the
   resulting string, which does not matter within domain names).]]
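   The following Python sketch illustrates the steps above. It is not
   part of this specification and makes several assumptions: it uses
   CPython's encodings.idna.nameprep() helper (an internal function of
   the standard library) and its "punycode" codec, it uses the "zq--"
   prefix that appears in the examples of this document, and it omits
   both the mixed-case annotation (which the codec does not implement)
   and the length checks of ToASCII. The names encode_part() and
   encode_name() are hypothetical.

```python
import re
from encodings import idna  # CPython standard library; provides nameprep()

# Delimiters from section 2: SP, %x00-1F, and . @ + % = / , ; : ! ( ) [ ] < >
# The capturing group makes re.split() return the delimiters as well.
DELIMITERS = re.compile(r"([\x00-\x20.@+%=/,;:!()\[\]<>])")

# Assumption: the ACE prefix used in this draft's examples.
ACE_PREFIX = "zq--"


def encode_part(part: str) -> str:
    """Encode a single part, modelled on IDNA ToASCII (simplified)."""
    if part.isascii():
        return part  # pure-ASCII parts pass through unchanged
    prepared = idna.nameprep(part)  # [NAMEPREP]; raises UnicodeError if prohibited
    if prepared.isascii():
        return prepared  # nameprep may map a part to plain ASCII
    # [PUNYCODE] without the mixed-case annotation, plus the ACE prefix.
    return ACE_PREFIX + prepared.encode("punycode").decode("ascii")


def encode_name(name: str) -> str:
    """Split at delimiters, encode each part, and reassemble the name."""
    pieces = DELIMITERS.split(name)
    # Even indices are parts; odd indices are the captured delimiters.
    return "".join(encode_part(piece) if i % 2 == 0 else piece
                   for i, piece in enumerate(pieces))


if __name__ == "__main__":
    print(encode_name("cf\u00e4rber@f\u00e4rber.muc.de"))
    # zq--cfrber-cua@zq--frber-gra.muc.de
```

   Because "@" and "%" are both delimiters in this sketch, replacing one
   by the other before or after encoding yields the same string, which
   is the order-independence asked for in section 1.2. With the prefix
   changed to "xn--", the output for the domain name used in the
   examples matches that of Python's built-in "idna" codec.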

3 Usage Within Applications
  
3.1 General

   The format of identifiers defined by various specifications is not
   altered in any way; all data sent over the network uses the encoded
   form of the identifiers.

   Only the display and input of these identifiers change, and only in
   the user agent (i.e. the software that interfaces directly with human
   users). It is the task of the user agent to encode all non-ASCII
   characters in identifiers using the method described in section 2,
   and to decode encoded identifiers before displaying them.
   
   Changes to relay agents, transport agents, etc., and software
   accompanying them are usually not necessary.
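   As a sketch of the display side of a user agent, the Python code
   below decodes each part that starts with the ACE prefix and shows any
   part it cannot decode unchanged. It assumes the "zq--" prefix from
   this document's examples and uses the standard "punycode" codec; the
   names decode_part() and decode_name() are hypothetical, and the
   round-trip check of the IDNA ToUnicode operation is omitted.

```python
import re

# Same delimiter set as in section 2; the group keeps delimiters in the result.
DELIMITERS = re.compile(r"([\x00-\x20.@+%=/,;:!()\[\]<>])")
ACE_PREFIX = "zq--"  # assumption: prefix used in this draft's examples


def decode_part(part: str) -> str:
    """Return the Unicode form of an encoded part, or the part unchanged."""
    if not part.lower().startswith(ACE_PREFIX):
        return part
    try:
        return part[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        # Looks encoded but is malformed: display it as received.
        return part


def decode_name(name: str) -> str:
    """Decode every part of a name for display, keeping the delimiters."""
    pieces = DELIMITERS.split(name)
    return "".join(decode_part(piece) if i % 2 == 0 else piece
                   for i, piece in enumerate(pieces))


if __name__ == "__main__":
    print(ascii(decode_name("se.test.zq--rksmrgs-5wao1o")))
    # 'se.test.r\xe4ksm\xf6rg\xe5s'
    print(decode_name("zq--9.example"))  # malformed part
    # zq--9.example
```

   Falling back to the received form keeps the user agent usable when
   it meets identifiers that only look like encoded names.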

3.2 Email

3.2.1 RFC 2821

   Internationalised identifiers can appear within the following
   lexicals:

   - Domain of the EHLO and HELO commands
   - return-path of the MAIL FROM command
   - forward-path of the RCPT TO command
   - String of the VRFY and EXPN commands

   Example:
     C: EHLO zq--frber-gra.muc.de
     C: MAIL FROM:<zq--cfrber-cua@zq--frber-gra.muc.de>

   SMTP agents do not need to implement this specification to handle
   internationalised identifiers correctly. SMTP agents MUST handle
   addresses that appear to be malformed internationalised identifiers.

   The VRFY and EXPN commands may benefit from future extensions that
   handle unencoded names.

   [[NOTE: Although outside the scope of this specification, it is
   believed that the interface between MUAs and MTAs will use the
   encoded form of these identifiers, too, so that the MTA can be kept
   completely unchanged.
   Local delivery agents might benefit from extensions that allow
   pattern matching against internationalised identifiers.]]

3.2.2 RFC 2822

   Internationalised identifiers can appear within the following
   lexicals:
  
   - addr-spec
   - obs-route
   - domain

   Example:
     From: =?ISO-8859-1?Q?Claus_F=E4rber?= <zq--cfrber-cua@zq--frber-gra.muc.de>
  
   Mail user agents that do not implement this specification will
   present the identifiers in encoded form to the user. Users will still
   be able to reply to messages using these identifiers.

3.3 Netnews/Usenet

3.3.1 RFC 1036

   Internationalised identifiers can appear within the following header
   fields:

   - parts of From, Sender, and Reply-To header fields that correspond
     to those described in RFC 2822.
   - Path header field
   - Newsgroups and Followup-To header fields

   as well as within the following lexicals:

   - groupname argument to newgroup and rmgroup commands.
   - newsgroup names within checkgroups messages.

   Examples:
     Newsgroups: se.test.zq--rksmrgs-5wao1o
     Control: newgroup se.test.zq--rksmrgs-5wao1o

   News user agents that do not implement this specification will
   present the identifiers in encoded form to the user. Users will still
   be able to read newsgroups, send followups and replies to messages
   using these identifiers.

   News transfer agents do not need to implement this specification to
   handle internationalised identifiers correctly.

3.3.2 RFC 977/RFC 2980

   Internationalised identifiers can appear within all groupnames passed
   as arguments to NNTP commands or returned by these commands.

   NNTP servers do not need to implement this specification to handle
   internationalised identifiers correctly.
  
   Extended NNTP commands taking a "wildmat" as an argument may benefit
   from an implementation that takes into account that group names might
   be encoded according to this specification, and that matches against
   the decoded form of these names.
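   A possible shape for such an implementation is sketched below in
   Python. It uses fnmatch.fnmatchcase() from the standard library as a
   stand-in for wildmat; the two syntaxes are similar but not identical,
   and wildmat features such as comma-separated pattern lists are not
   handled. The function names and the "zq--" prefix are assumptions.
   As a design choice, a group also matches when the pattern matches its
   encoded name, so that clients sending encoded patterns keep working.

```python
import fnmatch
import re
from typing import List

DELIMITERS = re.compile(r"([\x00-\x20.@+%=/,;:!()\[\]<>])")
ACE_PREFIX = "zq--"  # assumption: prefix used in this draft's examples


def decode_name(name: str) -> str:
    """Decode the encoded parts of a group name; leave malformed parts alone."""
    pieces = DELIMITERS.split(name)
    for i in range(0, len(pieces), 2):  # even indices are parts
        part = pieces[i]
        if part.lower().startswith(ACE_PREFIX):
            try:
                pieces[i] = part[len(ACE_PREFIX):].encode("ascii").decode("punycode")
            except UnicodeError:
                pass  # keep the part as received
    return "".join(pieces)


def match_groups(pattern: str, groups: List[str]) -> List[str]:
    """Return the groups whose decoded (or encoded) name matches pattern."""
    return [g for g in groups
            if fnmatch.fnmatchcase(decode_name(g), pattern)
            or fnmatch.fnmatchcase(g, pattern)]


if __name__ == "__main__":
    groups = ["se.test.zq--rksmrgs-5wao1o", "se.test.misc", "se.talk"]
    print(match_groups("se.test.r\u00e4*", groups))
    # ['se.test.zq--rksmrgs-5wao1o']
```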

3.3.3 Submission to moderated newsgroups

   When an article posted to a moderated newsgroup is submitted to the
   moderator, the moderator's email address is often determined using
   a method where a pattern in a "wildcard" email address is replaced
   by the
   name of the moderated newsgroup, with every "." in the newsgroup
   name replaced by "-". The resulting email addresses are not encoded
   according to this specification.

   Example:
     A message sent to the moderated newsgroup
     se.test.zq--rksmrgs-5wao1o.moderated will be forwarded to the email
     address se-test-zq--rksmrgs-5wao1o@usenet-se.net, although the
     expected encoding for the email address would be
     zq--se-test-rksmrgs-8kbw71a@usenet-se.net

   Administrators of sites providing such address aliases MUST set up
   aliases for both forms of the email address.
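   The two forms can be computed mechanically. The Python sketch below
   is illustrative only: it repeats a simplified version of the encoder
   from section 2 (using CPython's encodings.idna.nameprep() helper and
   "punycode" codec, without the mixed-case annotation), assumes the
   "zq--" prefix from this document's examples, and expects the caller
   to have removed any site-specific suffix such as ".moderated", as in
   the example above. The name moderator_aliases() is hypothetical.

```python
import re
from encodings import idna  # CPython standard library; provides nameprep()
from typing import Tuple

DELIMITERS = re.compile(r"([\x00-\x20.@+%=/,;:!()\[\]<>])")
ACE_PREFIX = "zq--"  # assumption: prefix used in this draft's examples


def encode_name(name: str) -> str:
    """Simplified encoder from section 2 (no mixed-case annotation)."""
    pieces = DELIMITERS.split(name)
    for i in range(0, len(pieces), 2):  # even indices are parts
        part = pieces[i]
        if not part.isascii():
            prepared = idna.nameprep(part)
            if not prepared.isascii():
                prepared = ACE_PREFIX + prepared.encode("punycode").decode("ascii")
            pieces[i] = prepared
    return "".join(pieces)


def moderator_aliases(group: str, domain: str) -> Tuple[str, str]:
    """Return (legacy, expected) moderator addresses for a group name."""
    # Legacy method: encode the group name first, then turn "." into "-".
    legacy = encode_name(group).replace(".", "-")
    # This specification: "-" is not a delimiter, so the dashed name is a
    # single part and is encoded as a whole.
    expected = encode_name(group.replace(".", "-"))
    return legacy + "@" + domain, expected + "@" + domain


if __name__ == "__main__":
    for address in moderator_aliases("se.test.r\u00e4ksm\u00f6rg\u00e5s", "usenet-se.net"):
        print(address)
    # se-test-zq--rksmrgs-5wao1o@usenet-se.net
    # zq--se-test-rksmrgs-8kbw71a@usenet-se.net
```

   For group names that consist only of ASCII characters, both forms
   are identical, so only one alias is needed for such groups.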

   [[NOTE: This only affects a small number of sites: those providing
   mail aliases for newsgroup moderators.
   We cannot add "-" to the list of part separators, as this would be
   incompatible with [IDN]. [IDN] cannot be changed, as "-" is the only
   non-alphanumeric character allowed within domain name labels.]]
   
4 Relation to other specifications

4.1 IDN

   This specification extends the system of Internationalised domain
   names described in [IDN].

4.2 USEFOR

   This specification provides an alternative to the use of unencoded
   newsgroup names as proposed by the USEFOR working group [USEFOR],
   which is believed to cause severe interoperability problems.
   
   This specification avoids such problems by using an encoding that
   produces encoded forms of newsgroup names that are fully compliant
   with RFC 1036.

5 References

[IDN]   Patrik Faltstrom, Paul Hoffman and Adam Costello,
        "Internationalizing Domain Names in Applications (IDNA)",
        draft-ietf-idn-idna-10.

[PUNYCODE] Adam Costello, "Punycode: An encoding of Unicode for use with
        IDNA", draft-ietf-idn-punycode.

[NAMEPREP] Paul Hoffman and Marc Blanchet, "Nameprep: A Stringprep
        Profile for Internationalised Domain Names",
        draft-ietf-idn-nameprep.

[USEFOR] Charles H. Lindsey, "News Article Format",
        draft-ietf-usefor-article.

6 Author's Address

   Claus Faerber
   Connollystrasse 8
   80809 Muenchen
   GERMANY

   E-Mail: claus@faerber.muc.de

   NOTE: Please write the author's last name with a-umlaut (Unicode
   U+00E4, HTML &auml;) instead of "ae" where possible: F&auml;rber

Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the  purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
