INTERNET-DRAFT Claus Faerber
draft-faerber-i18n-email-netnews-names-00 August 2002
Internationalisation of Email Addresses,
Newsgroup Names and similar Identifiers
Status of this memo
This document is an Internet-Draft and is subject to all provisions
of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Abstract
This document describes a possible architecture for the
implementation of internationalised email addresses, newsgroup names,
and similar identifiers on top of the standards set by the
Internationalised Domain Names [IDN] working group.
1 Introduction
1.1 Overview
The advent of internationalised domain names raises the question of
how other identifiers, such as email addresses and newsgroup names,
should be internationalised.
As these types of identifiers are often included in other types of
identifiers, an overall architecture is needed.
This draft proposes a solution derived directly from the
internationalisation of domain names and several requirements
described in section 1.2.
1.2 Requirements
The author of this draft believes that a specification must meet the
following requirements:
- Legacy mail and news user agents, MTAs (including injection
agents) and news servers must be able to handle the
internationalised addresses without problems.
- Therefore, the encoding of domain names should be identical to that
of internationalised domain names [IDN].
- Further, the encoding of domain names included within the LHS of
email addresses should be identical to that of internationalised
domain names.
- As delimiters are often exchanged, the result should be identical
regardless of the order in which the exchange of the delimiters and
the encoding of the internationalised domain names occurs.
- A single encoding/decoding function should be able to handle both
internationalised domain names and other internationalised
identifiers.
2 Encoding of Internationalised Names
The requirements set forth in section 1.2 lead directly to the
following architecture:
- Names are split into individual parts at the following delimiters:
SP / %x00-1F / "." / "@" / "+" / "%" / "=" / "/" / "," / ";" / ":"
/ "!" / "(" / ")" / "[" / "]" / "<" / ">"
[[RATIONALE: As many delimiters as possible are used to increase
the chance that individual parts of an identifier are encoded the
same way when included in other identifiers:
"@" - used to separate local-part and domain name
"+" - used by some mailers for subaddressing
"%" - used by some MTAs to embed domains within the local-part of
email addresses ("percent-hack")
"=" - used within MIXER (RFC 2156)
"/" - used within MIXER (RFC 2156), used as a newsgroup component
separator in some legacy non-RFC BBS networks
",", ";" - used to separate identifiers in many positions
":" - used to separate (obsolete) source routes from the
destination address
" " - used to separate source routes from each other
"!" - used as a separator within the Path header in RFC 1036,
used as an address separator within (obsolete) UUCP bang
addresses
"(", ")" - used for comments, used within the replacement for some
separators according to MIXER (e.g. "(a)" instead of "@")
"[", "]", "<", ">" - as a precaution ]]
- Each part is then prepared according to [NAMEPREP] and encoded
using [PUNYCODE]. The mixed-case annotation described in
appendix B of [PUNYCODE] is used.
- The parts are then re-assembled to build the encoded name.
[[NOTE: As this procedure only adds delimiters that are not allowed
in domain names, it will produce the same results as [IDN] for all
valid domain names (except for the case of the resulting string,
which does not matter within domain names).]]
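The three steps of this section can be sketched in Python as follows.
This is a minimal sketch under stated assumptions, not a reference
implementation: it assumes the "zq--" prefix that appears in the
examples of this draft, uses the nameprep helper and the "punycode"
codec from the Python standard library, passes pure-ASCII parts
through unchanged, and omits the mixed-case annotation. The names
DELIMITERS, PREFIX, encode_part and encode_name are illustrative only.

```python
import re
from encodings.idna import nameprep

# Delimiters listed in section 2: SP, %x00-1F and the punctuation characters.
DELIMITERS = re.compile(r"([\x00-\x20.@+%=/,;:!()\[\]<>])")
PREFIX = "zq--"  # assumption: prefix taken from the examples in this draft


def encode_part(part):
    """Encode one part; pure-ASCII parts are passed through (assumption)."""
    if all(ord(ch) < 128 for ch in part):
        return part
    prepared = nameprep(part)  # [NAMEPREP] step
    # [PUNYCODE] step; the mixed-case annotation is not implemented here.
    return PREFIX + prepared.encode("punycode").decode("ascii")


def encode_name(name):
    """Split at the delimiters, encode each part, and re-assemble."""
    pieces = DELIMITERS.split(name)  # delimiters end up at odd indices
    return "".join(
        piece if index % 2 else encode_part(piece)
        for index, piece in enumerate(pieces)
    )


print(encode_name("cfärber@färber.muc.de"))
# zq--cfrber-cua@zq--frber-gra.muc.de
print(encode_name("se.test.räksmörgås"))
# se.test.zq--rksmrgs-5wao1o
```

Because "@" and "." are both delimiters, the domain part of the email
address is encoded exactly as it would be when encoded on its own.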
3 Usage Within Applications
3.1 General
The format of identifiers defined by various specifications is not
altered in any way; all data sent over the network uses the encoded
form of the identifiers.
Only display and input of these identifiers is changed in the user
agent (i.e. the software that interfaces directly with human users).
It is the task of the user agent to encode all non-ASCII characters
in identifiers using the method described in section 2.
Changes to relay agents, transport agents, etc., and software
accompanying them are usually not necessary.
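The display side of a user agent could look like the following
minimal Python sketch. It assumes the "zq--" prefix used in the
examples of this draft and the "punycode" codec of the Python standard
library; the names DELIMITERS, PREFIX, decode_part and decode_name are
illustrative. Parts that look encoded but cannot be decoded are shown
as received, which is one possible policy and not something this draft
prescribes.

```python
import re

# Delimiters listed in section 2: SP, %x00-1F and the punctuation characters.
DELIMITERS = re.compile(r"([\x00-\x20.@+%=/,;:!()\[\]<>])")
PREFIX = "zq--"  # assumption: prefix taken from the examples in this draft


def decode_part(part):
    """Decode one part for display; return it unchanged if not decodable."""
    if not part.lower().startswith(PREFIX):
        return part  # delimiters and plain ASCII parts are left alone
    try:
        return part[len(PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return part  # looks encoded but is malformed: show it as received


def decode_name(name):
    """Split at the delimiters, decode each part, and re-assemble."""
    return "".join(decode_part(piece) for piece in DELIMITERS.split(name))


print(decode_name("zq--cfrber-cua@zq--frber-gra.muc.de"))
# cfärber@färber.muc.de
```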
3.2 Email
3.2.1 RFC 2821
Internationalised identifiers can appear within the following
lexicals:
- Domain of the EHLO and HELO commands
- return-path of the MAIL FROM command
- forward-path of the RCPT TO command
- String of the VRFY and EXPN commands
Example:
C: EHLO zq--frber-gra.muc.de
C: MAIL FROM:<zq--cfrber-cua@zq--frber-gra.muc.de>
SMTP agents do not need to implement this specification to handle
internationalised identifiers correctly. SMTP agents MUST handle
addresses that appear to be malformed internationalised identifiers.
The VRFY and EXPN commands may profit from future extensions to
handle unencoded names.
[[NOTE: Although outside the scope of this specification, it is
believed that the interface between MUAs and MTAs will use the
encoded form of these identifiers, too, so that the MTA can be kept
completely unchanged.
Local delivery agents might profit from extensions to allow pattern
matching against internationalised identifiers.]]
3.2.2 RFC 2822
Internationalised identifiers can appear within the following
lexicals:
- addr-spec
- obs-route
- domain
Example:
From: =?ISO-8859-1?Q?Claus_F=E4rber?= <zq--cfrber-cua@zq--frber-gra.muc.de>
Mail user agents that do not implement this specification will
present the identifiers in encoded form to the user. Users will still
be able to reply to messages using these identifiers.
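As an illustration of how the two encodings coexist in one header
field, the following Python sketch composes a From field whose display
name is MIME-encoded while the addr-spec carries the already encoded
address from the example above. It uses formataddr and parseaddr from
the standard email package; the variable names are illustrative, and
the exact spelling of the encoded-word (for example the case of the
charset label) may differ from the example.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

display_name = "Claus Färber"
encoded_address = "zq--cfrber-cua@zq--frber-gra.muc.de"  # from the example

# The display name becomes an encoded-word; the address stays plain ASCII.
from_value = formataddr((display_name, encoded_address), charset="iso-8859-1")
print("From: " + from_value)

# A legacy user agent can still extract a usable reply address.
raw_name, address = parseaddr(from_value)
shown_name = str(make_header(decode_header(raw_name)))
print(shown_name, address)
```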
3.3 Netnews/Usenet
3.3.1 RFC 1036
Internationalised identifiers can appear within the following header
fields:
- parts of From, Sender, and Reply-To header fields that correspond
to those described in RFC 2822.
- Path header
- Newsgroups and Followup-To headers
as well as within the following lexicals:
- groupname argument to newgroup and rmgroup commands.
- newsgroup names within checkgroups messages.
Examples:
Newsgroups: se.test.zq--rksmrgs-5wao1o
Control: newgroup se.test.zq--rksmrgs-5wao1o
News user agents that do not implement this specification will
present the identifiers in encoded form to the user. Users will still
be able to read newsgroups, send followups and replies to messages
using these identifiers.
News transfer agents do not need to implement this specification to
handle internationalised identifiers correctly.
3.3.2 RFC 977/RFC 2980
Internationalised identifiers can appear within all groupnames passed
as arguments to NNTP commands or returned by these commands.
NNTP servers do not need to implement this specification to handle
internationalised identifiers correctly.
Extended NNTP commands taking "wildmat" as an argument may profit
from an implementation that takes into account that group names might
be encoded according to this specification and matches against the
decoded form of these names.
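Such matching against the decoded form could be sketched in Python as
follows. This is only a sketch: it assumes the "zq--" prefix used in
the examples of this draft, and it uses fnmatchcase from the standard
library as a stand-in for wildmat, whose syntax is similar but not
identical. The names decode_name and matching_groups are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Delimiters listed in section 2: SP, %x00-1F and the punctuation characters.
DELIMITERS = re.compile(r"([\x00-\x20.@+%=/,;:!()\[\]<>])")
PREFIX = "zq--"  # assumption: prefix taken from the examples in this draft


def decode_name(name):
    """Return the decoded form of a group name, part by part."""
    def decode_part(part):
        if not part.lower().startswith(PREFIX):
            return part
        try:
            return part[len(PREFIX):].encode("ascii").decode("punycode")
        except UnicodeError:
            return part  # malformed part: keep the encoded form
    return "".join(decode_part(piece) for piece in DELIMITERS.split(name))


def matching_groups(pattern, groups):
    """Return the encoded group names whose decoded form matches pattern."""
    return [g for g in groups if fnmatchcase(decode_name(g), pattern)]


groups = ["se.test", "se.test.zq--rksmrgs-5wao1o"]
print(matching_groups("se.test.räk*", groups))
# ['se.test.zq--rksmrgs-5wao1o']
```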
3.3.3 Submission to moderated newsgroups
When submitting articles POSTed to a moderated group to the moderator,
the moderator's email address is often determined using a method
where a pattern in a "wildcard" email address is replaced by the
name of the moderated newsgroup, having all "."s within the newsgroup
name replaced by "-".
This will result in email addresses not formed according to this
specification.
Example:
A message sent to the moderated newsgroup
se.test.zq--rksmrgs-5wao1o.moderated will be forwarded to the email
address se-test-zq--rksmrgs-5wao1o@usenet-se.net, although the
expected encoding for the email address would be
zq--se-test-rksmrgs-8kbw71a@usenet-se.net
Administrators of sites providing such address aliases MUST set up
aliases for both forms of the email address.
[[NOTE: This only affects a small number of sites: those providing
mail aliases for newsgroup moderators.
We can't add "-" to the list of part separators as this would be
incompatible with [IDN]. [IDN] can't be changed as there is no other
non-alphanumeric character allowed in domain names.]]
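The difference between the two forms can be reproduced with a short
Python sketch. It assumes the "zq--" prefix used in the examples of
this draft, the nameprep helper and "punycode" codec of the Python
standard library, and that pure-ASCII parts are passed through
unchanged; moderator_local_parts is a hypothetical helper that an
administrator might use to list the aliases to set up.

```python
import re
from encodings.idna import nameprep

# Delimiters listed in section 2: SP, %x00-1F and the punctuation characters.
DELIMITERS = re.compile(r"([\x00-\x20.@+%=/,;:!()\[\]<>])")
PREFIX = "zq--"  # assumption: prefix taken from the examples in this draft


def encode_name(name):
    """Encode each non-ASCII part between delimiters (see section 2)."""
    def encode_part(part):
        if all(ord(ch) < 128 for ch in part):
            return part
        return PREFIX + nameprep(part).encode("punycode").decode("ascii")
    return "".join(encode_part(piece) for piece in DELIMITERS.split(name))


def moderator_local_parts(group):
    """Return both local parts for an unencoded newsgroup name."""
    # Form generated by servers: encode first, then replace "." by "-".
    substituted = encode_name(group).replace(".", "-")
    # Form expected by this specification: replace first, then encode.
    expected = encode_name(group.replace(".", "-"))
    return substituted, expected


print(moderator_local_parts("se.test.räksmörgås"))
# ('se-test-zq--rksmrgs-5wao1o', 'zq--se-test-rksmrgs-8kbw71a')
```

Since "-" is not a delimiter, the whole dashed name is encoded as a
single part in the second form, which is why the two results differ.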
4 Relation to other specifications
4.1 IDN
This specification extends the system of Internationalised domain
names described in [IDN].
4.2 USEFOR
This specification provides an alternative to the use of unencoded
domain names as proposed by the USEFOR working group [USEFOR], which
is believed to cause severe interoperability problems.
This specification avoids such problems by using an encoding that
produces encoded forms of newsgroup names that are fully compliant
with RFC 1036.
5 References
[IDN] Patrik Faltstrom et al., "Internationalizing Domain Names
in Applications (IDNA)", draft-ietf-idn-idna-10.
[PUNYCODE] Adam Costello, "Punycode: An encoding of Unicode for use with
IDNA", draft-ietf-idn-punycode.
[NAMEPREP] Paul Hoffman and Marc Blanchet, "Nameprep: A Stringprep
Profile for Internationalised Domain Names",
draft-ietf-idn-nameprep.
[USEFOR] Charles H. Lindsey, "News Article Format",
draft-ietf-usefor-article.
6 Author's Address
Claus Faerber
Connollystrasse 8
80809 Muenchen
GERMANY
E-Mail: claus@faerber.muc.de
NOTE: Please write the author's last name with a-umlaut (Unicode
U+00E4, HTML &auml;) instead of "ae" where possible: Färber
Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Faerber Expires: March 2003 Page 6