                                                            Edmon Chung 
Internet Draft                                                   Neteka 
<draft-chung-idnop-charprep-00.txt>           
Intended Category: Informational                             April 2003 
 
 
         CHARPREP - Character Equivalency Preparations for IDN
 
 
STATUS OF THIS MEMO 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026.  
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts.  Internet-Drafts are draft documents valid for a maximum of 
   six months and may be updated, replaced, or obsoleted by other 
   documents at any time.  It is inappropriate to use Internet-Drafts as 
   reference material or to cite them other than as "work in progress."  
    
   The reader is cautioned not to depend on the values that appear in 
   examples to be current or complete, since their purpose is primarily 
   educational.  Distribution of this memo is unlimited. 
    
   The list of current Internet-Drafts can be accessed at  
   http://www.ietf.org/ietf/1id-abstracts.txt 
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 
    
    
Abstract 
    
   Charprep intends to take up where Nameprep [NAMEPREP] left off and
   provide additional preventive measures that bridge the user's
   conceptual perception of a multilingual domain name with the domain
   matching process.  The critical development from Nameprep is that
   common user perception is taken into account.  That is, Charprep
   strives to take the 'case-insensitivity' concept of user-friendliness
   to another level for IDNs, because of the inherent complexity and
   potential confusion that could arise from the use of multilingual
   characters in domain names.
    
   Charprep is designed to be a framework for Zone Administrators (e.g.
   domain registries) to employ relevant equivalency tables to generate,
   from the original string, the variants that could possibly confuse
   users.  The actual management of Reserved Variants (RV) and Zone
   Variants (ZV) together with the original string (Primary Domain) is
   discussed in Zoneprep [ZONEPREP].
    
   Furthermore, Charprep and Zoneprep are designed to be a recommended 
   feature to be offered to users by a Zone Administrator (e.g. Domain 
  
Chung                                                          [Page 1] 
IDNOP-CHARPREP                                               April 2003 
 
 
   Registries) in the management of Internationalized domain names 
   (IDN).  A key concept is that these are done without affecting the 
   IDN protocol specified in [RFC3490], [RFC3491] and [RFC3492]. 
    
    
Table of Contents 
    
   1. Introduction....................................................2 
   1.1 Terminology....................................................3 
   1.2 Nomenclature...................................................3 
   1.3 Disclaimer.....................................................3 
   2. Importance of Charprep..........................................3 
   3. Equivalency versus Prohibition..................................4 
   4. Character Equivalency Preparations..............................4 
   5. Charprep Tables and Profiles....................................5 
   5.1 Codepoints Inclusion Table.....................................6 
   6.2 Charprep Table.................................................6 
   6.3 Publishing of Charprep Profiles................................7 
   6.4 Generation of Charprep Equivalence Set.........................7 
   7. IANA Considerations.............................................8 
   8. Security Considerations.........................................8 
   Acknowledgements...................................................8 
    
    
1. Introduction 
    
   During the discussions to establish an IDN protocol, a great number 
   of problematic issues surrounding name equivalency were uncovered.  
   The current Nameprep document deliberately constrains its scope of
   application:
    
   "Although it would be easy to use the process in this step to 
   "correct" perceived mis-features or bugs in the current character 
   standards, [Nameprep] expressly does not do so." 
    
   Charprep will continue to uphold the spirit of Nameprep to, "allow as 
   wide of a range of characters as possible to be allowed in host 
   names... The user should not be limited to only entering exactly the 
   characters that might have been used, but to instead be able to enter 
   characters that unambiguously [represents] the characters in the 
   [perceived] host name." 
    
   In other words, a user should be able to use different but
   perceptually equivalent characters (codepoints) and still arrive at
   the perceived domain.
    
   This document does not include the specific character equivalency 
   preparation (Charprep) tables, nor does it provide explicit policies 
   for the use of the Charprep tables.  Rather, it intends to briefly 
   describe the problem of character equivalency issues for IDNs as well 
   as to suggest a framework for the publishing of Charprep tables for 
   different languages. 
    
  
1.1 Terminology 
    
   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", 
   and "MAY" in this document are to be interpreted as described in RFC 
   2119 [RFC2119]. 
    
1.2 Nomenclature 
    
   As in the Unicode Standard [UNICODE], Unicode code points are denoted 
   by "U+" followed by four to six hexadecimal digits. 
    
   The following terms will carry specific definitions within this 
   document: 
    
   Zone Administrator - A domain operator or service that manages
   sub-domain delegations.  This includes domain registries such as TLD
   registries, as well as operators of SLDs that issue third-level
   domains, etc.
    
   Registration - Entry of a domain into the zone file of an
   authoritative name server.
    
   Resolution - Matching or lookup of domain names within the name
   server.
    
   IDN - Internationalized Domain Names: domain names that contain one
   or more characters outside the A-Z, a-z, 0-9 and "-" repertoire.
    
1.3 Disclaimer 
    
   This document does NOT intend to provide any discussion on 
   equivalence policies of any scripts, nor does it intend to suggest 
   any type of policies.  Zone Administrators SHOULD consult with and 
   understand the needs of their user base before deciding and 
   publishing their own policies.  Examples provided in this document 
   are for explanation only. 
    
2. Importance of Charprep 
    
   The best way to illustrate the importance and need for Charprep is 
   through the following simple example: 
    
   Suppose a person obtained a domain <alpha><beta>.example from the
   .example Zone Administrator.  The person now advertises the domain as
   <ALPHA><BETA>.example (Alpha & Beta in capital letters).  A user
   seeing this perceives the domain as AB.example.  The user now
   attempts to access the domain and fails, because typing the Latin
   letters A and B yields a technically different domain name.
    
   It is true that the characters <ALPHA> and <A> are not technically
   equivalent, but because they are perceived as equivalent, they
   confuse the user and thereby defeat the purpose of having a
   human-friendly domain name system.
    
  
   More importantly, it could create a security issue whereby a domain
   name is maliciously registered to confuse the end user.  For example,
   suppose the AB.example site is an e-commerce site.  A malevolent
   registrant may register the domain <ALPHA><BETA>.example and set up a
   link to it on a competing site.  The end user will not be able to
   realize that s/he is being brought to a different site because the
   display will always look like "AB.example".
    
   Charprep provides a framework for the publishing of Charprep tables
   that can be used by Zone Administrators to create, from the
   originally submitted domain (Primary Domain), a set of variants that
   may cause user confusion.  Further management of this set of variants
   with regard to zone file entries is discussed in Zoneprep.
    
3. Equivalency versus Prohibition 
    
   A common misconception is that equivalence preparations prohibit the
   use of mapped characters.  This is NOT true.  For example, even if
   <ALPHA> is deemed equivalent to <A>, and vice versa, this does not
   prohibit a Zone Administrator from offering a domain name that
   contains <ALPHA>, or <A>, or both.  To resolve possible conflicts,
   the first-come, first-served rule employed by most Zone
   Administrators today may naturally apply.
    
   Another common misconception is that character equivalence
   consideration requires semantic (orthographic) equivalence of the
   word or phrase.  This is also NOT true.  Charprep does not give much
   regard to the resulting phrase or word, but focuses on the character
   itself.  Therefore, even though two characters may be semantically
   different, they MAY still be considered equivalent (e.g. <ALPHA>
   versus <A>).  Conversely, even though two characters may be visually
   different, they MAY still be considered equivalent (as in the case of
   Traditional versus Simplified Chinese characters).
    
4. Character Equivalency Preparations 
    
   Throughout the IDN discussions, character equivalency issues were
   repeatedly brought up.  While they were appropriately dismissed as a
   core protocol concern, the importance of Charprep has never been
   discounted, especially by zone operators who have started to deploy
   IDNs and in policy discussions such as those at ICANN.
    
   Charprep is important because characters that are perceptually
   equivalent, whether visually or contextually, may occupy different
   "codepoints" (as specified in Unicode).  This makes them
   "technically" distinct and unique "characters", yet in real life they
   are perceived and considered to be the same.
    
   For example, the Greek capital letter <ALPHA> is visually identical
   to the English capital letter <A>, yet they occupy two different
   codepoints in the Unicode scheme.  The implication is that
   <ALPHA>.example and <A>.example are technically two distinct domain
   names even though, when displayed, they may appear identical:
   "A.example" and "A.example".  Furthermore, the Cyrillic capital
   letter "A" is also visually identical to <ALPHA> and <A>.
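
   As a minimal illustration, the following Python sketch (standard
   library only; the constant names are ours and not part of Charprep)
   prints the Unicode names of the three capital letters and shows
   that labels built from them compare as distinct strings:

```python
import unicodedata

# Three capital letters that are usually rendered with the same glyph.
LATIN_A = "\u0041"      # LATIN CAPITAL LETTER A
GREEK_ALPHA = "\u0391"  # GREEK CAPITAL LETTER ALPHA
CYRILLIC_A = "\u0410"   # CYRILLIC CAPITAL LETTER A

for ch in (LATIN_A, GREEK_ALPHA, CYRILLIC_A):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Distinct codepoints give distinct labels, however they are displayed.
labels = {ch + ".example" for ch in (LATIN_A, GREEK_ALPHA, CYRILLIC_A)}
print(len(labels))
```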
    
   As another example, within the Chinese language, one particular
   character may have a number of different visual representations, yet
   they are conceptually equivalent.  The most noticeable case is the
   Traditional Chinese versus the Simplified Chinese representation of a
   character (e.g. 發 [U+767C("fa"-prosper)] and 发 [U+53D1("fa"-prosper
   | hair)]).  To complicate matters, these relationships may not be
   one-to-one, because in different contexts a character may take on a
   semantically different meaning, therefore creating additional
   variants of the root character (e.g. 发 [U+53D1("fa"-prosper |
   hair)] and 髮 [U+9AEE("fa"-hair)]).
    
   Furthermore, parts of the Japanese and Korean languages utilize a
   subset of the Chinese character repertoire.  Two characters that may
   be considered perceptually equivalent in the context of the Chinese
   language may, however, be considered distinct and unique in Japanese
   Kanji (e.g. 國 [U+570B("guo"-country<cn>)("goku"-a name<jp>)] and 国
   [U+56FD("guo"-country<cn>)("koku"-country<jp>)]).
    
   It is therefore very important to preserve the perceptual
   expectations of the end user for multilingual domain names, and to
   maintain the user-friendly spirit of domain names so that they
   continue to be a useful and human-friendly means of direct
   navigation and resource addressing over the Internet.
    
5. Charprep Tables and Profiles 
    
   Charprep deals with perceptual equivalency of characters.  Characters
   are units of visual or graphical representation of the written form
   of languages.  Scripts best define the collection of a set of
   characters.  Charprep profiles MAY utilize ISO 15924, "Codes for
   the representation of names of scripts" [ISO15924], as the guide for
   identifying scripts and managing Charprep tables.  Multiple scripts
   may share one Charprep profile and vice versa.  Charprep profiles MAY
   also define their own Codepoints Inclusion Table.
    
   Each Charprep Profile SHOULD consist of the following three elements:
    
   1. Charprep Report  
   2. Codepoints Inclusion  
   3. Charprep Table 
    
   The Charprep Report should provide a description of the policy as
   well as the rationale for the equivalency determinations made under
   the policy.
    
   If the Charprep Report simply identifies a set of one or more
   script codes [ISO15924], a Codepoints Inclusion Table is not
   necessary.  If a finer-grained approach is desired, a Codepoints
   Inclusion Table SHOULD be included.  A Codepoints Inclusion Table
   simply provides the set of codepoints that is intended for the
   corresponding Charprep Table.
    
   {Note: Current documents of reference include [TSCONV], [JPCHAR] and 
   [HANGULCHAR], along with [IDN-ADMIN]} 
    
5.1 Codepoints Inclusion Table 
    
   The Codepoints Inclusion Table should simply be a list of codepoints 
   that are intended to be included within the Charprep profile: 
    
     #Codepoints Inclusion Table for XXX 
     #version x.x 
     #script: XXX YYY 
    
     U+XXXX; Optional Remarks 
     U+XXXX; Optional Remarks 
     U+XXXX; Optional Remarks 
     ... 
    
   Note that a Codepoints Inclusion Table name and a version number MUST
   be included as part of the header of the table.  Optionally, the
   scripts considered within the table could be included.  If multiple
   scripts are used, a space-separated list of the script codes
   [ISO15924] should be provided.
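
   The layout above can be read with a few lines of code.  The
   following Python sketch is one possible reader, not a normative
   parser: the function name parse_inclusion_table, the returned
   structure, and the sample table contents are all assumptions made
   for illustration.  It treats the first "#" line that is neither a
   version nor a script line as the table name.

```python
def parse_inclusion_table(text):
    """Return (header, codepoints) for a Codepoints Inclusion Table."""
    header = {"scripts": []}
    codepoints = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue
        if line.startswith("#"):
            body = line[1:].strip()
            lowered = body.lower()
            if lowered.startswith("version"):
                header["version"] = body[len("version"):].strip()
            elif lowered.startswith("script:"):
                # Space-separated list of ISO 15924 script codes.
                header["scripts"] = body[len("script:"):].split()
            else:
                header.setdefault("name", body)
            continue
        code, _, remark = line.partition(";")
        code = code.strip()
        if not code.upper().startswith("U+"):
            raise ValueError(f"expected U+XXXX, got {code!r}")
        codepoints.append((int(code[2:], 16), remark.strip()))
    # The table name and version number are mandatory header fields.
    if not header.get("name") or not header.get("version"):
        raise ValueError("table name and version MUST appear in the header")
    return header, codepoints


# Hypothetical sample table, for illustration only.
SAMPLE = """\
#Codepoints Inclusion Table for Example
#version 0.1
#script: Latn Grek

U+0061; Latin small letter a
U+03B1; Greek small letter alpha
"""

header, codepoints = parse_inclusion_table(SAMPLE)
print(header)
print(codepoints)
```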
    
6.2 Charprep Table 
    
   The Charprep Table MUST have 3 columns.  Each entry MUST fill the
   first 2 columns; the third column is optional:
    
      Codept       Equivalent Set                   Remarks 
    +--------+-------------------------+------------------------------+ 
    | U+XXXX | U+XXXX U+XXXX U+XXXX ...| Optional Remarks             | 
    :        :                                                        : 
    
   There should be one entry for each Nameprep-ed codepoint considered
   in the Charprep table.  The Equivalent Set column consists of a set
   of one or more space-delimited codepoints corresponding to the
   codepoint in the first column.  For multi-codepoint entries, the
   convention U+XXXX+XXXX is used.  Optional Remarks may be provided
   for each entry.  For example:
    
      Codept      Charprep Variants                Remarks 
    +--------+-------------------------+------------------------------+ 
    | U+0061 | U+03B1 U+0430           | Greek & Cyrillic <A>         | 
    +--------+-------------------------+------------------------------+ 
    | U+03B1 | U+0061 U+0430           | English & Cyrillic <A>       | 
    +--------+-------------------------+------------------------------+ 
    | U+0430 | U+0061 U+03B1           | English & Greek <A>          | 
    +--------+-------------------------+------------------------------+ 
    :        :                                                        : 
    
  
   Note that the number of entries in the Charprep Table might NOT be
   the same as in the Codepoints Inclusion Table for the same Charprep
   profile.
    
   Note also that a Charprep Table might not be necessary if the policy
   of the Charprep profile is simply to have a Codepoints Inclusion
   Table.
    
6.3 Publishing of Charprep Profiles 
    
   Zone Administrators, especially Top-Level Domain Registries, SHOULD
   publish Charprep profiles for all scripts (languages) in which they
   allow registrations, and make them publicly available so that end
   users can understand the registration policies.
    
   The Codepoints Inclusion Tables and Charprep Tables SHOULD exist in 
   flat file format with the semi-colon used as a column delimiter.  For 
   example: 
    
     #Charprep Table for XXX 
     #version x.x 
     #script: XXX YYY 
    
     U+0061; U+03B1 U+0430; Greek & Cyrillic <A> 
     U+03B1; U+0061 U+0430; English & Cyrillic <A> 
     U+0430; U+0061 U+03B1; English & Greek <A>  
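
   A flat file of this shape is straightforward to load.  The Python
   sketch below is only one way to do so; the function names
   parse_codepoint_entry and parse_charprep_table are hypothetical, and
   the choice of a dictionary keyed by the first column is an
   assumption.  It also accepts the multi-codepoint convention
   U+XXXX+XXXX.

```python
def parse_codepoint_entry(token):
    """Turn 'U+XXXX' or multi-codepoint 'U+XXXX+XXXX' into a string."""
    if not token.upper().startswith("U+"):
        raise ValueError(f"not a codepoint entry: {token!r}")
    return "".join(chr(int(part, 16)) for part in token[2:].split("+"))


def parse_charprep_table(text):
    """Map each first-column entry to its list of equivalents."""
    table = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        # Columns are semicolon-delimited; the third (remarks) is optional.
        fields = [field.strip() for field in line.split(";", 2)]
        if len(fields) < 2 or not fields[0] or not fields[1]:
            raise ValueError(f"first two columns MUST be filled: {line!r}")
        source = parse_codepoint_entry(fields[0])
        table[source] = [parse_codepoint_entry(t) for t in fields[1].split()]
    return table


CHARPREP_TEXT = """\
#Charprep Table for XXX
#version x.x
#script: XXX YYY

U+0061; U+03B1 U+0430; Greek & Cyrillic <A>
U+03B1; U+0061 U+0430; English & Cyrillic <A>
U+0430; U+0061 U+03B1; English & Greek <A>
"""

charprep_table = parse_charprep_table(CHARPREP_TEXT)
print(charprep_table)
```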
    
6.4 Generation of Charprep Equivalence Set 
    
   Charprep does not discuss the specific policies for managing DNS
   zone files or how the generated variants are managed within them.
   The Charprep tables and profiles enable Zone Administrators to create
   a set of variants from a given IDN.
    
   For example, based on the examples above, the domain: 
    
        <03B1><03B1>.example  [<Alpha><Alpha>.example] 
    
   would generate a set of 8 Charprep Variants:
    
        <03B1><0061>.example 
        <03B1><0430>.example 
        <0061><0061>.example 
        <0061><03B1>.example 
        <0061><0430>.example 
        <0430><0061>.example 
        <0430><03B1>.example 
        <0430><0430>.example 
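
   The set above can be reproduced mechanically.  The following Python
   sketch is one possible approach rather than a prescribed algorithm:
   it takes, for each character of the label, the character itself plus
   its equivalents, forms every combination, and drops the original
   label.  The names charprep_variants and show are hypothetical, and
   the sketch only handles single-codepoint table entries.

```python
from itertools import product

# The example Charprep Table, keyed by codepoint.
CHARPREP_TABLE = {
    "\u0061": ["\u03b1", "\u0430"],
    "\u03b1": ["\u0061", "\u0430"],
    "\u0430": ["\u0061", "\u03b1"],
}


def charprep_variants(label, table):
    """Return every variant of label, excluding label itself."""
    # Each position may keep its character or take any equivalent.
    choices = [[ch] + table.get(ch, []) for ch in label]
    candidates = {"".join(combo) for combo in product(*choices)}
    candidates.discard(label)
    return candidates


def show(label):
    """Format a label in the <XXXX><XXXX>.example notation."""
    return "".join(f"<{ord(ch):04X}>" for ch in label) + ".example"


variants = charprep_variants("\u03b1\u03b1", CHARPREP_TABLE)
for variant in sorted(variants):
    print(show(variant))
```

   With three candidates at each of two positions there are 3 x 3 = 9
   combinations; removing the original label leaves the 8 variants
   listed above.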
    
   How the variants should be represented and managed in the DNS zone
   file is discussed further in Zoneprep [ZONEPREP].  Zoneprep describes
   a framework for Zone Administrators to prepare their zone files based
   on Zoneprep profiles.
    
  
7. IANA Considerations 
    
   There are no explicit IANA considerations required for Charprep.  
   IANA may however decide to maintain a registry for Charprep Profiles 
   as described in Section 6. 
    
8. Security Considerations 
    
   This document does not discuss DNS security issues, and it is
   believed that the proposal does not introduce security problems
   beyond those that already exist or are anticipated from adding
   multilingual characters to DNS and/or using ACE.
    
   Charprep considerations could however help to improve the security 
   and authenticity for the usage of IDNs by reducing the confusion of 
   perceptually equivalent characters. 
    
Acknowledgements 
    
   This document incorporates many of the discussions from the CJK
   community (from CNNIC, TWNIC, JPRS and KRNIC respectively) and by the
   JET (Joint Engineering Team), as well as at different forums
   including IETF and ICANN.  Most importantly, it draws on the
   discussions in the document "Internationalized Domain Names
   Registration and Administration Guideline for Chinese, Japanese and
   Korean" [IDN-ADMIN].
    
   Furthermore, many valuable comments and discussions with the 
   following people were incorporated: 
    
   Xiaodong (Sheldon) Lee 
   Kenny Huang 
   Paul Hoffman 
   Mark Davis 
   Vincent Chen 
    
References 
 
   [TSCONV]   XiaoDong LEE, et al., "Traditional and Simplified Chinese
              Conversion", November 2001
    
   [JPCHAR]   Yoshiro Yoneya & Yasuhiro Morishita, JPNIC, "Japanese
              characters in multilingual domain name labels", March 2,
              2001
     
   [HANGULCHAR] Soobok Lee & GyeongSeog Gim, "Hangeul NAMEPREP
              recommendation version 1.0", June 2001
    
   [RFC1034]  Mockapetris, P., "Domain Names - Concepts and  
              Facilities," STD 13, RFC 1034, USC/ISI, November 1987 
       
   [RFC1035]  Mockapetris, P., "Domain Names - Implementation and  
              Specification," STD 13, RFC 1035, USC/ISI, November  
              1987 
  
    
   [RFC2119]  S. Bradner, "Key words for use in RFCs to Indicate  
              Requirement Levels," RFC 2119, March 1997 
    
   [RFC2181]  R. Elz, University of Melbourne & R. Bush, RGnet, Inc.,
              "Clarifications to the DNS Specification", RFC 2181,
              July 1997
    
   [RFC3454]  P. Hoffman, IMC & VPNC & M. Blanchet, Viagenie,
              "Preparation of Internationalized Strings
              ("stringprep")", RFC 3454, December 2002
    
   [RFC3490]  P. Faltstrom, Cisco, P. Hoffman, IMC & VPNC & A. Costello,
              UC Berkeley, "Internationalizing Domain Names in
              Applications (IDNA)", RFC 3490, March 2003
    
   [RFC3491]  P. Hoffman, IMC & VPNC & M. Blanchet, Viagenie, "Nameprep:
              A Stringprep Profile for Internationalized Domain Names
              (IDN)", RFC 3491, March 2003
    
   [RFC3492]  A. Costello, Univ. of California, Berkeley, "Punycode: A
              Bootstring encoding of Unicode for Internationalized
              Domain Names in Applications (IDNA)", RFC 3492, March 2003
    
   [IDN-ADMIN] Editors: James SENG & John KLENSIN; Authors: K. KONISHI,
              K. HUANG, H. QIAN & Y. KO, "Internationalized Domain Names
              Registration and Administration Guideline for Chinese,
              Japanese and Korean"
    
   Authors: 
    
   Edmon Chung 
   Neteka 
   Suite 100, 
   243 College St., Toronto, 
   Ontario, Canada M5T 1R5 
   edmon@neteka.com 

















  