Internet DRAFT - draft-felton-universal-language




Expires 10/22/2001

International Language Bridge (ILB) For                 Mark Felton
Implementing Language Free Services

draft-felton-universal-language-01.txt




1.1 Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.  Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.




This document specifies an Internet standards track protocol 
for the Internet community, and requests discussion and 
suggestions for improvements.  Please refer to the current 
edition of the "Internet Official Protocol Standards" 
(STD 1) for the standardization state and status
of this protocol.  Distribution of this memo is unlimited.

1.2 Copyright Notice
Temporary
Copyright (C) Mark Felton (2001). All Rights Reserved
// Future Copyright (C) The Internet Society (2001). 
All Rights Reserved.


 Mark Felton
 markf@scicom.alphacdc.com



1.3 Table Of Contents	
	Abstract
	Terminology
	UL Components
	Unicode (Universal Code)
	Fixed Components (FC)
	Type Designator (TD) - Type Element (TE)
	Variable Meaning Function (VM_function)
	Language Forcing (LF)
	Linking Element (LE)
	User Defined Components (UDC)
	Basic Fixed Components
	Pictorial Representations (PR)
	Syntax versus Content
	Syntax Component
	UL Syntax
	Position Dependent Syntax (PDS)
	User Defined Syntax (UDS)
	Degree of Specificity
	Multiple Vocabularies
	Basic Translators (BT) and Filter Translators (FT)
	Some Initial Benefits of UL
	Steps in Creating UL
	Short and Long Term Goals For UL
	Fonts & Sounds
	UL Type Dictionaries (UTD)
	Host Language Interface
	Multi-lingual Vocabulary Issues

	/* New Sections */
	Neutrabet
	Phrase Templates
	Universal Language Parsing
	Java Code

1.4 Abstract

	The existence of multiple languages and cultures creates
an enormous barrier for the World Wide Web. While we
communicate well internationally using pictures,
our communication is limited when we attempt to
build semantic bridges. In general, our approach to this
has been to try to force the other party to learn our language.
This has had only limited success. Many people, quite
rightly, resent a vision of American English as the only true
universal language. In many ways we find science fiction
images, like the Star Trek universal translator,
much more appealing. This is because we know intuitively
that language is a central part of culture and should be
maintained, not destroyed.
	Various solutions have been offered. Probably
the most frequently discussed is the concept of translators.
In this image of reality, a black box device takes one
language and translates it into the other language.
This may include a variety of methods including:
	* text (L1) to text (L2)
	* speech (L1)  to text (L2)
	* text (L1) to speech (L2)
	* speech (L1) to speech (L2)

There are also intermediary plans
	* text (L1) to text (L2) to speech (L2)
	* speech (L1) to text (L1) to text (L2)
	* speech (L1) to text (L1) to text (L2) to speech (L2)

	These plans all have one thing in common: the idea
that bi-directional translation can occur between the various
languages. Unfortunately, this is not often the case. Many
concepts in one language do not translate easily to another
language. In addition, the syntax of one language may result
in poorly formulated sentences when translated across the
language barrier. The most success in this area will result
when a single human being is used as an intermediary
between the two language participants. This translator
possesses knowledge of both languages. They use their natural
linguistic skills to provide approximations of the meaning in
the two languages.  Language translation also requires
sophisticated knowledge of the two languages. The translator
is often fluent in one language (their native language) and
semi fluent in the second language. This means that the
translation in one direction will have more quality than in
the reverse direction. For example, I have more vocabulary
available when I translate to English than when I translate
from English to one of my secondary languages.
	Is there an alternative solution? In this proposal,
it is suggested that an intermediary language is needed to
support real success in international linguistic communications.
The present plan requires our translators to go from  English to
German; English to Chinese; English to Japanese; English to
Vietnamese; then German to English; Chinese to English; etc. 
If we take the number of languages in the world and then
assume each will talk to each other language, the formula for
the number of required translators is:

N * (N-1), where N is the number of languages.

	With ten languages, this comes to 90 translators. With
100 languages, this comes to 9900 translators.
	How can this be avoided? The solution is to provide a
universal language (UL) which acts as an intermediary between
each language. Each language must provide a translation to and
from the UL. This means that for each language there are two
translators, one from UL and one to UL. With 100 languages the
requirement is 200 translators rather than 9900.
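
The arithmetic above can be checked with a minimal sketch. The
function names are illustrative only and are not part of the
proposal.

```python
def pairwise_translators(n: int) -> int:
    """One translator per ordered pair of languages: N * (N - 1)."""
    return n * (n - 1)


def bridged_translators(n: int) -> int:
    """Two translators per language (to UL and from UL): 2 * N."""
    return 2 * n


for n in (10, 100):
    print(n, pairwise_translators(n), bridged_translators(n))
```

The pairwise count grows with the square of the number of
languages, while the UL count grows linearly.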
	The use of a UL has a number of immediate advantages.
First, the UL can be constructed with a limited number of
concepts. Rather than providing all the nuances of every
language, the UL is restricted to a limited subset of concepts.
These can grow as new needs emerge. The UL also minimizes
the requirements for knowing who is going to receive a message.
An item can be posted in UL on the World Wide Web. The
browser is handed the job of translation from UL to the user's
Host Language (HL). The UL to HL translator can be provided
as a plug-in, Java applet or built-in capability of the browser.
Email has similar advantages. A person wishing to send
important information between multiple international facilities
can do so without concern about recipient language base.
Rather than one translation for each language that will receive
the message, a single translation to UL will suffice.

1.5 TERMINOLOGY

1. BE - Base Element(s) - This is the Unicode along with 
        definitions that make up the UL. The Base Elements 
        can include Position Dependent Syntax  (PDS) in UL.  
        The Standard Base Elements are the UL used by all 
        users. SBE does not include Meaning Components
        for User Defined Components. Only the UDC place 
        holders are included.
2. FC - Fixed Component(s) - A Unicode component in the UL 
        that has a fixed meaning across languages.
3. G-UL - Generic Universal Language - The base of the UL. 
        The G-UL is distributed to all users along with 
        translations to and from all known HL.
4. HL - Host Language(s) - The native language which the 
        UL will translate to and from. Host languages are 
        normally our native languages (English, French, 
        Chinese, Japanese). However, there is nothing that 
        restricts a Host Language. It would be possible to 
        add a made up host language, such as Piglatin.
5. ILB - International Language Bridge - The name used to 
        designate the collection of all concepts, 
        requirements and other factors relating to the 
        creation, dissemination and use of a Universal 
        Language.
6. Jumbo - A collection of multiple UL Unicode that is 
        used frequently enough by a group that it justifies
        a UDC as a short cut.
7. LE - Linking Element(s) - Used to force a connection of
        a TD to a FC. This is needed only where there may 
        be confusion about which FC is being modified by 
        the TD. LE should rarely be needed. This is because
        the UL syntax will normally identify the required 
        logic.
8. LF - Language Forcing - A method to inline a specific HL.
        An LF will force the translator to use a specific
        language whether or not it is the user's HL.
9. MC - Meaning Component(s) - This is the global meaning
        that must be conveyed by a UL Group.
10. MV - Multiple Vocabularies - A method of providing a Core
        Vocabulary (CV) along with Secondary Vocabularies (SV)
        to support quicker downloading and end user
        specialization. The CV is distributed to all users.
        The SV are provided only as needed. MV may
        overlap, i.e. the same Unicode may appear in more
        than one vocabulary. However, when there is overlap,
        the Meaning Component (MC) must be the same wherever
        the Unicode is the same. This differs from UDC.
11. PR - Pictorial Representation(s) - A method of creating 
        visual clues to the meaning of a UL Unicode.
12. SBE - Standard Base Element(s) - UL without UDC Meaning
        Components.
13. SC - Syntax Component(s) - A UL-G that is used to convey
        syntactic meaning, e.g. this is a question.
14. TD - Type Designator(s) - A Unicode component in the UL
        that is always followed by a variable field. TD are
        used for specific information with a range, e.g. 
        tall or short; amount of money; distance.
15. TE - Type Element(s) - Associated with a TD. The TE is
        the variable quantity relative to the specific TD.
        Multiple TE can be associated with a single TD. 
16. UDC - User Defined Component(s) - The UDC allows a group
        of users to customize the UL. This allows the core
        UL to be restricted.  A Generic UL provides the base 
        construct. The UDC adds the specialization. UDC are
        provided for each of the types available in UL.
17. UL - Universal Language - The general language used as
        an intermediary between all Host Languages.
18. UL-G - Universal Language Group(s) - A group of one or
        more UL Unicode that builds a Meaning Component.
        Many UL-Gs require only a single UL Unicode. A UL-G
        that is frequently used and which is created from
        multiple Unicode may become a Jumbo Group. When this
        happens, it is a prime candidate for grouping into a UDC.
19. UL-Phrase - A UL Phrase is a group of UL-G that combine
        to create enough meaning to support all HL
        translators. Depending on the context, a UL-Phrase
        may be as little as a single Unicode or made up of
        a string of UL-Gs.
20. Unicode - Universal Code - A double byte code that
        already exists in languages such as Java. Unicode
        provides a universal method that can represent any
        language, native or other, on the World Wide Web
        and in programs. Unicode is a recognized standard.
21. VM_function - Variable Meaning Function(s) - A VM 
        function is used in conjunction with a Type 
        Designator. It provides a HL specific way to 
        create the variable meanings from the range of 
        Unicode.

1.5A New In Release 0.1

22. Language Parsing - The rules used to go between UL and NL
        (Native Language).
23. Neutrabet - An abstract alphabet developed for creating words
        in the Universal Language.
24. Phrase Template - A method of determining what type of phrase
        is to be constructed from the UL when going to the NL or
        vice versa.

1.6 UL Components

Unicode (Universal Code)

The Unicode standard allows 16 bit binary code to represent a
language. A language range in the Unicode standard is used to 
signify which language is being represented. 
There are numerous expansion ranges in the Unicode standard. 
UL can easily be fitted into this expansion. For the purpose
of this document Unicode references are shown as xx##, e.g. 
xx01 or xx07. The xx represents the language range identifier
which is not presently specified for UL. The ## represents 
specific numbers used to identify a specific member in the UL
Unicode.

While it is strongly suggested that UL eventually be added to
Unicode, this is not a necessary condition for beginning. One
can easily envision a MIME or other method to identify UL in a
document. If this were the case, UL could overlap Unicode. This
means that the initial development can be done in any integer
range.

	The Unicode source is located at:
	http://www.unicode.org/	
	The required PDF files are at:
	http://www.unicode.org/charts/	

1.7 Fixed Components (FC)

A fixed component is one which has the same Meaning Component
(MC) in all contexts. A meaning component is not tied to a 
language. For example, "food" is a meaning component. It is
translated differently into different languages, but it has
the same meaning across the languages.
While it may seem logical to equate Meaning Components with
nouns in the English language, this is not necessarily the
case. For example "send an email" might be a useful MC.
Other examples are:
	* call me by phone
	* available for a conference call

An FC should be broad. Specificity must be defined through
UDC. So "computer" might make sense as an FC, while "Dell
Computer" would not. Certain parts of language are not needed
in UL. For example, articles such as "the", "a" and "an" are
not needed. These must be added by the HL translator. There
are some FCs that are needed but should only be available in
a single form. For example, negation uses multiple words in
many languages. The rules for UL must be clear and universal
with respect to negation. Questions are another area where
languages may vary significantly. A question is created with
a Syntax Component (SC).

1.8 Type Designator (TD) - Type Element (TE)

A type designator is used to signal that the next item is
of a particular type. Typical type designators might include:

	* Money/Quantity
	* Size/Unit
	* Emotional Component 
	* Proper Noun
	* Address
	* Phone Number
	* Temporal/ Past Present Future

	A Type Designator is followed by a piece of 
information called the Type Element. The TE may vary across 
an extremely large range. For example people's names may vary
enormously. The UL to HL translator must provide a method of
translating the Variable Information (VI) that follows the TD.
For some instances, this is extremely simple. For example a
unit conversion from HL dollars to UL monetary units could
exist for each language and monetary unit. The same is true
of phone numbers that could translate from English numbers 
to UL numbers. A Chinese client would receive the UL numbers
and translate them into Chinese numbers.
	Each type designator will also include a NULL value
option. This will be used to indicate that the value is either
unavailable in the HL that created the content, or the value 
is unspecified in this context. The UL to HL translator will
provide a reasonable translation for the NULL value. 
	It is definitely worth noting that UL provides a
special TE to support verb tense or temporal relationship.
When the temporal TE is used, there may be two or more TE 
used in series. This allows verb conjugation to be handled
with minimal UL Unicode. A verb (TD) might have two (or more)
TE associated with it. This allows "run, ran, will run, has
run, etc." to all be done with a TD-TE. In addition a second
TE (TD-TE-TE) will change this from above to "walk, walked,
has walked, will walk, etc.". The generic TD is state of 
motion of a person. The conjugation TE changes the temporal
aspects, while the relative TE changes the state from run 
to walk. With this strategy, numerous English language words
are produced with only three Unicode.  This should also be 
true for other languages. The critical part of UL becomes 
the identification of the fundamental Meaning Components (MC).
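
As a rough illustration of the TD-TE-TE idea, the following sketch
decodes a three-code sequence into an English verb form. All code
values, table names and the order of the two TE are hypothetical
placeholders chosen for this example; the draft does not assign any
of them.

```python
# Hypothetical code values; the draft does not assign real ones.
TD_MOTION = 0x01                                          # TD: state of motion of a person
TENSE = {0x10: "present", 0x11: "past", 0x12: "future"}   # temporal (conjugation) TE
MAGNITUDE = {0x21: "walk", 0x22: "run"}                   # relative TE

# English-side table that a UL to HL translator might carry.
ENGLISH = {
    ("run", "present"): "run",
    ("run", "past"): "ran",
    ("run", "future"): "will run",
    ("walk", "present"): "walk",
    ("walk", "past"): "walked",
    ("walk", "future"): "will walk",
}


def decode_motion(codes):
    """Decode a TD-TE-TE triple (TD, temporal TE, relative TE) into English."""
    td, tense_te, magnitude_te = codes
    if td != TD_MOTION:
        raise ValueError("not a motion TD")
    return ENGLISH[(MAGNITUDE[magnitude_te], TENSE[tense_te])]


print(decode_motion([0x01, 0x11, 0x22]))
print(decode_motion([0x01, 0x12, 0x21]))
```

Only the English table is language specific. A translator for
another HL would reuse the same three codes with its own table.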

1.9 Variable Meaning Function (VM_function)

A Variable Meaning Function is a language based rule for 
adding the variable meaning associated with the TE part of
the TD-TE pair. Typical examples of VM_function(s) are:

* Numbers - a method of producing a Host Language (HL) 
      specific interpretation of the numeric values from 
      the TE.
* Direction - a method of producing a HL specific 
      interpretation of direction, e.g. north, south, up,
      down, etc.
* Emotion - a method of producing a HL specific 
      interpretation of emotional state, e.g. good, bad,
      terrific, etc.
* Temperature - a method of producing a HL specific 
      interpretation of temperature, e.g. hot, cold, 
      freezing, etc.
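
A VM_function could be sketched as a per-HL lookup over the TE
range. The four-step temperature scale, the language keys "en" and
"es", and the wording used for the NULL value are assumptions made
for illustration only.

```python
NULL_TE = None  # stands in for the NULL value option of a TD

# Hypothetical four-step temperature range, one word list per Host Language.
VM_TEMPERATURE = {
    "en": ["freezing", "cold", "warm", "hot"],
    "es": ["helado", "frío", "templado", "caliente"],
}

# Hypothetical HL wording for an unavailable or unspecified value.
NULL_TEXT = {"en": "unspecified", "es": "sin especificar"}


def vm_temperature(te, hl):
    """Return the HL-specific wording for a temperature TE."""
    if te is NULL_TE:
        return NULL_TEXT[hl]
    return VM_TEMPERATURE[hl][te]


print(vm_temperature(3, "en"))
print(vm_temperature(1, "es"))
print(vm_temperature(NULL_TE, "en"))
```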

2.0 Language Forcing (LF)

In some instances language overriding might be needed. 
For example, a monetary Web site might want all
transactions represented in dollars even when the 
surrounding information is translated. 

2.1 Linking Element (LE)

	A linking element is used to connect a type 
designator (TD) to a fixed component (FC). LE are used when
confusion can result as to which FC is being modified. It
is unclear at this time if Linking Elements are really
needed. A properly designed language syntax may remove the
need for LE.

2.2 User Defined Components (UDC)

	User defined components are placed in a local
vocabulary located on the client. For example, Gates 
Rubber might use the word "rubber" frequently during emails, 
while Coca Cola might use "beverage". The UDC are defined 
prior to communications. In the case of a Web page, it 
would be the job of the Web page provider to produce a list of
UDC used on the Web page. These could be downloadable either 
prior to access or during access. The client would announce
its language to the server so the correct subset of UDC would
be provided.  A UDC may be a full concept rather than a
single word. Returning to Coca Cola as a potential user, the
concept "sell Coca Cola" could be a single UDC. With several
fixed components it would be possible to get:

Do we have a good ad to sell Coca Cola?
Can you sell Coca Cola in China?
What do we need to do to increase our ability to sell Coca Cola?

2.3 Basic Fixed Components

	The basic fixed components must be selected to provide
the most common content used in email and Web pages.  We begin
by providing general categories that should be universal across
languages.

* Objects - Stationary things in our world.  These are in 
      general what the English language calls nouns, but 
      they are not all nouns. For example in the sentence 
      "The love that I feel.", the word "love" is a noun, 
      but it is not an object. An le rule would be all type 
      designators must precede the basic element they modify. 
      On a computer screen this would allow large ball to map 
      out a space prior to placing the ball on the screen. 
      The same is true for right ball (ball to the right).
      The difficulty arises with red ball. In this instance 
      the ball needs to be placed on the screen before the 
      red is added. It is unclear whether link elements should
      be used here or store and wait logic in the language 
      translators. 

2.6 Syntax Component

	In UL, a Syntax Component (SC) is a Unicode used to
convey syntactical meaning in a statement. The most clearly
defined SC is the question component. An SC->Question will 
make a UL statement into a question. The location of SCs will be 
a defined part of the UL language. UL will not contain some of 
the normal syntax found in languages. For example, there will
not be commas.  A period SC may be needed to identify completion
of a UL sentence. This is an open issue. Since UL can be embedded
in other languages, e.g. HTML, it will allow for syntax 
transitions from outside the UL language. For example, an HTML
table of UL information could be sent via email.

2.7 UL Syntax

	The language of UL will have strict syntax rules. The 
basic unit is Unicode. A group of Unicode will produce a phrase.
Each phrase creates a phrase meaning, i.e. it should provide 
enough information to create an acceptable statement in any HL. 
The set of all phrases creates a message. It is also possible
to have multiple messages provided the messages are embedded 
in another language, e.g. multiple messages in UL embedded in 
HTML.

2.8 Position Dependent Syntax (PDS)

	Position Dependent Syntax (PDS) allows UL to use the 
same Unicode to produce different Meaning Components. This 
occurs when the Unicode is a TD-TE combination. For example 
the same Unicode can be used for degree of heat and cold as 
are used for amount of money. In the former case there is a 
temperature TD while in the latter there is a money TD. They
are both followed by a Unicode with a possible range of values. 

	TD-TE elements will be created to provide a reasonable
range of values. Where more resolution is needed, the User 
Definable Components (UDC) will be used. Linking Elements can 
also provide support for these specialized applications.
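
Position dependence can be sketched as a small parser in which the
same TE value is read differently depending on the TD that precedes
it. The code values and type names below are hypothetical.

```python
# Hypothetical TD codes; any code not listed is treated as a fixed component.
TD_TEMPERATURE = 0x30
TD_MONEY = 0x31
TYPE_NAMES = {TD_TEMPERATURE: "temperature", TD_MONEY: "money"}


def parse_pds(codes):
    """Pair each TD with the TE that follows it; pass other codes through."""
    result = []
    i = 0
    while i < len(codes):
        code = codes[i]
        if code in TYPE_NAMES:
            if i + 1 >= len(codes):
                raise ValueError("TD without a following TE")
            result.append((TYPE_NAMES[code], codes[i + 1]))
            i += 2
        else:
            result.append(("fixed", code))
            i += 1
    return result


# The value 0x05 means a temperature level in one position
# and an amount of money in the other.
print(parse_pds([0x30, 0x05, 0x31, 0x05]))
```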

2.9 User Defined Syntax (UDS)

The availability of User Definable Components (UDC) allows a
specialized application to define a syntax independent of UL.
These applications will be allowed but will not be supported
by the UL development team.

3.0 Degree of Specificity

	Languages are subtle by nature. They allow us to 
express things in a rich variety of ways. As stated earlier,
the UL should be simple, forcing the translators to add 
richness to the translations. As a simple example, "run", "walk"
and "stand" are three different words in the English language.
But in UL they could be expressed as three states of a person
in motion. The first is motion plus large magnitude. The second
is motion plus minimum magnitude. The third is motion plus zero
magnitude. If we add to this a Unicode for road, a number of
sentences can be created:
	* walking on the road
	* running on the road
	* standing on the road
	* on the road
	* moving down the road

	Of course, if moving on roads were a common part of a
group's communications, it would be possible to create a User
Defined Component (UDC) to represent movement on a road. This 
brings out an important point. A UDC can be created by 
transferring a language specific definition to the end users 
or it can be created by combining identifiable UL components. 
In the latter case, the UDC is referred to as a Jumbo UDC. 

Case 1: UDC#1 => HL#1 -> "text in host language #1 here"
          : UDC#1 => HL#2 -> "text in host language #2 here"
Case 2: UDC#2 => UL -> xx01 xx03 xx17 ... (sequence of UL 
          codes produce a jumbo)
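
The two cases can be sketched as two lookup tables. The table
names and the resolve_udc helper are hypothetical; the keys and
placeholder strings mirror the notation used above.

```python
# Case 1: a UDC defined directly as host-language text.
UDC_TEXT = {
    "UDC#1": {
        "HL#1": "text in host language #1 here",
        "HL#2": "text in host language #2 here",
    },
}

# Case 2: a Jumbo UDC defined as a sequence of UL codes.
UDC_JUMBO = {
    "UDC#2": ["xx01", "xx03", "xx17"],
}


def resolve_udc(udc, hl):
    """Return ("text", string) for Case 1 or ("ul", codes) for Case 2."""
    if udc in UDC_TEXT:
        return ("text", UDC_TEXT[udc][hl])
    if udc in UDC_JUMBO:
        return ("ul", list(UDC_JUMBO[udc]))
    raise KeyError(udc)


print(resolve_udc("UDC#1", "HL#2"))
print(resolve_udc("UDC#2", "HL#1"))
```

In Case 2 the result is still UL and would be passed on to the
ordinary UL to HL translator.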

3.1 Multiple Vocabularies

	UL should support Multiple Vocabularies (MV). Rather
than having every UL to HL translator support every possible 
UL Unicode, it should be possible to define sub-vocabularies
that are application specific. For example, there could be a 
set of highly common codes. Then there could be a second set 
of business codes and a third set of sports codes. A person 
wishing to send an email about business would load the common 
codes and the business codes. For sports discussions load the 
common codes and sports codes. For a business that provides 
sport equipment, load all three. When possible, which codes 
to load can be sent along with the message.
	MV will consist of a Core Vocabulary (CV) and many
Secondary Vocabularies (SV). Nothing prevents several
vocabularies from containing the same UL-G. However, where
there is overlap between vocabularies, the Meaning Component
(MC) must be the same for overlapping elements. Only UDC
(User Defined Components) can have a different Meaning
Component for the same Unicode. UDC are not a part of the
Multiple Vocabularies. They are an independent area of the
UL that is fixed in size and available at all times to the
users and translators.
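
The overlap rule can be sketched as a loader that merges
vocabularies and rejects any code whose Meaning Component differs
between them. The vocabulary contents and the load_vocabularies
name are invented for illustration.

```python
def load_vocabularies(*vocabularies):
    """Merge vocabularies, enforcing one Meaning Component per code."""
    merged = {}
    for vocab in vocabularies:
        for code, meaning in vocab.items():
            if code in merged and merged[code] != meaning:
                raise ValueError(
                    f"{code}: conflicting meanings {merged[code]!r} and {meaning!r}"
                )
            merged[code] = meaning
    return merged


# Hypothetical Core, business and sports vocabularies.
core = {"xx01": "person", "xx02": "send an email"}
business = {"xx02": "send an email", "xx40": "invoice"}
sports = {"xx60": "ball"}

print(load_vocabularies(core, business, sports))
```

The overlapping code xx02 is accepted because both vocabularies
give it the same meaning.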

3.2 Basic Translators (BT) and Filter Translators (FT)

	A Basic Translator (BT) takes content to or from a HL
(Host Language) and translates to or from UL. A translator will
have no knowledge of embedding in other languages. A Filter 
Translator (FT) will include knowledge of other languages in 
addition to UL. Some possible FT are:

* EMAIL FT - Leaves in place email headers needed for transfer. 
     Replaces other content either UL-HL or HL-UL.
* HTML FT - Leaves in place HTML identifiers. Replaces other 
     content either UL-HL or HL-UL.
* VIDEO TEXT FT - Text output available along with TV is now 
     common practice. It is used for deaf people, elderly 
     people or to support noisy environments (e.g. aerobics 
     classes). Using UL, it would be possible to support 
     language independence. The Video Text would be available 
     in multiple languages.
*  MOVIE TEXT FT - Movies require translators to provide 
     subtitles in different languages. A Movie Text FT could 
     allow a single HL, e.g. English, to be translated through 
     UL to other languages. This would allow wider movie 
     distribution around the world.

	With a properly created translator (TR) a person should 
be able to take an email in the HL, put it through the HL to UL 
email filter, then send it to a foreign recipient. The far end 
recipient would then use a UL-HL Filter Translator (FT) to go 
from UL to their native host language. 
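
An email Filter Translator might be sketched as follows. The sketch
assumes that headers and body are separated by the first blank
line and that line endings are plain "\n"; translate_body stands in
for whatever HL to UL or UL to HL Basic Translator is available.

```python
def email_filter(message, translate_body):
    """Leave the header block untouched and translate only the body."""
    headers, separator, body = message.partition("\n\n")
    return headers + separator + translate_body(body)


# str.upper is only a stand-in for a real Basic Translator.
message = "From: a@example.com\nTo: b@example.com\n\nhello world"
print(email_filter(message, str.upper))
```

The same split-then-translate shape would apply to the HTML FT,
with markup taking the place of the headers.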

3.3 Some Initial Benefits of UL

* With UL a multi-language Web site can be provided with a single
    UL site.
* UL provides a more user friendly interface for international
    email.
* With UL it is possible to take an existing Web site in any HL
    and run it through an HL to UL filter for language #1. It
    can then be passed through a UL to HL filter for language #2.
    The Web page will then appear in the user's native language.
* With UL it is possible to take an existing email in any HL
    and run it through an HL to UL filter for language #1.
    It can then be passed through a UL to HL filter for
    language #2. The email will then appear in the user's native
    language.

3.4 Steps in Creating UL

1. Identification of high level objects - This is being worked 
   in the present document.
2. Identification of HL language expectations - It may not be 
   possible to do all languages for a prototype. It should be 
   possible to do a subset. This subset should include languages 
   from each of the major continents. For example, a good subset 
   might be English, Chinese, Japanese, Russian, Spanish, Arabic. 
   It is important that the initial work not focus on European
   languages only, because they have a common syntactic structure,
   mostly use the Latin alphabet and have numerous other commonalities.
3. Architecture of the UL language. This will require a team of 
   linguistic and computer experts. The goal will be to 
   conceptualize the needed structure to provide a common 
   international base.  The architectural analysis will also try 
   to identify the best locus for initial prototyping. For example 
   should it be browser based, server based, integrated into HTTP? 
   Should it be developed as a Java API or as a browser plug-in. 
   Will it initially be a runtime component or less interactive? 

   These and other questions can be worked by the software
   segment of the architecture team. The linguistic team will focus
   on aspects specific to the language. How will negation operate?
   Where will a question operator reside? What constitutes a Meaning
   Concept across the various languages? How will base and filter
   translators operate? What is the core UL? What types of
   vocabularies should be supported?

4. Identification of the Base Elements (BE).  The basic core 
   language will need to be identified. Next there will be a need 
   to prototype it and determine how well it translates between 
   different languages. While the original work can be done with a 
   limited expectation, at some point there will be a need to 
   determine how easily it expands. Finally the specialization
   stages using UDC need to be tested.

3.5 Short and Long Term Goals For UL

Here are some of the possible short and long term uses for a Universal
Language:

1. Email - The use of email represents the most frequent method
of communications on the Internet. The number of emails sent 
daily has multiplied at an astronomical rate. As global markets
continue to grow a method of rapid translation of email content
will greatly expand our global commitments.
2. Web Pages - HTML content has now become an enormous asset for
providing information to the public. The ability to provide
multilingual content through a single translation mechanism 
using UL will greatly expand the dissemination of content. This
can have important advantages for world wide global 
responsibility. It can mean that international programs in space 
exploration, telecommunications, medicine, etc.  can share 
content without language constraints.
3. Firewall Content - Many multinational companies now provide
information behind the safety of firewalls. Since they often
use standard mechanisms available on the World Wide Web, the 
use of UL should be invaluable to their private international
networking. 
4. Other Document Format - The basic mechanisms provided by UL
should be expandable to other document formats. These can
be public formats, e.g. PDF and Word, or more private formats
such as FrameMaker. The UL translation mechanisms would work
equally well with attachable documents as with standard
interchange mechanisms.

5. Computer Languages - UL can provide significant gains for computer
languages attempting to build language independence. UL is
built on Unicode. This code has been developed to provide multiple 
language capability. Java is one example of a language that has done
extensive work on Unicode usage.
6. Intermediary For Text to Speech and Speech to Text - While
UL is a Unicode textual language, it can be used as an intermediary
for the transition from one language to another language in any 
format. This includes speech, Braille or any other media used to 
communicate language content. As the state of the art continues 
to grow, UL could potentially serve as the mechanism for real time
audio translation. This means it could be used in numerous
disciplines including the replacement of movie subtitles, 
translation of public speeches, etc. While this is clearly a long 
term goal, the objective is not unreasonable. Setting the stage 
for future growth in language translation will clearly serve many
potential growth areas.  

3.6 Fonts & Sounds

	The Universal Language has no external representation. It
exists only as Unicode. Unlike other languages that require 
character representation, e.g. Kanji, UL is an intermediate 
language. UL has no sound representation. Again this differs from 
other languages where pronunciation and confusion due to accents
is a critical concern. 
The fact that UL exists only as an intermediate language has 
significant advantages. Translation rules need only apply to 
meaning. By providing strict syntactic rules for UL, translations
will carry minimal risk of misinterpretation.
The primary difficulty will be in determining the linkage from UL to 
the Native Languages.


3.7 UL Type Dictionaries (UTD)

	Each of the Unicode types in UL is grouped together into a UL
Type Dictionary (UTD). The UTD required to support UL are:

* UTD.FC - The Fixed Component dictionary consisting of all Fixed
Component objects in UL.
* UTD.TD - The Type Designator dictionary consisting of all Type
Designator objects in UL.
* UTD.LE - The Linking Element dictionary consisting of all Linking
Element objects in UL.
* UTD.UDC - The User Defined Component dictionary consisting of
all User Defined Component objects in UL.
* UTD.UL-G - The Universal Language Group dictionary consisting
of all Universal Language Group objects in UL.
* UTD.LF - The Language Forcing dictionary consisting of all Language
Forcing objects in UL.
* UTD.SC - The Syntax Component dictionary consisting of all Syntax
Component objects in UL.
* UTD.Jumbo - The Jumbo dictionary consisting of all Jumbo objects
in UL.
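
The list above can be written down as a small data structure. The
following is only a sketch in Java, the language of the prototype in
this document. The enum name UlTypeDictionary, its constant names and
its two fields are assumptions made for illustration; UL does not
define them.

```java
// Sketch: the eight UL Type Dictionaries as a Java enum.
// The enum, its constants and its fields are illustrative assumptions.
enum UlTypeDictionary {
    FC("UTD.FC", "Fixed Component"),
    TD("UTD.TD", "Type Designator"),
    LE("UTD.LE", "Linking Element"),
    UDC("UTD.UDC", "User Defined Component"),
    UL_G("UTD.UL-G", "Universal Language Group"),
    LF("UTD.LF", "Language Forcing"),
    SC("UTD.SC", "Syntax Component"),
    JUMBO("UTD.Jumbo", "Jumbo");

    final String label;       // dictionary name as written in this section
    final String objectType;  // kind of object the dictionary holds

    UlTypeDictionary(String label, String objectType) {
        this.label = label;
        this.objectType = objectType;
    }

    public static void main(String[] args) {
        // Print one line per dictionary.
        for (UlTypeDictionary d : values()) {
            System.out.println(d.label + " holds all " + d.objectType + " objects");
        }
    }
}
```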

3.8 Host Language Interface 

	The Unicode interpretation of Host Languages will be used to
provide a common language base. For example, the English word
"cow" is Unicode XX XX XX.  The same meaning in Chinese is "xiahu",
which is Unicode XX XX.  Note that the English "cow" takes three
Unicode while the Chinese requires only two, and UL requires only
one. In all three cases the Unicode must fall in the range for the
specified language, i.e. HL or UL.
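
The range requirement can be illustrated with a short Java sketch. It
is not part of the prototype. The English word is "cow" as above; the
Chinese word is the two-Unicode entry for "air" taken from the
prototype table later in this document, since no Unicode values are
given here for the Chinese example. UL has no assigned range yet, so
a Private Use Area value (U+E000) is assumed purely as a stand-in.

```java
// Sketch only: the UL range used below is a hypothetical placeholder.
class UnicodeRangeCheck {
    // Assumption: a Private Use Area code stands in for a UL code,
    // because no UL range has been assigned.
    static final char HYPOTHETICAL_UL_CODE = '\uE000';

    // True when every char of the word lies in the expected Unicode block.
    static boolean allInBlock(char[] word, Character.UnicodeBlock block) {
        for (char c : word) {
            if (Character.UnicodeBlock.of(c) != block) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] english = { '\u0063', '\u006f', '\u0077' }; // "cow"
        char[] chinese = { '\u7a7a', '\u6c14' };           // "air" in the prototype table
        char[] ul = { HYPOTHETICAL_UL_CODE };

        System.out.println("English: " + english.length + " Unicode, in range: "
            + allInBlock(english, Character.UnicodeBlock.BASIC_LATIN));
        System.out.println("Chinese: " + chinese.length + " Unicode, in range: "
            + allInBlock(chinese, Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS));
        System.out.println("UL: " + ul.length + " Unicode, in range: "
            + allInBlock(ul, Character.UnicodeBlock.PRIVATE_USE_AREA));
    }
}
```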

3.9 Multi-lingual Vocabulary Issues

	A serious issue for UL is the amount of vocabulary required and
the nature of Host Language vocabulary. The number of words in the
English language is enormous, and other languages have comparably
large vocabularies. While much of the vocabulary is common across
languages, e.g. many nouns, there are places where vocabulary
differences will result in a high degree of complexity. Some examples
of problem areas are:

* Words which have multiple meanings in one language but only
a single meaning in a second language. For example, "HOT" is
an English word for a temperature state and, in food, for a spicy
state. In Spanish, these are two separate words.
* Words which have nuances in one language that do not
exist in another language. In Chinese, there are different words
for "first son" versus "second or later son". The "first son" has
special responsibilities that do not exist in other cultures. Other
relatives are also viewed differently than in European cultures.
* Words which are taken from other languages. The word "email"
is used by other cultures. However, it is frequently pronounced
using the characteristics of the host language.

A number of tools are available in UL to address these issues. These
include:

* UDC - User Defined Components - allowing areas where languages
significantly diverge to be covered on a case-by-case basis.
* LF - Language Forcing - allowing an HL word to be forced
into use, e.g. Sputnik.
* TD-TE - Type Designator and Type Element - allowing relative
aspects of a language to be included, e.g. the various Chinese
relationships could include TE values. A more complex Host Language
output would be needed in English to fully explain what is meant.
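
The "HOT" case can be made concrete with a small Java sketch. It
assumes that UL carries one meaning code per sense, so that the
ambiguity exists only on the English side. The two UL code values and
the class name HotExample are hypothetical. The Spanish words
"caliente" (temperature) and "picante" (spicy) are supplied here for
illustration and do not come from the prototype tables.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: one English word, two UL meaning codes, two Spanish words.
class HotExample {
    // Hypothetical UL meaning codes; no actual values have been assigned.
    static final int UL_HOT_TEMPERATURE = 1;
    static final int UL_HOT_SPICY = 2;

    // One table per Host Language, keyed by UL meaning code.
    static final Map<Integer, String> ENGLISH = new HashMap<>();
    static final Map<Integer, String> SPANISH = new HashMap<>();
    static {
        ENGLISH.put(UL_HOT_TEMPERATURE, "hot");
        ENGLISH.put(UL_HOT_SPICY, "hot");
        SPANISH.put(UL_HOT_TEMPERATURE, "caliente");
        SPANISH.put(UL_HOT_SPICY, "picante");
    }

    // Translate by looking up the UL meaning code in the target table.
    static String toHostLanguage(int ulCode, Map<Integer, String> table) {
        return table.get(ulCode);
    }

    public static void main(String[] args) {
        for (int code : new int[] { UL_HOT_TEMPERATURE, UL_HOT_SPICY }) {
            System.out.println(code + ": " + toHostLanguage(code, ENGLISH)
                + " / " + toHostLanguage(code, SPANISH));
        }
    }
}
```

Going from UL to either language is a plain lookup. Going from the
English "hot" to UL is the hard direction, because the parser must
choose between the two codes from context.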


4.0 Neutrabet

In order to form Meaning Components in the Universal Language,
a Neutrabet, similar to an alphabet, will be developed. The
Neutrabet will have the following characteristics:

1. A termination Unicode, temporarily designated as XX00.
    Usage of the termination will be explained shortly.
2. Ranges for different types of Meaning Codes. As an initial
    example, consider concrete objects. These may be
    represented by N Unicode. The value of N is calculated
    using the following formula.
	Let S sub 1 = the subset of Unicode used to create
                      all concrete objects.
	Let N sub 1 = the number of concrete objects to be
                      represented.
	Let n sub 1 = the number of Unicode required to
                      represent N sub 1 objects. The
                      termination Unicode (XX00) is one
                      additional Unicode.

	Then

	N sub 1 <= Sum from x = 1 to n sub 1 of (n sub 1)**x

	where (n sub 1)**x is the number of permutations of
	n sub 1 Unicode taken x at a time with replacement.
	Here is a table of progression:

	n sub 1     N sub 1
	    1       1**1 = 1
	    2       2**1 + 2**2 = 2 + 4 = 6
	    3       3**1 + 3**2 + 3**3 = 3 + 9 + 27 = 39
	    4       4**1 + 4**2 + 4**3 + 4**4 = 4 + 16 + 64 + 256 = 340
	   10       10**1 + 10**2 + ... + 10**10 = 11,111,111,110
	
	Clearly, 10 characters may well be adequate to represent
all concrete objects that are of interest. The reason that
these numbers grow so quickly with the Neutrabet is
the absence of any need to account for sounds or readability.
The Neutrabet is purely abstract. If we think in terms of the
first three letters of the English alphabet, remembering the
use of the termination character, these would be the allowable
Meaning Concepts:
A B C
AA AB AC BA BB BC CA CB CC
AAA AAB AAC ABA ABB ABC ACA ACB ACC
BAA BAB BAC BBA BBB BBC BCA BCB BCC
CAA CAB CAC CBA CBB CBC CCA CCB CCC

Each Meaning Concept is followed by the End Character ( XX00 ).

As has been pointed out by others, the Neutrabet can be formed
in other ways, e.g. 10 things taken 11 at a time. The example
is given merely to illustrate how few Unicode would be needed
to create a full Universal Language.
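
The progression table and the three-letter listing can be reproduced
with a few lines of Java. This is a sketch for checking the
arithmetic only. The class and method names are made up for this
example, and the letters A, B and C stand in for Neutrabet Unicode
that have not yet been assigned.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: count and list the codes that n symbols can form.
class NeutrabetCapacity {
    // Sum over x = 1..n of n**x. Uses long, so it is only valid for
    // small n (the sum overflows a long once n reaches 16).
    static long capacity(int n) {
        long total = 0;
        long power = 1;
        for (int x = 1; x <= n; x++) {
            power *= n;       // power is now n**x
            total += power;
        }
        return total;
    }

    // List every code of length 1..maxLength over the given symbols,
    // shortest codes first.
    static List<String> enumerate(char[] symbols, int maxLength) {
        List<String> codes = new ArrayList<>();
        List<String> previous = new ArrayList<>();
        previous.add("");
        for (int len = 1; len <= maxLength; len++) {
            List<String> current = new ArrayList<>();
            for (String prefix : previous) {
                for (char s : symbols) {
                    current.add(prefix + s);
                }
            }
            codes.addAll(current);
            previous = current;
        }
        return codes;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1, 2, 3, 4, 10 }) {
            System.out.println(n + " -> " + capacity(n));
        }
        List<String> codes = enumerate(new char[] { 'A', 'B', 'C' }, 3);
        System.out.println(codes.size() + " codes: " + codes);
    }
}
```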

5.0 Phrase Templates

	In order to implement the Universal Language, the concept
of a Phrase Template (PT) is introduced. The idea of a PT is to
have strict linguistic rules for transformation from UL to HLs.
The Phrase Template states the order in which the UL should be
unwrapped by the Language Parser to form the HL, or vice versa.
	Here are some basic Phrase Templates. This discussion
is English based; hopefully there are similar templates in other
languages.

* Simple Multi-word Phrase
	examples:
	hello, goodbye, what, where
* Subject Verb Object (basic)
	example:
	I <am> go<ing> <to the> store
* Basic Question
	examples:
	what's that?
	who's hungry?
UL language development requires that Phrase Templates be
provided that can map from UL to NL and vice versa. Phrase
Templates allow UL to avoid having to deal with the infinite
variety available in languages. The templates restrict
the types of meaning transfers that can be done. Obviously, a
goal for UL is to have the Phrase Templates become all-inclusive
at some point, but this does not have to be the case to get started.
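
One way to picture a Phrase Template is as an ordering of roles that
the Language Parser follows when it unwraps content. The Java sketch
below assumes exactly that and nothing more. The Role enum, the SVO
and SOV orderings and the unwrap method are illustrative names, and
the English words are glosses for UL content, not real UL codes.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Sketch: a Phrase Template modeled as an ordering of roles.
class PhraseTemplateSketch {
    enum Role { SUBJECT, VERB, OBJECT }

    // Assumed templates: the English-like order from the list above,
    // and a subject-object-verb order for a hypothetical Host Language.
    static final Role[] SVO = { Role.SUBJECT, Role.VERB, Role.OBJECT };
    static final Role[] SOV = { Role.SUBJECT, Role.OBJECT, Role.VERB };

    // Emit the content in the order the template dictates.
    static String unwrap(Map<Role, String> content, Role[] order) {
        List<String> parts = new ArrayList<>();
        for (Role role : order) {
            parts.add(content.get(role));
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        Map<Role, String> content = new EnumMap<>(Role.class);
        content.put(Role.SUBJECT, "I");
        content.put(Role.VERB, "go");
        content.put(Role.OBJECT, "store");

        System.out.println(unwrap(content, SVO));
        System.out.println(unwrap(content, SOV));
    }
}
```

The same content yields a different word order under each template,
which is the restriction on meaning transfer that the templates are
meant to provide. Filler words such as <am> and <to the> are left out
of this sketch.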

6.0 Universal Language Parsing

The primary types of Parsing are UL to HL and HL to UL. These
need to be examined separately.

6.1 UL To HL Parsing

This is an iterative process consisting of the following steps:

1. Retrieve the UL Phrase Template. This is the first part
   of the UL phrase.
2. Determine, for the Host Language, the HL Phrase Template
   that will change the UL to HL.
3. Retrieve the content UL Unicode.
4. Map it from the UL Phrase Template to the HL Phrase Template
   using UL Phrase Template to HL Phrase Template mapping rules.

Here is a simple example:

1. UL Phrase Template is a simple multi-word phrase.
2. HL Phrase Template is a one-to-one translation
   from UL content to HL content.
3. UL content is a greeting.
4. HL content English is "Hello"
   HL content Spanish is "Hola"
   HL content Chinese is "Ni Hao"
   HL content French is "bonjour"
   

Obviously this is a simple example, but it illustrates the basic
principle. 
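
The four steps can be sketched in Java as follows. This is not the
prototype of section 7.0. It assumes that a UL phrase is an int array
whose first element is the Phrase Template, as in the prototype's
phrase class, and it handles only the one-to-one template. The
content code 100 for "a greeting" and all class and method names are
hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of UL to HL parsing for the one-to-one template only.
class UlToHlSketch {
    // Mirrors the value of 'basic' in the prototype's phrase_templateINTF.
    static final int PT_BASIC = 1;
    // Hypothetical UL content code for "a greeting".
    static final int UL_GREETING = 100;

    // Per-language content tables, keyed by UL content code.
    static final Map<String, Map<Integer, String>> HL_CONTENT = new HashMap<>();
    static {
        addWord("English", UL_GREETING, "Hello");
        addWord("Spanish", UL_GREETING, "Hola");
        addWord("French", UL_GREETING, "bonjour");
    }

    static void addWord(String language, int ulCode, String word) {
        HL_CONTENT.computeIfAbsent(language, k -> new HashMap<>()).put(ulCode, word);
    }

    static String parse(int[] ulPhrase, String language) {
        // Step 1: the first code of the UL phrase is its Phrase Template.
        int ulTemplate = ulPhrase[0];
        // Step 2: pick the HL template; only one-to-one is sketched here.
        if (ulTemplate != PT_BASIC) {
            throw new IllegalArgumentException("unsupported template " + ulTemplate);
        }
        Map<Integer, String> table = HL_CONTENT.get(language);
        if (table == null) {
            throw new IllegalArgumentException("unknown language " + language);
        }
        StringBuilder out = new StringBuilder();
        // Step 3: walk the content codes that follow the template code.
        for (int i = 1; i < ulPhrase.length; i++) {
            // Step 4: map each UL content code to HL content, one to one.
            String word = table.get(ulPhrase[i]);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(word == null ? "?" : word);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] ulPhrase = { PT_BASIC, UL_GREETING };
        for (String language : new String[] { "English", "Spanish", "French" }) {
            System.out.println(language + ": " + parse(ulPhrase, language));
        }
    }
}
```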

7.0 Java Prototype

	The following code is meant to be an initial version of
the Universal Language. It is provided here to start the process
of developing the Universal Language.

7.1 Ordering Of Meaning Components
	In the Java code below, the vocabulary is ordered
alphabetically based on English. This is an absolutely
incorrect representation. It is done with major apologies. At
some point a more meaningful order needs to be established.
This would focus on universal characteristics across languages.
Here are some possible ordering principles:

1. Level of concreteness over time, e.g. ball is more concrete
   than audience since the latter requires more context.
2. Visible size and mobility - Here ball is ordered before
   star because, while both are visible, the ball can be moved.
   Note that atom, which is extremely small, falls after both.
   So size does not simply mean larger or smaller.
3. Complexity of context, e.g. audience precedes botanist since
   the latter requires long-term context based on years of
   study of a specific discipline.
4. Similarity of meaning, e.g. ball and sphere might be grouped
   together.

Note that all these ordering principles will have validity
across languages.

/************** compile using javac compiler ********/
/*
 * To test initial java code
 * Separate files by searching for filename: and end:
 * compile all code using javac compiler
 * run
 * java phrase
 */

/* Filename: phrase.java
   javac phrase.java
 */
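class phrase extends dictionary implements phrase_templateINTF {
	// Phrase Template taken from the first UL code; 0 means none recognized.
	private int phrase_template = 0;

	phrase(int[] ucode)
	{
		// Guard against an empty phrase before reading the template code.
		if(ucode.length > 0 && ucode[0]==basic)
		{
			phrase_template=basic;
		}
	}

	public int get_pt() { return phrase_template; }

	public static void main( String[] args )
	{
		int[] ucode_tst = { 1,2,3 };
		phrase p = new phrase( ucode_tst );
		System.out.println("PT is " + p.get_pt());
		System.out.println("word is " + p.get(0));
	}
}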
/* End: phrase.java */


/* Filename: phrase_templateINTF.java
   javac phrase_templateINTF.java
 */
interface phrase_templateINTF {

	final int basic    = 1;
	final int subj_obj = 2;
}
/* End: phrase_templateINTF.java */


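/* Filename: dictionary.java
   javac dictionary.java
 */
import java.util.Vector;
// The mc_*_INTF interfaces are in the same unnamed package, so they
// need no import. Importing a type from the unnamed package is a
// compile error in current Java compilers.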
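class concrete_word {

	// Replace the word table; each row is one word as Unicode chars.
	void add(char[][] unicode)
	{
		cw = unicode;
	}

	// Return the word at index 'which' as a String.
	String get( int which )
	{
		 return new String( cw[which] );
	}

	char cw[][];
}

class concrete_dict {
	// Append one complete word table to the dictionary.
	public void add(char[][] concrete)
	{
		dict.addElement( concrete );
	}
	private Vector<char[][]> dict = new Vector<char[][]>();
}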

class dictionary 
	implements mc_chinese_INTF,  
	mc_spanish_INTF,  
	mc_english_INTF  
{

	concrete_word cw;
	concrete_dict dw;

	dictionary()
	{
		cw = new concrete_word();
		dw = new concrete_dict();
		cw.add(mc_english_INTF.w1);
	}
	String get( int which)
	{
		return cw.get(which);
	}

public static void main( String[] args )
{
	dictionary t = new dictionary();
	// Test
	System.out.println( "Unicode is " + t.get(0));
	System.out.println( "Unicode is " + t.get(1));
} 
}

/* End: dictionary.java */

/* filename: mc_chinese_INTF.java
   javac mc_chinese_INTF.java

	This file is only partially complete. Unicodes with
	zero have not yet been located. Some of the others
	are only best approximations from the ones located
	in CJK 4E00-9FAF
	The unicode source is located at:
	http://www.unicode.org/	
	The required PDF files are at:
	http://www.unicode.org/charts/	
 */
interface mc_chinese_INTF {
final char[][] w1 = { 
	{ '\u7a7a', '\u6c14' }, // air
	{ '\u900b', '\u900d' }, // aisle
	{ '\u500c', '\u5929', '\u7bc0' }, // albatros
	{ '\u682e', '\u0069', '\u5405' }, // album
	{ '\u6d12', '\u682f' }, // alcohol
	{ '\u57fe', '\u0069', '\u0072', '\u0071' }, // alcove 
	{ '\u5515', '\u6e6d'  }, 	// ale
	{ '\u4ee3', '\u6573'  }, 	// algebra
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // alien
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // alimony
	{ '\u78e9' }, // alkali
	{ '\u77ed', '\u543b' }, // aligator
	{ '\u5408', '\u51ce' }, // alloy
	{ '\u5386' }, // almanac
	{ '\u674f', '\u6838' }, // almond
	{ '\u9ad8', '\u5c71' }, // alp
	{ '\u5b57', '\u0000', '\u5700' }, // alphabet
	{ '\u0063', '\u575b' }, // altar
	{ '\u0000' }, // aluminum
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // amazon
	{ '\u5927', '\u4f52' }, // ambasador
	{ '\u0000', '\u62a4' }, // ambulance
	{ '\u0000', '\u6676' }, // amethyst
	{ '\u5b89', '\u57f9' }, // ammeter
	{ '\u6c26', '\u6729' }, // ammonia
	{ '\u519b', '\u706b' }, // ammunition
	{ '\u0000', '\u0000' }, // amoeba
	{ '\u5b89', '\u57f9' }, // ampere
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // amphibian
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // ampitheatre
	{ '\u87d2', '\u867c' }, // anaconda
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // anaesthesia
	{ '\u5929' }, // angel
	{ '\u752a' }, // angle
	{ '\u52a8', '\u60da' }, // animal
	{ '\u8e19' }, // ankle
	{ '\u5e74', '\u91d2' }, // annuity
	{ '\u8862', '\u868a' }, // ant
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // antarctic
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // antelope
	{ '\u5929', '\u7ebf' }, // antenna
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // anthem
	{ '\u8bd7', '\u96c6' }, // anthology
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // anthropoid
	{ '\u4eba', '\u5940', '\u5b66' }, // anthropology
	{ '\u62ad', '\u83cc' }, // antibiotic
	{ '\u62ad', '\u4f53' }, // antibody
	{ '\u62ad', '\u51bb', '\u5242' }, // antifreeze
	{ '\u53e4', '\u5743' }, // antique
	{ '\u961e', '\u5ed4', '\u5242' }, // antiseptic
	{ '\u5ed8', '\u752a' }, // antler
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // anus
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // anvil
	{ '\u738b', '\u0000', '\u6629' }, // aorta
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // apartment
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // ape
	{ '\u5e75', '\u7555', '\u6d12' }, // apertif
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // aperture
	{ '\u4fa0', '\u5f92' }, // apostle
	{ '\u0000', '\u0000' }, // apostrophe
	{ '\u4eea', '\u5650' }, // apparatus
	{ '\u9611', '\u5c3e', '\u708e' }, // appendicitus
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // appendix
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // appetite
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // appetizer
	{ '\u82f9', '\u0069' }, // apple
	{ '\u5189', '\u5177' }, // appliance
	{ '\u7533', '\u8bf7', '\u4eba' }, // applicant
	{ '\u7533', '\u8bf7', '\u838d' }, // application
	{ '\u5b66', '\u0000', '\u0000' }, // apprentice
	{ '\u593b', '\u6811' }, // apricot
	{ '\u56f4', '\u6870' }, // apron
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // aqualung
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // aquamarine
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // aquarium
	{ '\u8c21', '\u6a79' }, // aqueduct
	{ '\u5f27' }, // arc
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // arcade
	{ '\u5927', '\u4e3b', '\u72e1' }, // archbishop
	{ '\u5f13', '\u0069', '\u4e8e' }, // archer
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // archipelago
	{ '\u8bbe', '\u8ba1', '\u5202', '\u4e21' }, // architect
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // archives
	{ '\u62f1' }, // archway
	{ '\u0000', '\u0000' }, // area
	{ '\u0000', '\u6280', '\u573a' }, // arena
	{ '\u548a', '\u53f9' }, // aria
	{ '\u601d', '\u0069', '\u0072', '\u0071' }, // aristocracy
	{ '\u601d', '\u0069', '\u0072', '\u0071' }, // aristocrat
	{ '\u7bb3', '\u672f' }, // arithmetic
	{ '\u4e07', '\u821f' }, // ark
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // arm
	{ '\u0000', '\u0000' }, // armada
	{ '\u72b0', '\u72f3' }, // armadillo
	{ '\u9ccc', '\u7532' }, // armour
	{ '\u0000', '\u0000' }, // arms
	{ '\u9646', '\u519b' }, // army
	{ '\u7bad' }, // arrow
	{ '\u706b', '\u836f', '\u538d' }, // arsenal
	{ '\u0000', '\u0000' }, // arsenic
	{ '\u65c0', '\u706b' }, // arson
	{ '\u0063', '\u672f' }, // art
	{ '\u0000' }, // artefact
	{ '\u0063', '\u8112' }, // artery 
	{ '\u592b', '\u0069', '\u0072', '\u0071' }, // arthritis
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // artichoke
	{ '\u0063', '\u0069', '\u0072', '\u0071' }, // article
	{ '\u4eba', '\u58ec', '\u54c1' }, // artifact
	{ '\u5927', '\u70ae' }, // artillery
	{ '\u4e8e', '\u0000', '\u4eba' }, // artisan
	{ '\u827a', '\u672f' }, // artist
	{ '\u53f3', '\u68c9' }, // asbestos
	{ '\u4e0a', '\u5347' }, // ascension
	{ '\u4e0a', '\u5347' }, // ascent
	{ '\u6849', '\u6811' }, // ash
	{ '\u0000', '\u0000' }, // asphalt
	{ '\u82a6', '\u7b0b' }, // asparagus
	{ '\u6837', '\u5b50' }, // aspect
	{ '\u0000', '\u0000' }, // aspirin
	{ '\u0000', '\u0000' }, // ass - animal
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // assassin
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // asterisk
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // asteroid
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // asthma
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // astrology
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // astronaut
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // astronomy
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // asylum
	{ '\u4f53', '\u0000' }, // athlete
	{ '\u5730', '\u56fe', '\u96ec' }, // atlas
	{ '\u5927', '\u6c14' }, // atmosphere
	{ '\u0000', '\u0000' }, // atol
	{ '\u539f', '\u5b50' }, // atom
	{ '\u0000', '\u0000' }, // attache
	{ '\u0000', '\u0000' }, // attic
	{ '\u5f8b', '\u0000' }, // attorney
	{ '\u6fcf', '\u0000' }, // auction
	{ '\u5927', '\u80c6' }, // audacity
	{ '\u5426', '\u0000' }, // audience
	{ '\u8ba1' }, // audit
	{ '\u8bd5', '\u542c' }, // audition
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // auditorium
	{ '\u59d1', '\u0000' }, // aunt
	{ '\u4f5c', '\u5bb6' }, // author
	{ '\u6743', '\u529b' }, // authority
	{ '\u81ea', '\u4f20' }, // autobiography
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // autocrat
	{ '\u58ec', '\u8ff9' }, // autograph
	{ '\u8f66' }, // automobile
	{ '\u79cb', '\u5b63' }, // autumn
	{ '\u0000', '\u0000', '\u0000', '\u0000' }, // autopsy
	{ '\u0000', '\u0000' }, // avalanche
	{ '\u6797', '\u836b' }, // avenue
	{ '\u822a', '\u5ba4' }, // aviation
	{ '\u0000', '\u0000' }, // avocado
	{ '\u6388', '\u4e88' }, // award
	{ '\u65a7' }, // axe
	{ '\u516c', '\u7406', }, // axiom
	{ '\u8f74' }, // axis
	{ '\u8f6e', '\u8f74' }, // axle

		};

}
/* end: mc_chinese_INTF.java */

/* Filename: mc_spanish_INTF.java
   javac mc_spanish_INTF.java
 */
interface mc_spanish_INTF {

final char[][] w1 = { 
	{ '\u0061', '\u0069', '\u0072', '\u0065' },  // air
	{ '\u006e', '\u0061', '\u0076', '\u0065' },  // aisle
	{ '\u0061', '\u006c', '\u0062',
		'\u0061', '\u0074', '\u0072', '\u006f',
		'\u0073' },           // albatros
	{ '\u0061', '\u006c', '\u0062',
		'\u0075', '\u006d' }, // album
	{ '\u0061', '\u006c', '\u0063', '\u006f',  
	  '\u0068', '\u006f', '\u006c' 
			},           // alcohol
	{ '\u006e', '\u0069', '\u0063', 
		'\u0068', '\u006f' },  // alcove	
	{ '\u0000' }, 	// ale
	{ '\u0000' }, 	// algebra
	{ '\u0000' }, // alien
	{ '\u0000' }, // alimony
	{ '\u0000' }, // alkali
	{ '\u0000' }, // aligator
	{ '\u0000' }, // alloy
	{ '\u0000' }, // almanac
	{ '\u0000' }, // almond
	{ '\u0000' }, // alp
	{ '\u0000' }, // alphabet
	{ '\u0000' }, // altar
	{ '\u0000' }, // aluminum
	{ '\u0000' }, // amazon
	{ '\u0000' }, // ambasador
	{ '\u0000' }, // ambulance
	{ '\u0000' }, // amethyst
	{ '\u0000' }, // ammeter
	{ '\u0000' }, // ammonia
	{ '\u0000' }, // ammunition
	{ '\u0000' }, // amoeba
	{ '\u0000' }, // ampere
	{ '\u0000' }, // amphibian
	{ '\u0000' }, // ampitheatre
	{ '\u0000' }, // anaconda
	{ '\u0000' }, // anaesthesia
	{ '\u0000' }, // angel
	{ '\u0000' }, // angle
	{ '\u0000' }, // animal
	{ '\u0000' }, // ankle
	{ '\u0000' }, // annuity
	{ '\u0000' }, // ant
	{ '\u0000' }, // antarctic
	{ '\u0000' }, // antelope
	{ '\u0000' }, // antenna
	{ '\u0000' }, // anthem
	{ '\u0000' }, // anthology
	{ '\u0000' }, // anthropoid
	{ '\u0000' }, // anthropology
	{ '\u0000' }, // antibiotic
	{ '\u0000' }, // antibody
	{ '\u0000' }, // antifreeze
	{ '\u0000' }, // antique
	{ '\u0000' }, // antiseptic
	{ '\u0000' }, // antler
	{ '\u0000' }, // anus
	{ '\u0000' }, // anvil
	{ '\u0000' }, // aorta
	{ '\u0000' }, // apartment
	{ '\u0000' }, // ape
	{ '\u0000' }, // apertif
	{ '\u0000' }, // aperture
	{ '\u0000' }, // apostle
	{ '\u0000' }, // apostrophe
	{ '\u0000' }, // apparatus
	{ '\u0000' }, // appendicitus
	{ '\u0000' }, // appendix
	{ '\u0000' }, // appetite
	{ '\u0000' }, // appetizer
	{ '\u0000' }, // apple
	{ '\u0000' }, // appliance
	{ '\u0000' }, // applicant
	{ '\u0000' }, // application
	{ '\u0000' }, // apprentice
	{ '\u0000' }, // apricot
	{ '\u0000' }, // apron
	{ '\u0000' }, // aqualung
	{ '\u0000' }, // aquamarine
	{ '\u0000' }, // aquarium
	{ '\u0000' }, // aqueduct
	{ '\u0000' }, // arc
	{ '\u0000' }, // arcade
	{ '\u0000' }, // arch
	{ '\u0000' }, // archaelogy
	{ '\u0000' }, // archangel
	{ '\u0000' }, // archbishop
	{ '\u0000' }, // archer
	{ '\u0000' }, // archipelago
	{ '\u0000' }, // architect
	{ '\u0000' }, // archives
	{ '\u0000' }, // archway
	{ '\u0000' }, // actic
	{ '\u0000' }, // area
	{ '\u0000' }, // arena
	{ '\u0000' }, // aria
	{ '\u0000' }, // aristocracy
	{ '\u0000' }, // aristocrat
	{ '\u0000' }, // arithmetic
	{ '\u0000' }, // ark
	{ '\u0000' }, // arm
	{ '\u0000' }, // armada
	{ '\u0000' }, // armadillo
	{ '\u0000' }, // armour
	{ '\u0000' }, // arms
	{ '\u0000' }, // army
	{ '\u0000' }, // arrow
	{ '\u0000' }, // arsenic
	{ '\u0000' }, // arsenal
	{ '\u0000' }, // arson
	{ '\u0000' }, // art
	{ '\u0000' }, // artefact
	{ '\u0000' }, // artery 
	{ '\u0000' }, // arthritis
	{ '\u0000' }, // artichoke
	{ '\u0000' }, // article
	{ '\u0000' }, // artifact
	{ '\u0000' }, // artillery
	{ '\u0000' }, // artisan
	{ '\u0000' }, // artist
	{ '\u0000' }, // asbestos
	{ '\u0000' }, // ash
	{ '\u0000' }, // asphalt
	{ '\u0000' }, // aspirin
	{ '\u0000' }, // ass - animal
	{ '\u0000' }, // assassin
	{ '\u0000' }, // asterisk
	{ '\u0000' }, // asteroid
	{ '\u0000' }, // asthma
	{ '\u0000' }, // astrology
	{ '\u0000' }, // astronaut
	{ '\u0000' }, // astronomy
	{ '\u0000' }, // asylum
	{ '\u0000' }, // athlete
	{ '\u0000' }, // atlas
	{ '\u0000' }, // atmosphere
	{ '\u0000' }, // atol
	{ '\u0000' }, // atom
	{ '\u0000' }, // attache
	{ '\u0000' }, // attic
	{ '\u0000' }, // attorney
	{ '\u0000' }, // auction
	{ '\u0000' }, // audience
	{ '\u0000' }, // audit
	{ '\u0000' }, // audition
	{ '\u0000' }, // auditorium
	{ '\u0000' }, // aunt
	{ '\u0000' }, // author
	{ '\u0000' }, // authority
	{ '\u0000' }, // autobiography
	{ '\u0000' }, // autocrat
	{ '\u0000' }, // autograph
	{ '\u0000' }, // automobile
	{ '\u0000' }, // autopsy
	{ '\u0000' }, // avalanche
	{ '\u0000' }, // avenue
	{ '\u0000' }, // aviation
	{ '\u0000' }, // avocado
	{ '\u0000' }, // award
	{ '\u0000' }, // axe
	{ '\u0000' }, // axiom
	{ '\u0000' }, // axis
	{ '\u0000' }, // axle

		};

}
/* End: mc_spanish_INTF.java */

/* Filename: mc_english_INTF.java
   javac mc_english_INTF.java
 */
interface mc_english_INTF {

final char[][] w1 = { 
	{ '\u0061', '\u0069', '\u0072' }, // air
	{ '\u0061', '\u0069', '\u0073', 
		'\u006c','\u0065' },      // aisle
	{ '\u0061', '\u006c', '\u0062',
		'\u0061', '\u0074', '\u0072', '\u006f',
		'\u0073' },   // albatros
	{ '\u0061', '\u006c', '\u0062',
		'\u0075', '\u006d' },  // album
	{ '\u0061', '\u006c', '\u0063', '\u006f',  
	  '\u0068', '\u006f', '\u006c' },  // alcohol
	{ '\u0061', '\u006c', '\u0063', 
	  '\u006f', '\u0076', '\u0065' },  // alcove 
	{ '\u0061', '\u006c', '\u0065' },  // ale
	{ '\u0061', '\u006c', '\u0067', '\u0065', 
	  '\u0062', '\u0072', '\u0061' },  // algebra
	{ '\u0000' }, // alien
	{ '\u0000' }, // alimony
	{ '\u0000' }, // alkali
	{ '\u0000' }, // aligator
	{ '\u0000' }, // alloy
	{ '\u0000' }, // almanac
	{ '\u0000' }, // almond
	{ '\u0000' }, // alp
	{ '\u0000' }, // alphabet
	{ '\u0000' }, // altar
	{ '\u0000' }, // aluminum
	{ '\u0000' }, // amazon
	{ '\u0000' }, // ambasador
	{ '\u0000' }, // ambulance
	{ '\u0000' }, // amethyst
	{ '\u0000' }, // ammeter
	{ '\u0000' }, // ammonia
	{ '\u0000' }, // ammunition
	{ '\u0000' }, // amoeba
	{ '\u0000' }, // ampere
	{ '\u0000' }, // amphibian
	{ '\u0000' }, // ampitheatre
	{ '\u0000' }, // anaconda
	{ '\u0000' }, // anaesthesia
	{ '\u0000' }, // angel
	{ '\u0000' }, // angle
	{ '\u0000' }, // animal
	{ '\u0000' }, // ankle
	{ '\u0000' }, // annuity
	{ '\u0000' }, // ant
	{ '\u0000' }, // antarctic
	{ '\u0000' }, // antelope
	{ '\u0000' }, // antenna
	{ '\u0000' }, // anthem
	{ '\u0000' }, // anthology
	{ '\u0000' }, // anthropoid
	{ '\u0000' }, // anthropology
	{ '\u0000' }, // antibiotic
	{ '\u0000' }, // antibody
	{ '\u0000' }, // antifreeze
	{ '\u0000' }, // antique
	{ '\u0000' }, // antiseptic
	{ '\u0000' }, // antler
	{ '\u0000' }, // anus
	{ '\u0000' }, // anvil
	{ '\u0000' }, // aorta
	{ '\u0000' }, // apartment
	{ '\u0000' }, // ape
	{ '\u0000' }, // apertif
	{ '\u0000' }, // aperture
	{ '\u0000' }, // apostle
	{ '\u0000' }, // apostrophe
	{ '\u0000' }, // apparatus
	{ '\u0000' }, // appendicitus
	{ '\u0000' }, // appendix
	{ '\u0000' }, // appetite
	{ '\u0000' }, // appetizer
	{ '\u0000' }, // apple
	{ '\u0000' }, // appliance
	{ '\u0000' }, // applicant
	{ '\u0000' }, // application
	{ '\u0000' }, // apprentice
	{ '\u0000' }, // apricot
	{ '\u0000' }, // apron
	{ '\u0000' }, // aqualung
	{ '\u0000' }, // aquamarine
	{ '\u0000' }, // aquarium
	{ '\u0000' }, // aqueduct
	{ '\u0000' }, // arc
	{ '\u0000' }, // arcade
	{ '\u0000' }, // arch
	{ '\u0000' }, // archaelogy
	{ '\u0000' }, // archangel
	{ '\u0000' }, // archbishop
	{ '\u0000' }, // archer
	{ '\u0000' }, // archipelago
	{ '\u0000' }, // architect
	{ '\u0000' }, // archives
	{ '\u0000' }, // archway
	{ '\u0000' }, // actic
	{ '\u0000' }, // area
	{ '\u0000' }, // arena
	{ '\u0000' }, // aria
	{ '\u0000' }, // aristocracy
	{ '\u0000' }, // aristocrat
	{ '\u0000' }, // arithmetic
	{ '\u0000' }, // ark
	{ '\u0000' }, // arm
	{ '\u0000' }, // armada
	{ '\u0000' }, // armadillo
	{ '\u0000' }, // armour
	{ '\u0000' }, // arms
	{ '\u0000' }, // army
	{ '\u0000' }, // arrow
	{ '\u0000' }, // arsenic
	{ '\u0000' }, // arsenal
	{ '\u0000' }, // arson
	{ '\u0000' }, // art
	{ '\u0000' }, // artefact
	{ '\u0000' }, // artery 
	{ '\u0000' }, // arthritis
	{ '\u0000' }, // artichoke
	{ '\u0000' }, // article
	{ '\u0000' }, // artifact
	{ '\u0000' }, // artillery
	{ '\u0000' }, // artisan
	{ '\u0000' }, // artist
	{ '\u0000' }, // asbestos
	{ '\u0000' }, // ash
	{ '\u0000' }, // asphalt
	{ '\u0000' }, // aspirin
	{ '\u0000' }, // ass - animal
	{ '\u0000' }, // assassin
	{ '\u0000' }, // asterisk
	{ '\u0000' }, // asteroid
	{ '\u0000' }, // asthma
	{ '\u0000' }, // astrology
	{ '\u0000' }, // astronaut
	{ '\u0000' }, // astronomy
	{ '\u0000' }, // asylum
	{ '\u0000' }, // athlete
	{ '\u0000' }, // atlas
	{ '\u0000' }, // atmosphere
	{ '\u0000' }, // atol
	{ '\u0000' }, // atom
	{ '\u0000' }, // attache
	{ '\u0000' }, // attic
	{ '\u0000' }, // attorney
	{ '\u0000' }, // auction
	{ '\u0000' }, // audience
	{ '\u0000' }, // audit
	{ '\u0000' }, // audition
	{ '\u0000' }, // auditorium
	{ '\u0000' }, // aunt
	{ '\u0000' }, // author
	{ '\u0000' }, // authority
	{ '\u0000' }, // autobiography
	{ '\u0000' }, // autocrat
	{ '\u0000' }, // autograph
	{ '\u0000' }, // automobile
	{ '\u0000' }, // autopsy
	{ '\u0000' }, // avalanche
	{ '\u0000' }, // avenue
	{ '\u0000' }, // aviation
	{ '\u0000' }, // avocado
	{ '\u0000' }, // award
	{ '\u0000' }, // axe
	{ '\u0000' }, // axiom
	{ '\u0000' }, // axis
	{ '\u0000' }, // axle

		};

}
/* End: mc_english_INTF.java */


/* Filename: mc_any_INTF.java
	This interface is to be copied to other names
        in order to develop other languages.
        It should not be compiled in its present
        state.
 */
interface mc_any_INTF {

final char[][] w1 = { 
	{ '\u0000' }, // air
	{ '\u0000' }, // aisle
	{ '\u0000' }, // albatros
	{ '\u0000' }, // album
	{ '\u0000' }, // alcohol
	{ '\u0000' }, // alcove	
	{ '\u0000' }, // ale
	{ '\u0000' }, // algebra
	{ '\u0000' }, // alien
	{ '\u0000' }, // alimony
	{ '\u0000' }, // alkali
	{ '\u0000' }, // aligator
	{ '\u0000' }, // alloy
	{ '\u0000' }, // almanac
	{ '\u0000' }, // almond
	{ '\u0000' }, // alp
	{ '\u0000' }, // alphabet
	{ '\u0000' }, // altar
	{ '\u0000' }, // aluminum
	{ '\u0000' }, // amazon
	{ '\u0000' }, // ambasador
	{ '\u0000' }, // ambulance
	{ '\u0000' }, // amethyst
	{ '\u0000' }, // ammeter
	{ '\u0000' }, // ammonia
	{ '\u0000' }, // ammunition
	{ '\u0000' }, // amoeba
	{ '\u0000' }, // ampere
	{ '\u0000' }, // amphibian
	{ '\u0000' }, // ampitheatre
	{ '\u0000' }, // anaconda
	{ '\u0000' }, // anaesthesia
	{ '\u0000' }, // angel
	{ '\u0000' }, // angle
	{ '\u0000' }, // animal
	{ '\u0000' }, // ankle
	{ '\u0000' }, // annuity
	{ '\u0000' }, // ant
	{ '\u0000' }, // antarctic
	{ '\u0000' }, // antelope
	{ '\u0000' }, // antenna
	{ '\u0000' }, // anthem
	{ '\u0000' }, // anthology
	{ '\u0000' }, // anthropoid
	{ '\u0000' }, // anthropology
	{ '\u0000' }, // antibiotic
	{ '\u0000' }, // antibody
	{ '\u0000' }, // antifreeze
	{ '\u0000' }, // antique
	{ '\u0000' }, // antiseptic
	{ '\u0000' }, // antler
	{ '\u0000' }, // anus
	{ '\u0000' }, // anvil
	{ '\u0000' }, // aorta
	{ '\u0000' }, // apartment
	{ '\u0000' }, // ape
	{ '\u0000' }, // apertif
	{ '\u0000' }, // aperture
	{ '\u0000' }, // apostle
	{ '\u0000' }, // apostrophe
	{ '\u0000' }, // apparatus
	{ '\u0000' }, // appendicitus
	{ '\u0000' }, // appendix
	{ '\u0000' }, // appetite
	{ '\u0000' }, // appetizer
	{ '\u0000' }, // apple
	{ '\u0000' }, // appliance
	{ '\u0000' }, // applicant
	{ '\u0000' }, // application
	{ '\u0000' }, // apprentice
	{ '\u0000' }, // apricot
	{ '\u0000' }, // apron
	{ '\u0000' }, // aqualung
	{ '\u0000' }, // aquamarine
	{ '\u0000' }, // aquarium
	{ '\u0000' }, // aqueduct
	{ '\u0000' }, // arc
	{ '\u0000' }, // arcade
	{ '\u0000' }, // arch
	{ '\u0000' }, // archaelogy
	{ '\u0000' }, // archangel
	{ '\u0000' }, // archbishop
	{ '\u0000' }, // archer
	{ '\u0000' }, // archipelago
	{ '\u0000' }, // architect
	{ '\u0000' }, // archives
	{ '\u0000' }, // archway
	{ '\u0000' }, // actic
	{ '\u0000' }, // area
	{ '\u0000' }, // arena
	{ '\u0000' }, // aria
	{ '\u0000' }, // aristocracy
	{ '\u0000' }, // aristocrat
	{ '\u0000' }, // arithmetic
	{ '\u0000' }, // ark
	{ '\u0000' }, // arm
	{ '\u0000' }, // armada
	{ '\u0000' }, // armadillo
	{ '\u0000' }, // armour
	{ '\u0000' }, // arms
	{ '\u0000' }, // army
	{ '\u0000' }, // arrow
	{ '\u0000' }, // arsenic
	{ '\u0000' }, // arsenal
	{ '\u0000' }, // arson
	{ '\u0000' }, // art
	{ '\u0000' }, // artefact
	{ '\u0000' }, // artery 
	{ '\u0000' }, // arthritis
	{ '\u0000' }, // artichoke
	{ '\u0000' }, // article
	{ '\u0000' }, // artifact
	{ '\u0000' }, // artillery
	{ '\u0000' }, // artisan
	{ '\u0000' }, // artist
	{ '\u0000' }, // asbestos
	{ '\u0000' }, // ash
	{ '\u0000' }, // asphalt
	{ '\u0000' }, // aspirin
	{ '\u0000' }, // ass - animal
	{ '\u0000' }, // assassin
	{ '\u0000' }, // asterisk
	{ '\u0000' }, // asteroid
	{ '\u0000' }, // asthma
	{ '\u0000' }, // astrology
	{ '\u0000' }, // astronaut
	{ '\u0000' }, // astronomy
	{ '\u0000' }, // asylum
	{ '\u0000' }, // athlete
	{ '\u0000' }, // atlas
	{ '\u0000' }, // atmosphere
	{ '\u0000' }, // atol
	{ '\u0000' }, // atom
	{ '\u0000' }, // attache
	{ '\u0000' }, // attic
	{ '\u0000' }, // attorney
	{ '\u0000' }, // auction
	{ '\u0000' }, // audience
	{ '\u0000' }, // audit
	{ '\u0000' }, // audition
	{ '\u0000' }, // auditorium
	{ '\u0000' }, // aunt
	{ '\u0000' }, // author
	{ '\u0000' }, // authority
	{ '\u0000' }, // autobiography
	{ '\u0000' }, // autocrat
	{ '\u0000' }, // autograph
	{ '\u0000' }, // automobile
	{ '\u0000' }, // autopsy
	{ '\u0000' }, // avalanche
	{ '\u0000' }, // avenue
	{ '\u0000' }, // aviation
	{ '\u0000' }, // avocado
	{ '\u0000' }, // award
	{ '\u0000' }, // axe
	{ '\u0000' }, // axiom
	{ '\u0000' }, // axis
	{ '\u0000' }, // axle

		};

}
/* End: mc_any_INTF.java */


/* Filename: Host_Languages.java
   javac Host_Languages.java
*/
interface Host_Languages {
	final byte English    = 0;
	final byte French     = 1;
	final byte Spanish    = 2;
	final byte German     = 3;
	final byte Chinese    = 4;
	final byte Vietnamese = 5;
	final byte Japanese   = 6;
	final byte Hindi      = 7;
	final byte Russian    = 8;
	final byte Cherokee   = 9;
	final byte Arabic     = 10;
	final byte Greek      = 11;
	final byte Hebrew     = 12;
	final byte Bengali    = 13;
	final byte Canadian_aboriginal = 14;
	final byte Korean     = 15;
	final byte Ethiopic   = 16;
	final byte Gujarati   = 17;
	final byte Gurmukhi   = 18;
	final byte Cyrillic   = 19;
	final byte Mongolian  = 20;
	final byte Romanian   = 21;
	final byte Serbian    = 22;
	final byte Georgian   = 23;
	final byte Thai       = 24;
	final byte Tibetan    = 25;
	final byte Ogonek     = 26;
	// ...
}

/* End: Host_Languages.java */
