     Internet Engineering Task Force                               D. Burnett 
     Internet-Draft                                     Nuance Communications 
     draft-burnett-mrcpext-01                                      P. Forgues 
     Expires: June 24, 2004                             Nuance Communications 
                                                                    C. Galles 
                                                             Intervoice, Inc. 
                                                            December 24, 2003 
      
      
      
                MRCP Extensions: Media Resource Control Protocol Extensions 
      
     Status of this Memo  
      
        This document is an Internet-Draft and is subject to all provisions 
        of Section 10 of RFC2026. 
         
        Internet-Drafts are working documents of the Internet Engineering 
        Task Force (IETF), its areas, and its working groups.  Note that 
        other groups may also distribute working documents as Internet-
        Drafts. 
         
        Internet-Drafts are draft documents valid for a maximum of six      
        months and may be updated, replaced, or obsoleted by other documents 
        at any time.  It is inappropriate to use Internet-Drafts as 
        reference material or to cite them other than as "work in progress." 
         
        The list of current Internet-Drafts can be accessed at 
        http://www.ietf.org/1id-abstracts.html 
         
        The list of Internet-Draft Shadow Directories can be accessed at 
        http://www.ietf.org/shadow.html 
         
     Copyright Notice 
             
        Copyright (C) The Internet Society (2003).  All Rights Reserved. 
      
            
     Abstract 
       
        The Media Resource Control Protocol (MRCP) is an application level 
        protocol to control media service resources like Speech 
        Synthesizers, Recognizers, Signal Generators, Signal Detectors, Fax 
        Servers etc. over a network.  This document captures the extensions 
        required to implement Voice Enrollment, Speaker Verification and 
        Hotword recognition as well as to augment the recognizer 
        functionality using MRCP.  The extensions are largely orthogonal to 
        existing features of MRCP and to each other, with an eye towards 
        backwards compatibility with existing features and independence of 
        the extensions from each other to simplify integration. 
         
        This document is published as an Internet-Draft as input for further 
        IETF development in this area. 
         
      
                                                                     Page 1 
                                MRCP Extensions               October 2003 

      Table of Contents 
         
      Status of this Memo.................................................1 
      Abstract............................................................1 
      1.  Introduction....................................................6 
      2.  Architecture....................................................7 
      3.  Notational Conventions..........................................7 
      4.  Recognizer resource extensions..................................8 
      4.1.  Recognizer Resource Extensions Methods........................8 
      4.2.  Recognizer Resource Extensions Events.........................8 
      4.3.  Recognizer Resource Extensions Header Fields..................8 
        4.3.1.  Interpret-Text............................................8 
        4.3.2.  Enroll-Utterance..........................................9 
        4.3.3.  Ver-Buffer-Utterance......................................9 
      4.4.  RECORD........................................................9 
      4.5.  Record Header Fields..........................................9 
        4.5.1.  Recording-URL............................................10 
        4.5.2.  Ver-Buffer-Utterance.....................................10 
      4.6.  INTERPRET....................................................10 
      4.7.  RECORDING-COMPLETE...........................................12 
      4.8.  INTERPRETATION-COMPLETE......................................12 
      5.  Enrollment.....................................................14 
      5.1.  Enrollment State Machine.....................................14 
      5.2.  Enrollment Methods...........................................14 
      5.3.  Enrollment Events............................................15 
      5.4.  Enrollment Header Fields.....................................15 
        5.4.1.  Num-Min-Consistent-Pronunciations........................16 
        5.4.2.  Consistency-Threshold....................................16 
        5.4.3.  Clash-Threshold..........................................16 
        5.4.4.  Personal-Grammar-URI.....................................17 
        5.4.5.  Phrase-Id................................................17 
        5.4.6.  Phrase-NL................................................17 
        5.4.7.  Weight...................................................17 
        5.4.8.  Save-Best-Waveform.......................................17 
        5.4.9.  Waveform-URL.............................................18 
        5.4.10.   New-Phrase-Id..........................................18 
        5.4.11.   Confusable-Phrases-URI.................................18 
        5.4.12.   Abort-Phrase-Enrollment................................18 
        5.4.13.   Completion-Cause.......................................18 
      5.5.  Enrollment Result Elements...................................19 
        5.5.1.  Num-Clashes..............................................19 
        5.5.2.  Num-Good-Repetitions.....................................19 
        5.5.3.  Num-Repetitions-Still-Needed.............................19 
        5.5.4.  Consistency-Status.......................................20 
        5.5.5.  Clash-Phrase-Ids.........................................20 
        5.5.6.  Transcriptions...........................................20 
        5.5.7.  Confusable-Phrases.......................................20 
      5.6.  Enrollment Methods...........................................21 
        5.6.1.  START-PHRASE-ENROLLMENT..................................21 
        5.6.2.  RECOGNIZE................................................22 
        5.6.3.  STOP.....................................................22 
        5.6.4.  ENROLLMENT-ROLLBACK......................................23 
        5.6.5.  END-PHRASE-ENROLLMENT....................................23 
        5.6.6.  MODIFY-PHRASE............................................24 
        5.6.7.  DELETE-PHRASE............................................24 
        5.6.8.  RECOGNITION-COMPLETE.....................................24 
      6.  Speaker Verification and Identification........................26 
      6.1.  Speaker Verification/Identification Resource.................26 
      6.2.  SETUP Verification/Identification Resource...................27 
      6.3.  Speaker Verification State Machine...........................27 
      6.4.  Speaker Verification Methods.................................27 
      6.5.  Verification Events..........................................27 
      6.6.  Verification Header Fields...................................28 
        6.6.1.  Voiceprint-URI...........................................29 
        6.6.2.  Voiceprint-Identifier....................................29 
        6.6.3.  Voiceprint-Group.........................................29 
        6.6.4.  Verification-Mode........................................30 
        6.6.5.  Adapt-Model..............................................31 
        6.6.6.  Abort-Model..............................................31 
        6.6.7.  Security-Level...........................................31 
        6.6.8.  Num-Min-Verification-Phrases.............................32 
        6.6.9.  Num-Max-Verification-Phrases.............................32 
        6.6.10.   No-Input-Timeout.......................................32 
        6.6.11.   Save-Waveform..........................................32 
        6.6.12.   Waveform-URL...........................................33 
        6.6.13.   Vendor-Specific........................................33 
        6.6.14.   Voiceprint-Exists......................................33 
        6.6.15.   Ver-Buffer-Utterance...................................34 
        6.6.16.   Input-Waveform-Url.....................................34 
        6.6.17.   Verification-Type......................................34 
        6.6.18.   Digit-Sequence.........................................34 
        6.6.19.   Completion-Cause.......................................34 
      6.7.  Verification Result Elements.................................35 
        6.7.1.  Decision.................................................36 
        6.7.2.  Num-Frames...............................................36 
        6.7.3.  Device...................................................36 
        6.7.4.  Gender...................................................36 
        6.7.5.  Matched..................................................36 
        6.7.6.  Adapted..................................................37 
        6.7.7.  Verification-Score.......................................37 
        6.7.8.  Group-Name...............................................37 
        6.7.9.  Member...................................................37 
        6.7.10.   Score..................................................37 
        6.7.11.   Vendor-Specific-Results................................38 
      6.8.  Verification Session Methods.................................38 
        6.8.1.  VER-START-SESSION........................................39 
        6.8.2.  VER-END-SESSION..........................................40 
        6.8.3.  VER-SET-VOICEPRINT.......................................40 
        6.8.4.  VER-DELETE-VOICEPRINT....................................42 
        6.8.5.  VERIFY...................................................43 
        6.8.6.  VER-FROM-BUFFER..........................................43 
        6.8.7.  VER-ROLLBACK.............................................46 
        6.8.8.  VER-STOP.................................................46 
        6.8.9.  VER-START-TIMERS.........................................47 
        6.8.10.   SET-PARAMS.............................................47 
        6.8.11.   GET-PARAMS.............................................47 
      6.9.  Verification Session Events..................................48 
        6.9.1.  VERIFICATION-COMPLETE....................................48 
        6.9.2.  START-OF-SPEECH..........................................48 
      7.  Hotword Recognition............................................50 
      7.1.  Hotword State Machine........................................50 
        7.1.1.  Addressing Resources.....................................50 
      7.2.  Hotword Header Fields........................................51 
        7.2.1.  Hotword-Max-Duration.....................................51 
        7.2.2.  Hotword-Min-Duration.....................................51 
      7.3.  Hotword Methods..............................................51 
        7.3.1.  SETUP....................................................51 
        7.3.2.  RECOGNIZE................................................52 
      8.  RTSP based Examples:...........................................54 
      8.1.  Enrollment...................................................54 
      8.2.  Speaker Verification and Identification......................56 
      8.3.  Hotword Recognition..........................................62 
      9.  Security Considerations........................................62 
      10. Reference Documents............................................62 
      Acknowledgements...................................................62 
      Full Copyright Statement...........................................63 
      Authors’ Addresses.................................................63 

     1.   Introduction 
         
        The Media Resource Control Protocol (MRCP) [3] is an application 
        level protocol to control media service resources such as Speech 
        Synthesizers, Recognizers, Signal Generators, Signal Detectors and 
        Fax Servers over a network.  This protocol is designed to work with 
        streaming protocols like RTSP (Real Time Streaming Protocol) or SIP 
        (Session Initiation Protocol), which help establish control 
        connections to external media streaming devices, and with media 
        delivery mechanisms like RTP (Real-time Transport Protocol).  MRCP 
        supports basic recognition and speech synthesis (TTS) capabilities. 
         
        This document captures the extensions required to implement Voice 
        Enrollment, Speaker Verification and Hotword recognition as well as 
        to augment the recognizer functionality using MRCP.  Already having 
        functional implementations of [3], the authors developed these 
        extensions within that framework.  It is expected that these methods 
        will also prove useful as information for the IETF in its 
        standardization efforts beyond this draft version of MRCP. 
          
        A major goal of the Recognition, Enrollment, Speaker Verification 
        and Hotword recognition extensions is to be backward compatible, 
        i.e. to implement them in such a way that previous functionality is 
        available without change.  In addition, the MRCP extensions used for 
        Enrollment, Speaker Verification and Identification and Hotword 
        recognition are independent from one another.  This means a client 
        can implement only the set of methods needed for a particular 
        integration.  For example, only the Enrollment methods and responses 
        need to be implemented by a client, provided the server has 
        implemented those methods. 
         
        The extensions for Enrollment do not need a separate resource type 
        because they are implemented as part of the recognition resource.  
        Speaker Verification and Hotword recognition, unlike Enrollment, are 
        defined as new resource types: Speaker Verification creates a 
        verification resource, while Hotword recognition attaches a special 
        kind of Recognizer resource to the session in addition to the 
        primary Recognizer resource. 
         
        There is no need to change the underlying protocols to support 
        Enrollment, Speaker Verification or Hotword recognition.  Like the 
        original MRCP specification, the extensions rely on a protocol like 
        the Real Time Streaming Protocol (RTSP) or Session Initiation 
        Protocol (SIP) to establish and maintain the session. The session 
        control protocol is also responsible for establishing the media 
        connection from the client to the network server.  
         
        The MRCP protocol extensions define the requests, responses and 
        events needed to control Voice Enrollment, Speaker Verification and 
        Hotword recognition features. It is assumed the state machine for a 
        recognition resource is preserved.  

        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" and "OPTIONAL" in this 
        document are to be interpreted as described in RFC 2119 [5].  
         
        Please send any feedback on this document directly to the authors. 
         
     2.   Architecture 
         
        There is no change in architecture from the original MRCP 
        specification.  It is assumed that Enrollment is done by a 
        Recognizer resource.  Therefore, an appropriate SETUP message needs 
        to be sent and a media stream established between a client and 
        server before these functions are used.  
         
        Speaker Verification and Hotword recognition are slightly different.  
        For Speaker Verification, a new verification resource is defined.  
        This verification resource can be used on its own or be attached to 
        a session where a recognition resource is already set up. 
         
        Hotword recognition differs in that a second Recognizer resource 
        needs to be attached to the same session.  The state machine for 
        this second recognizer is the same as for the primary Recognizer 
        resource. 
         
        The following sections describe each MRCP extension separately: (1) 
        Recognizer resource extensions, (2) Enrollment, (3) Speaker 
        Verification and Identification and (4) Hotword recognition. 
         
          
     3.   Notational Conventions 
         
        Most of the definitions and syntax follow the same format used in 
        the MRCP draft submission.  The only new field type required 
        represents the short floating-point numbers needed to indicate 
        relative weight in some of the header fields.  A weight is 
        normalized to the range of 0 to 1. 
         
          WEIGHT    = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] ) 
         
          FLOAT     = [ "+" | "-" ] 1*DIGIT [ "." 0*DIGIT ]
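
        As an illustration only, the WEIGHT and FLOAT rules above can be 
        checked with regular expressions.  The Python sketch below is not 
        part of the protocol: the function names is_weight and is_float are 
        hypothetical, and DIGIT is assumed to mean the ASCII digits 0-9. 

```python
import re

# WEIGHT = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
_WEIGHT_RE = re.compile(r"(?:0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?)")

# FLOAT = [ sign ] 1*DIGIT [ "." 0*DIGIT ]
_FLOAT_RE = re.compile(r"[+-]?[0-9]+(?:\.[0-9]*)?")


def is_weight(text: str) -> bool:
    """Return True if text matches the WEIGHT rule in its entirety."""
    return _WEIGHT_RE.fullmatch(text) is not None


def is_float(text: str) -> bool:
    """Return True if text matches the FLOAT rule in its entirety."""
    return _FLOAT_RE.fullmatch(text) is not None
```

        Read literally, both rules accept a trailing "." with no digits 
        after it (for example "0." or "-3."), so the sketch accepts those 
        forms as well. 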
      
     4.   Recognizer resource extensions 
      
        The only new functionality added to the recognizer resource is the 
        inclusion of the INTERPRET and RECORD methods and the associated 
        INTERPRETATION-COMPLETE and RECORDING-COMPLETE events. 
         
         
     4.1. Recognizer Resource Extensions Methods 
         
        The following methods are supported by the recognizer resource in 
        addition to those already defined in [3]. 
         
          recognizer-extension-method   = "RECORD" 
                                        | "INTERPRET" 
         
      
     4.2. Recognizer Resource Extensions Events 
         
        The recognizer resource may now generate the following events in 
        addition to those already defined in [3]. 
         
          recognizer-extension-event    = "RECORDING-COMPLETE" 
                                        | "INTERPRETATION-COMPLETE" 
         
         
     4.3. Recognizer Resource Extensions Header Fields 
         
        The recognizer resource extensions define new header fields to 
        augment the request, response or event messages they are associated 
        with. 
         
          recognizer-extension-header   = "Interpret-Text"       ; Section 4.3.1 
                                        | "Enroll-Utterance"     ; Section 4.3.2 
                                        | "Ver-Buffer-Utterance" ; Section 4.3.3 
         
          Parameter              Support     Methods/Events/Responses 
         
          interpret-text         MANDATORY   INTERPRET 
          enroll-utterance       OPTIONAL    RECOGNIZE 
          ver-buffer-utterance   OPTIONAL    RECOGNIZE, VERIFY, RECORD 
         
     4.3.1. Interpret-Text 
         
        This header field is used to provide the text string for which a 
        natural language interpretation is desired.  This header field MUST 
        be used when invoking the INTERPRET method as it cannot be set with 
        the SET-PARAMS method. 
         
          interpret-text = "Interpret-Text" ":" 1*OCTET CRLF 
         
      
     4.3.2. Enroll-Utterance  
          
        This header field tells the Recognizer resource to consider this 
        utterance for training a phrase for Voice Enrollment.  If this flag 
        is set, the utterance will be considered when doing proximity 
        testing between repetitions of the same phrase and when doing clash 
        testing against other phrases in the grammar.  This header field is 
        OPTIONAL in the RECOGNIZE method.  The default value for this field 
        is false. 
          
          enroll-utterance = "Enroll-Utterance" ":" Boolean-value CRLF  
          
     4.3.3. Ver-Buffer-Utterance  
          
        This header field is used to indicate that this utterance should be 
        considered for Speaker Verification.  This way, an application can 
        buffer utterances while doing regular recognition or verification 
        activities and speaker verification can later be requested on the 
        buffered utterances.  This header field is OPTIONAL in the 
        RECOGNIZE, VERIFY or RECORD method. The default value for this field 
        is false. 
          
          ver-buffer-utterance = "Ver-Buffer-Utterance" ":" Boolean-value CRLF  
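
        Both Enroll-Utterance and Ver-Buffer-Utterance are Boolean flags 
        that default to false when absent.  The Python sketch below shows 
        one way a receiver might resolve them from a parsed header map.  It 
        assumes that Boolean-value is the literal "true" or "false" and 
        matches it case-insensitively; both points, like the function names, 
        are assumptions of this example rather than statements of the 
        protocol. 

```python
def parse_boolean(value):
    """Map the text of a Boolean-value to a Python bool."""
    # Assumption: the literals are "true" and "false", in any letter case.
    text = value.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a Boolean-value: {value!r}")


def utterance_flags(headers):
    """Return (enroll_utterance, ver_buffer_utterance) for a request.

    headers maps header names to their string values.  A header that is
    missing takes the default given in Sections 4.3.2 and 4.3.3: false.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    enroll = parse_boolean(lowered.get("enroll-utterance", "false"))
    ver_buffer = parse_boolean(lowered.get("ver-buffer-utterance", "false"))
    return enroll, ver_buffer
```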
         
         
     4.4. RECORD 
         
        The RECORD method does not invoke the recognizer resource but simply 
        endpoints and records the input audio stream.  It saves the 
        endpointed audio to the URL supplied in the recording-url header 
        field.  Currently, this URL can only use the 'file' scheme. 
         
        If a RECOGNIZE, INTERPRET or another RECORD operation is already in 
        progress, invoking this method will cause the response to have a 
        status code of 402, "Method not valid in this state", and a COMPLETE 
        request state. 
         
        If the recording-url is not valid, a status code of 404, "Illegal 
        Value for Parameter", will be returned in the response.  If it is 
        impossible for the server to create the requested file, a status 
        code of 407, "Method or Operation Failed", will be returned. 
         
        If the recording-url is valid, the recording operation is initiated 
        and the response will indicate an IN-PROGRESS request state.  The 
        server MAY generate a subsequent START-OF-SPEECH event when speech 
        is detected.  Upon completion of the recording operation, the server 
        will generate a RECORDING-COMPLETE event.  
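
        The response rules above can be summarized as a small decision 
        function.  The Python sketch below is illustrative only.  The order 
        in which the conditions are tested, the COMPLETE request state 
        returned with 404 and 407, and the acceptance of a URL without a 
        scheme (a relative URL, see Section 4.5.1) are assumptions of this 
        example; the text only states the request state for 402 and 200. 

```python
from urllib.parse import urlsplit


def is_valid_recording_url(url):
    """Accept only 'file' URLs, or scheme-less (relative) ones, with a path."""
    parts = urlsplit(url)
    return parts.scheme in ("", "file") and bool(parts.path)


def record_response(operation_in_progress, recording_url, can_create_file):
    """Return (status_code, request_state) for a RECORD request.

    operation_in_progress: a RECOGNIZE, INTERPRET or RECORD is still active.
    can_create_file: whether the server is able to create the target file.
    """
    if operation_in_progress:
        return 402, "COMPLETE"      # Method not valid in this state
    if not is_valid_recording_url(recording_url):
        return 404, "COMPLETE"      # Illegal Value for Parameter (state assumed)
    if not can_create_file:
        return 407, "COMPLETE"      # Method or Operation Failed (state assumed)
    return 200, "IN-PROGRESS"       # recording starts; events follow later
```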
         
     4.5. Record Header Fields 
         
        A RECORD request may contain header fields that carry request 
        options and information to augment the Request, Response or Event 
        message it is associated with. 
      
         
          record-header = recording-url          ; Section 4.5.1 
                        | ver-buffer-utterance   ; Section 4.5.2 
         
          Parameter              Support     Methods/Events 
         
          recording-url          MANDATORY   RECORD, SET-PARAMS, GET-PARAMS 
          ver-buffer-utterance   OPTIONAL    RECOGNIZE, VERIFY, RECORD 
         
     4.5.1. Recording-URL 
         
        This header field specifies the location where the audio stream 
        recorded by a call to the RECORD method should be saved.  Currently, 
        this should only be a URL using the 'file' scheme.  If this URL is 
        relative, it is resolved relative to the current working directory 
        of the MRCP server process. 
         
        This header field MAY be used only when invoking the RECORD, 
        SET-PARAMS and GET-PARAMS methods. 
         
          recording-url = "Recording-URL" ":" Url CRLF 
      
         
        Example:  
         
              C->S:RECORD 456234 MRCP/1.0  
                   Recording-URL: file://mediaserver/recordings/myfile.wav   
                    
              S->C:MRCP/1.0 456234 200 IN-PROGRESS  
              
              S->C:START-OF-SPEECH 456234 IN-PROGRESS MRCP/1.0  
                     
              S->C:RECORDING-COMPLETE 456234 COMPLETE MRCP/1.0  
                   Completion-Cause: 000 success 
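
        How a server maps a Recording-URL to a local file is left to the 
        implementation.  The Python sketch below shows one possible mapping 
        on a POSIX-style file system.  The function name recording_path, 
        the server_cwd parameter and the decision to ignore the host part 
        of the URL are assumptions of this example. 

```python
import posixpath
from urllib.parse import unquote, urlsplit


def recording_path(recording_url, server_cwd):
    """Map a Recording-URL to a path on the server's file system.

    server_cwd stands for the working directory of the MRCP server process.
    """
    parts = urlsplit(recording_url)
    if parts.scheme not in ("", "file"):
        raise ValueError("only the 'file' scheme is supported")
    path = unquote(parts.path)
    if not path:
        raise ValueError("the URL does not name a file")
    # posixpath.join keeps an absolute path unchanged and places a relative
    # path under server_cwd.  The host part of the URL, if any, is ignored.
    return posixpath.normpath(posixpath.join(server_cwd, path))
```

        With a working directory of /srv/mrcp (a made-up value), the 
        relative URL recordings/myfile.wav maps to 
        /srv/mrcp/recordings/myfile.wav, while the URL from the example 
        above maps to /recordings/myfile.wav. 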
         
     4.5.2. Ver-Buffer-Utterance  
          
        This header field is used to indicate that this utterance should be 
        considered for Speaker Verification.  This way, an application can 
        buffer utterances while doing regular recognition or verification 
        activities and speaker verification can later be requested on the 
        buffered utterances.  This header field is OPTIONAL in the 
        RECOGNIZE, VERIFY or RECORD method.  
          
          ver-buffer-utterance = "Ver-Buffer-Utterance" ":" Boolean-value CRLF  
         
     4.6. INTERPRET 
         
        The INTERPRET method from the client to the server takes as input an 
        interpret-text header, containing the text for which the semantic 
        interpretation is desired, and returns, via the INTERPRETATION-
        COMPLETE event, an interpretation result which is very similar to 
        the one returned from a RECOGNIZE method invocation.  Only portions 
        of the result relevant to acoustic matching are excluded from the 
        result.  The interpret-text header MUST be included in the INTERPRET 
        request. 
         
        Recognizer grammar data is treated in the same way as it is when 
        issuing a RECOGNIZE method call. 
         
        If a RECOGNIZE, RECORD or another INTERPRET operation is already in 
        progress, invoking this method will cause the response to have a 
        status code of 402, "Method not valid in this state", and a COMPLETE 
        request state. 
         
        Example:  
         
              C->S:INTERPRET 234567 MRCP/1.0  
                   Interpret-Text: may I speak to Andre Roy 
                   Content-Type: application/grammar+xml  
                   Content-Id: request1@form-level.store  
                   Content-Length: 104  
                     
                   <?xml version="1.0"?>  
                     
                   <!-- the default grammar language is US English -->  
                   <grammar xml:lang="en-US" version="1.0">  
                     <!-- single language attachment to tokens -->  
                     <rule id="yes">  
                       <one-of>  
                         <item xml:lang="fr-CA">oui</item>  
                         <item xml:lang="en-US">yes</item>  
                       </one-of>  
                     </rule>  
                     
                     <!-- single language attachment to a rule expansion -->  
                     <rule id="request">  
                       may I speak to  
                       <one-of xml:lang="fr-CA">  
                         <item>Michel Tremblay</item>  
                         <item>Andre Roy</item>  
                       </one-of>  
                     </rule>  
                     
                   </grammar>  
              
              S->C:MRCP/1.0 234567 200 IN-PROGRESS  
              
                   
              S->C:INTERPRETATION-COMPLETE 234567 COMPLETE MRCP/1.0  
                   Completion-Cause: 000 success  
                   Content-Type: application/x-nlsml  
                   Content-Length: 276  
                     
                   <?xml version="1.0"?>  
                   <result x-model="http://IdentityModel"  
                     xmlns:xf="http://www.w3.org/2000/xforms"  
                     grammar="session:request1@form-level.store">  
                       <interpretation>  
                           <xf:instance name="Person">  
                               <Person>  
                                   <Name> Andre Roy </Name>  
                               </Person>  
                           </xf:instance>  
                             <input>   may I speak to Andre Roy </input>  
                       </interpretation>  
                   </result> 
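
        A client has to assemble the request shown above from a start line, 
        header fields and an optional body.  The Python sketch below shows 
        one way to do this.  It assumes CRLF line endings, UTF-8 encoding 
        and a Content-Length equal to the number of octets in the body; 
        these points and the function name build_request are assumptions of 
        this example, not requirements stated in this document. 

```python
def build_request(method, request_id, headers, body=""):
    """Serialize an MRCP/1.0 request to bytes.

    headers maps header names to values, in the order they should be sent.
    A Content-Length header is appended only when a body is present.
    """
    payload = body.encode("utf-8")
    lines = [f"{method} {request_id} MRCP/1.0"]
    lines.extend(f"{name}: {value}" for name, value in headers.items())
    if payload:
        # Assumption: the length counts the octets of the encoded body.
        lines.append(f"Content-Length: {len(payload)}")
    # An empty line separates the header fields from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("utf-8") + payload
```

        Computing the length from the encoded body avoids hand-maintained 
        values that drift out of date when the grammar is edited. 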
         
     4.7. RECORDING-COMPLETE 
         
        This event from the recognition resource to the client indicates 
        that the RECORD operation is complete.  The request state MUST be 
        set to COMPLETE. 
         
        The completion-cause header MUST be included in this event.  It MUST 
        be set to one of the following values defined for the recognizer 
        resource: 
         
          Cause-Code     Cause-Name          Description 
         
          000            success             RECORD completed successfully 
          002            no-input-timeout    RECORD completed with no audio 
                                             recorded due to lack of input 
          006            error               RECORD operation terminated 
                                             due to an error 
         
        When the completion-cause is "000 success", the URL specified via 
        the recording-url header in the RECORD method invocation will 
        contain the recorded audio.  The client may then use this URL to 
        retrieve the audio. 
         
        Example:  
         
              C->S:RECORD 456234 MRCP/1.0  
                   Recording-URL: file://mediaserver/recordings/myfile.wav   
                    
              S->C:MRCP/1.0 456234 200 IN-PROGRESS  
              
              S->C:START-OF-SPEECH 456234 IN-PROGRESS MRCP/1.0  
                     
              S->C:RECORDING-COMPLETE 456234 COMPLETE MRCP/1.0  
                   Completion-Cause: 000 success 
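
        On the client side, the requirements on this event can be checked 
        while parsing it.  The Python sketch below is illustrative only: it 
        assumes the event carries no message body, and the function name 
        parse_recording_complete is hypothetical. 

```python
# Cause codes allowed in RECORDING-COMPLETE, from the table in Section 4.7.
RECORDING_CAUSES = {
    "000": "success",
    "002": "no-input-timeout",
    "006": "error",
}


def parse_recording_complete(message):
    """Return (request_id, cause_code, cause_name) or raise ValueError."""
    # Assumption: no body, so every non-empty line after the first is a header.
    lines = [line.strip() for line in message.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty message")
    fields = lines[0].split()
    if len(fields) != 4 or fields[0] != "RECORDING-COMPLETE":
        raise ValueError("not a RECORDING-COMPLETE event")
    _, request_id, state, _version = fields
    if state != "COMPLETE":
        raise ValueError("request state must be COMPLETE")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    cause = headers.get("completion-cause")
    if cause is None:
        raise ValueError("completion-cause header is missing")
    code, _, cause_name = cause.partition(" ")
    cause_name = cause_name.strip()
    if RECORDING_CAUSES.get(code) != cause_name:
        raise ValueError(f"unexpected completion cause: {cause!r}")
    return request_id, code, cause_name
```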
         
         
     4.8. INTERPRETATION-COMPLETE 
         
        This event from the recognition resource to the client indicates 
        that the INTERPRET operation is complete.  The interpretation result 
        is sent in the body of the MRCP message.  The request state MUST be 
        set to COMPLETE. 
         
      
        The completion-cause header MUST be included in this event and MUST 
        be set to one of the following two values defined for the recognizer 
        resource: 
         
          Cause-Code     Cause-Name          Description 
         
          000            success             INTERPRET completed  
                                             successfully 
          006            error               INTERPRET terminated 
                                             due to an error 
         
        Example: 
              C->S:INTERPRET 234567 MRCP/1.0  
                   Interpret-Text: may I speak to Andre Roy 
                   Content-Type: application/grammar+xml  
                   Content-Id: request1@form-level.store  
                   Content-Length: 104  
                     
                   <?xml version="1.0"?>  
                     
                   <!-- the default grammar language is US English -->  
                   <grammar xml:lang="en-US" version="1.0">  
                     
                      <!-- single language attachment to tokens -->  
                      <rule id="yes">  
                         <one-of>  
                            <item xml:lang="fr-CA">oui</item>  
                            <item xml:lang="en-US">yes</item>  
                         </one-of>  
                      </rule>  
                     
                      <!-- single language attachment to a rule expansion -->  
                      <rule id="request">  
                         may I speak to  
                         <one-of xml:lang="fr-CA">  
                            <item>Michel Tremblay</item>  
                            <item>Andre Roy</item>  
                         </one-of>  
                      </rule>  
                     
                   </grammar>  
              
              S->C:MRCP/1.0 234567 200 IN-PROGRESS  
              
                   
              S->C:INTERPRETATION-COMPLETE 234567 COMPLETE MRCP/1.0  
                   Completion-Cause: 000 success  
                   Content-Type: application/x-nlsml  
                   Content-Length: 276  
                     
                   <?xml version="1.0"?>  
                   <result x-model="http://IdentityModel"  
                     xmlns:xf="http://www.w3.org/2000/xforms"  
                     grammar="session:request1@form-level.store">  
                       <interpretation>  
                           <xf:instance name="Person">  
                               <Person>  
                                   <Name> Andre Roy </Name>  
                               </Person>  
                           </xf:instance>  
                           <input>   may I speak to Andre Roy </input>  
                       </interpretation>  
                   </result> 
         
         
         
     5.   Enrollment 
         
        This document captures the extensions required to implement Voice 
        Enrollment, Speaker Verification and Hotword recognition using MRCP.  
        This section describes the methods, responses and events needed for 
        doing Enrollment. 
         
        Enrollment is performed using a person's voice.  For example, a list 
        of contacts can be created and maintained by recording the contacts' 
        names in the caller's voice.  This technique is sometimes also 
        called speaker-dependent recognition. 
         
        Voice Enrollment has the concept of an enrollment session.  A 
        session to add a new phrase to a personal grammar consists of an 
        initial utterance followed by enough repetitions of it to commit 
        the new phrase to the personal grammar.  Each time an utterance is 
        recorded, it is compared for similarity with the other samples, and 
        a clash test is performed against the other entries in the personal 
        grammar to ensure that there are no similar, confusable entries. 
         
        Most vendors perform the enrollment feature using a Recognizer 
        resource.  Which utterances are considered for enrollment of a new 
        phrase is controlled by setting a header field in the RECOGNIZE 
        request, rather than by pausing or resuming the phrase enrollment 
        session.  This mechanism is explained in more detail in the 
        following sections. 
         
     5.1. Enrollment State Machine  
         
        Starting an enrollment session does not change the state of the 
        recognizer resource, i.e. it remains idle.  Once an enrollment 
        session is started, then utterances are enrolled by calling the 
        RECOGNIZE method repeatedly.  The state of the Speech Recognizer 
        resource goes from IDLE to RECOGNIZING each time RECOGNIZE is 
        called. 
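         
        The behavior described above can be pictured with a small model. 
        This is a sketch, not part of the protocol: the class and attribute 
        names are hypothetical, and it assumes the resource returns to IDLE 
        when a recognition completes, which the text implies but does not 
        state. 

```python
class EnrollmentRecognizer:
    """Toy model of the recognizer states described in Section 5.1."""

    def __init__(self):
        self.state = "IDLE"
        self.enrolling = False  # hypothetical flag: enrollment session active

    def start_phrase_enrollment(self):
        # Starting a session leaves the resource state unchanged.
        self.enrolling = True

    def recognize(self):
        if self.state != "IDLE":
            raise RuntimeError("a RECOGNIZE operation is already in progress")
        self.state = "RECOGNIZING"

    def recognition_complete(self):
        if self.state != "RECOGNIZING":
            raise RuntimeError("no RECOGNIZE operation in progress")
        # Assumption: the resource goes back to IDLE after each recognition.
        self.state = "IDLE"

    def end_phrase_enrollment(self):
        if not self.enrolling:
            raise RuntimeError("no enrollment session is active")
        if self.state != "IDLE":
            raise RuntimeError("cannot end enrollment during RECOGNIZE")
        self.enrolling = False
```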
         
         
     5.2. Enrollment Methods 
         
        Enrollment supports the following methods. 
          enrollment-method  =  "START-PHRASE-ENROLLMENT"  
                              | "RECOGNIZE" 
                              | "STOP" 
                              | "ENROLLMENT-ROLLBACK" 
                              | "END-PHRASE-ENROLLMENT" 
                              | "MODIFY-PHRASE" 
                              | "DELETE-PHRASE" 
         
           
     5.3. Enrollment Events 
         
        Enrollment may generate the following events. 
          enrollment-event   =  "RECOGNITION-COMPLETE" 
         
         
     5.4. Enrollment Header Fields 
         
        An Enrollment request may contain header fields containing request 
        options and information to augment the Request, Response or Event 
        message it is associated with. 
         
          enrollment-header  = 
                       num-min-consistent-pronunciations    ; Section 5.4.1 
                              | consistency-threshold       ; Section 5.4.2 
                              | clash-threshold             ; Section 5.4.3 
                              | personal-grammar-uri        ; Section 5.4.4 
                              | phrase-id                   ; Section 5.4.5 
                              | phrase-nl                   ; Section 5.4.6 
                              | weight                      ; Section 5.4.7 
                              | save-best-waveform          ; Section 5.4.8 
                              | waveform-url                ; Section 5.4.9 
                              | new-phrase-id               ; Section 5.4.10 
                              | confusable-phrases-uri      ; Section 5.4.11 
                              | abort-phrase-enrollment     ; Section 5.4.12 
                              | completion-cause            ; Section 5.4.13  
         
        Parameter                Support    Methods/Events 
         
        num-min-consistent-      MANDATORY  START-PHRASE-ENROLLMENT, 
          pronunciations                    SET-PARAMS, GET-PARAMS 
        consistency-threshold    OPTIONAL   START-PHRASE-ENROLLMENT, 
                                            SET-PARAMS, GET-PARAMS 
        clash-threshold          OPTIONAL   START-PHRASE-ENROLLMENT, 
                                            SET-PARAMS, GET-PARAMS 
        personal-grammar-uri     MANDATORY  START-PHRASE-ENROLLMENT, 
                                            SET-PARAMS, GET-PARAMS, 
                                            MODIFY-PHRASE, DELETE-PHRASE 
        phrase-id                MANDATORY  MODIFY-PHRASE, DELETE-PHRASE, 
                                            START-PHRASE-ENROLLMENT 
        phrase-nl                MANDATORY  MODIFY-PHRASE, 
                                            START-PHRASE-ENROLLMENT 
        weight                   OPTIONAL   MODIFY-PHRASE, 
                                            START-PHRASE-ENROLLMENT 
        save-best-waveform       OPTIONAL   SET-PARAMS, GET-PARAMS, 
                                            RECOGNIZE 
        waveform-url             MANDATORY  RECOGNITION-COMPLETE 
        new-phrase-id            OPTIONAL   MODIFY-PHRASE 
        confusable-phrases-uri   OPTIONAL   RECOGNIZE 
        abort-phrase-enrollment  OPTIONAL   END-PHRASE-ENROLLMENT 
        completion-cause         MANDATORY  RECOGNITION-COMPLETE 
         
        For enrollment-specific header fields that can appear as part of 
        SET-PARAMS or GET-PARAMS methods, the following general rule 
        applies:  the START-PHRASE-ENROLLMENT method must be called before 
        these header fields can be set through the SET-PARAMS method or 
        retrieved through the GET-PARAMS method.  
         
         
     5.4.1. Num-Min-Consistent-Pronunciations  
         
        This parameter MAY be specified in a START-PHRASE-ENROLLMENT, 
        SET-PARAMS, or GET-PARAMS method and is used to specify the minimum 
        number of consistent pronunciations that must be obtained to voice 
        enroll a new phrase.  The minimum value is 1.  The default value is 
        platform specific and MAY be greater than 1. 
      
          num-min-consistent-pronunciations  =  
                       "Num-Min-Consistent-Pronunciations" ":" 1*DIGIT CRLF  
         
         
     5.4.2. Consistency-Threshold  
         
        This parameter MAY be sent as part of the START-PHRASE-ENROLLMENT, 
        SET-PARAMS, or GET-PARAMS method.  Used during voice-enrollment, 
        this parameter specifies how similar an utterance needs to be to a 
        previously enrolled pronunciation of the same phrase to be 
        considered "consistent." The higher the threshold, the closer the 
        match between an utterance and previous pronunciations must be for 
        the pronunciation to be considered consistent. The range for this 
        threshold is 0 to 100. 
         
          consistency-threshold = "Consistency-Threshold" ":" 1*DIGIT CRLF 
          
         
     5.4.3. Clash-Threshold 
         
        This parameter MAY be sent as part of the START-PHRASE-ENROLLMENT, 
        SET-PARAMS, or GET-PARAMS method.  Used during voice-enrollment, this 
        parameter specifies how similar the pronunciations of two different 
        phrases can be before they are considered to be clashing. For 
        example, pronunciations of phrases such as "John Smith" and "Jon 
        Smits" may be so similar that they are difficult to distinguish 
        correctly. A smaller threshold reduces the number of clashes 
        detected. The range for this threshold is 0 to 100. The default 
        value for this field is platform specific. 
         
          clash-threshold     =    "Clash-Threshold" ":" 1*DIGIT CRLF 
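         
        A client could check the numeric enrollment parameters of Sections 
        5.4.1 through 5.4.3 before sending them.  The following sketch uses 
        a hypothetical helper name; the ranges come from the text above, 
        and a value of None is taken to mean "omit the header". 

```python
def validate_enrollment_params(num_min_consistent=None,
                               consistency_threshold=None,
                               clash_threshold=None):
    """Return header lines for the supplied values, or raise ValueError."""
    headers = []
    if num_min_consistent is not None:
        if num_min_consistent < 1:
            raise ValueError("Num-Min-Consistent-Pronunciations must be >= 1")
        headers.append(
            f"Num-Min-Consistent-Pronunciations: {num_min_consistent}")
    for name, value in (("Consistency-Threshold", consistency_threshold),
                        ("Clash-Threshold", clash_threshold)):
        if value is None:
            continue  # header omitted by the caller
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in the range 0 to 100")
        headers.append(f"{name}: {value}")
    return headers
```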
         
         
         

      

     5.4.4. Personal-Grammar-URI  
         
        This parameter specifies the speaker-trained grammar to be used or 
        referenced during enrollment operations.  For example, a contact 
        list for user "Jeff" could be stored at the Personal-Grammar-URI 
        "http://myserver/myenrollmentdb/jeff-list".  There is no default 
        value for this header field. 
         
          personal-grammar-uri = "Personal-Grammar-URI" ":" Url CRLF 
      
         
     5.4.5. Phrase-Id 
         
        This header identifies a phrase in a personal grammar and will also 
        be returned when doing recognition.  This header field MAY occur in 
        START-PHRASE-ENROLLMENT, MODIFY-PHRASE or DELETE-PHRASE requests. 
        There is no default value for this header field. 
         
          phrase-id           =    "Phrase-ID" ":" 1*ALPHA CRLF 
      
     5.4.6. Phrase-NL 
         
        This is a string specifying the natural language statement to 
        execute when the phrase is recognized.  This header field MAY occur 
        in START-PHRASE-ENROLLMENT and MODIFY-PHRASE requests. There is no 
        default value for this header field. 
         
          phrase-nl           =    "Phrase-NL" ":" 1*ALPHA CRLF 
         
     5.4.7. Weight  
         
        The value of this header field represents the occurrence likelihood 
        of this branch of the grammar.  The weights are normalized to sum to 
        one at compilation time, so use the value 1 for every branch if all 
        branches are to have the same weight.  This header field MAY occur 
        in START-PHRASE-ENROLLMENT and MODIFY-PHRASE requests. 
         
          weight         = "Weight" ":" WEIGHT CRLF 
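         
        The normalization mentioned above can be illustrated as follows. 
        How a given platform normalizes weights internally is not specified 
        here; the function name and the dictionary layout are assumptions 
        made for the example. 

```python
def normalize_weights(weights):
    """Scale phrase weights so that they sum to one."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {phrase: weight / total for phrase, weight in weights.items()}
```

        With this sketch, three phrases weighted 1, 1 and 2 end up with 
        likelihoods 0.25, 0.25 and 0.5, and giving every phrase the weight 
        1 yields equal likelihoods. 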
      
         
     5.4.8. Save-Best-Waveform  
         
        This header field allows the client to indicate to the recognizer 
        that it MUST save the audio stream for the best repetition of the 
        phrase that was used during the enrollment session.  The recognizer 
        MUST then record the recognized audio and make it available to the 
        client in the form of a URL returned in the waveform-url header 
        field in the RECOGNITION-COMPLETE event.  If there was an error in 
        recording the stream or the audio clip is otherwise not available, 
        the recognizer MUST return an empty waveform-url header field. 
         
          save-best-waveform  = "Save-Best-Waveform" ":" Boolean-value CRLF 
         

      

     5.4.9. Waveform-URL  
         
        If the Save-Best-Waveform header field is set to true, the 
        recognizer MUST record the incoming audio stream of the recognition 
        into a file and provide a URL for the client to access it.  This 
        header MUST be present in the RECOGNITION-COMPLETE event if the 
        Save-Best-Waveform header field was set to true.  The URL value of 
        the header MUST be empty if there was some error preventing the 
        server from recording.  Otherwise, the URL generated by the server 
        MUST be unique across the server and all its recognition and 
        enrollment sessions.   
         
          waveform-url        = "Waveform-URL" ":" Url CRLF 
         
         
     5.4.10. New-Phrase-Id  
         
        This header field replaces the id used to identify the phrase in a 
        personal grammar.  The recognizer returns the new id when using an 
        enrollment grammar.  This header field MAY occur in MODIFY-PHRASE 
        requests. 
         
          new-phrase-id       =    "New-Phrase-ID" ":" 1*ALPHA CRLF 
      
      
     5.4.11. Confusable-Phrases-URI  
         
        This optional header field specifies the grammar that defines 
        invalid phrases for enrollment.  For example, typical applications 
        do not allow an enrolled phrase that is also a command word.  This 
        header field MAY occur in RECOGNIZE requests. 
         
          confusable-phrases-uri         
                              =    "Confusable-Phrases-URI" ":" Url CRLF 
         
     5.4.12. Abort-Phrase-Enrollment  
          
        This header field can optionally be specified in the END-PHRASE-
        ENROLLMENT method to abort the phrase enrollment, rather than 
        committing the phrase to the personal grammar.  
          
          abort-phrase-enrollment  = "Abort-Phrase-Enrollment" ":" 
                                     Boolean-value CRLF 
         
         
     5.4.13. Completion-Cause 
         
        This header field is from the recognizer resource and it MUST be 
        specified in a RECOGNITION-COMPLETE event coming from the recognizer 
        resource to the client. This indicates the reason behind the 
        RECOGNIZE request completion. 
         
        The error codes used for Enrollment should not clash with those for 
        normal recognition.  There are no completion-cause values specific 
        to enrollment, so refer to the original MRCP specification for 
        valid completion causes. 
         
          completion-cause    =    "Completion-Cause" ":" 1*DIGIT SP 
                                   1*ALPHA CRLF 
      
      
     5.5. Enrollment Result Elements 
         
        Enrollment results can contain the following elements: 
         
          enrollment-result-elements  = 
                                num-clashes                 ; Section 5.5.1 
                              | num-good-repetitions        ; Section 5.5.2 
                              | num-repetitions-still-needed; Section 5.5.3 
                              | consistency-status          ; Section 5.5.4 
                              | clash-phrase-ids            ; Section 5.5.5 
                              | transcriptions              ; Section 5.5.6 
                              | confusable-phrases          ; Section 5.5.7 
         
      
         
     5.5.1. Num-Clashes 
         
        This is not a header field, but part of the recognition results. It 
        is returned in a RECOGNITION-COMPLETE event.  Its value represents 
        the number of clashes that this pronunciation has with other 
        pronunciations in an active enrollment session.  The header field 
        Clash-Threshold determines the sensitivity of the clash measurement.  
        Clash testing can be turned off completely by setting Clash-
        Threshold to 0. 
         
          num-clashes    = "<num-clashes>" 1*DIGIT "</num-clashes>" CRLF 
      
          
     5.5.2. Num-Good-Repetitions 
         
        This is not a header field, but part of the recognition results. It 
        is returned in a RECOGNITION-COMPLETE event.  Its value represents 
        the number of consistent pronunciations obtained so far in an active 
        enrollment session. 
         
          num-good-repetitions = "<num-good-repetitions>" 1*DIGIT 
                                 "</num-good-repetitions>"  CRLF 
         
         
     5.5.3. Num-Repetitions-Still-Needed 
         
        This is not a header field, but part of the recognition results. It 
        is returned in a RECOGNITION-COMPLETE event.  Its value represents 
        the number of consistent pronunciations that must still be obtained 
        before the new phrase can be added to the enrollment grammar.  The 
        number of consistent pronunciations required is determined by the 
        parameter Num-Min-Consistent-Pronunciations, whose default value is 
        platform specific (Section 5.4.1).  The returned value must be 0 
        before the new phrase can be committed by ending the enrollment 
        session. 
         
          num-repetitions-still-needed =  
                         "<num-repetitions-still-needed>" 1*DIGIT 
                         "</num-repetitions-still-needed>" CRLF 
         
         
     5.5.4. Consistency-Status 
         
        This is not a header field, but part of the recognition results. It 
        is returned in a RECOGNITION-COMPLETE event. This is used to 
        indicate how consistent the repetitions are when learning a new 
        phrase. It can have the values of CONSISTENT, INCONSISTENT and 
        UNDECIDED. 
         
          consistency-status       = "<consistency-status>" 1*ALPHA 
                                     "</consistency-status>" CRLF 
         
         
     5.5.5. Clash-Phrase-Ids 
         
        This is not a header field, but part of the recognition results.  It 
        MAY be returned in a RECOGNITION-COMPLETE event.  It contains the 
        phrase ids of the clashing pronunciation(s) and is absent if there 
        are no clashes. 
         
          phrase-id           = "<item>" 1*ALPHA "</item>" CRLF 
          clash-phrase-ids    = "<clash-phrase-ids>" 1*phrase-id 
                                "</clash-phrase-ids>" CRLF 
         
         
     5.5.6. Transcriptions 
         
        This is not a header field, but part of the recognition results.  It 
        MAY be returned in a RECOGNITION-COMPLETE event.  It contains the 
        transcriptions returned in the last repetition of the phrase being 
        enrolled. 
      
          transcription       = "<item>" 1*OCTET "</item>" CRLF 
          transcriptions      = "<transcriptions>" 1*transcription 
                                "</transcriptions>" CRLF 
         
         
     5.5.7. Confusable-Phrases 
         
        This is not a header field, but part of the recognition results.  It 
        MAY be returned in a RECOGNITION-COMPLETE event.  It contains the 
        list of phrases from a command grammar that are confusable with the 
        phrase being added to the personal grammar. 
      
      

          confusable-phrase   = "<item>" 1*OCTET "</item>" CRLF 
          confusable-phrases  = "<confusable-phrases>" 1*confusable-phrase 
                                "</confusable-phrases>" CRLF 
         
         
     5.6. Enrollment Methods 
      
         
     5.6.1. START-PHRASE-ENROLLMENT 
         
        The START-PHRASE-ENROLLMENT method sent from the client to the 
        server starts a new phrase enrollment session during which the 
        client may call RECOGNIZE to enroll a new utterance.  The session 
        consists of a set of calls to RECOGNIZE in which the caller speaks 
        a phrase several times so the system can "learn" it.  The phrase 
        is then added to a personal grammar (speaker-trained grammar), and 
        the system can recognize it later. 
         
        Only one phrase enrollment session may be active at a time. The 
        Personal-Grammar-URI identifies the grammar that is used during 
        enrollment to store the personal list of phrases.  Once RECOGNIZE is 
        called, the result is returned in a RECOGNITION-COMPLETE event and 
        may contain either an enrollment result OR a recognition result for 
        a regular recognition.  
         
        Calling END-PHRASE-ENROLLMENT ends the ongoing phrase enrollment 
        session, which is typically done after a sequence of successful 
        calls to RECOGNIZE.  This method can be called to commit the new 
        phrase to the personal grammar or to abort the phrase enrollment 
        session.  
         
        The grammar identified by the Personal-Grammar-URI, which is to 
        contain the newly enrolled phrase, will be created if it does not 
        exist.  Also, the personal grammar may ONLY contain phrases added 
        via a phrase enrollment session. 
      
        The Phrase-ID passed to this method will be used to identify this 
        phrase in the grammar and will be returned as the speech input when 
        doing a RECOGNIZE on the grammar. The Phrase-NL similarly will be 
        returned in a RECOGNITION-COMPLETE event in the same manner as other 
        NL in a grammar. The tag-format of this NL is vendor specific.  
      
        If the client has specified Save-Best-Waveform as true, then the 
        response after ending the phrase enrollment session should contain 
        the location/URL of a recording of the best repetition of the 
        learned phrase. 
      
        Example: 
        C->S:  START-PHRASE-ENROLLMENT 543258 MRCP/1.0  
               Num-Min-Consistent-Pronunciations: 2 
               Consistency-Threshold: 30 
               Clash-Threshold: 12 
               Personal-Grammar-URI: <personal grammar uri> 
               Phrase-Id: <phrase id> 
               Phrase-NL: <NL phrase> 
               Weight: 1 
               Save-Best-Waveform: true 
         
        S->C:  MRCP/1.0 543258 200 COMPLETE 
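         
        A request like the one above could be assembled as sketched below. 
        The helper name and its parameters are hypothetical, and the sketch 
        assumes CRLF line endings with an empty line ending the header 
        section; it is an illustration rather than a reference encoder. 

```python
CRLF = "\r\n"


def build_start_phrase_enrollment(request_id, personal_grammar_uri,
                                  phrase_id, phrase_nl,
                                  optional_headers=None):
    """Return a START-PHRASE-ENROLLMENT request as a string."""
    # Optional headers first, then the three mandatory ones from the table
    # in Section 5.4 (these override any duplicates supplied by the caller).
    headers = dict(optional_headers or {})
    headers["Personal-Grammar-URI"] = personal_grammar_uri
    headers["Phrase-Id"] = phrase_id
    headers["Phrase-NL"] = phrase_nl
    lines = [f"START-PHRASE-ENROLLMENT {request_id} MRCP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Assumption: an empty line terminates the header section.
    return CRLF.join(lines) + CRLF + CRLF
```

        For example, passing {"Weight": "1", "Save-Best-Waveform": "true"} 
        as optional_headers adds those two header lines to the request. 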
      
         
     5.6.2. RECOGNIZE 
         
        The RECOGNIZE method from the client to the server starts an 
        enrollment/recognition operation during which either the phrase is 
        learned, or recognition occurs against the grammar passed to 
        RECOGNIZE.  A START-OF-SPEECH event followed by a 
        RECOGNITION-COMPLETE event should be expected. 
         
        There can only be a single RECOGNIZE operation IN-PROGRESS at a 
        time, and this method MUST be called during an active phrase 
        enrollment session (started with START-PHRASE-ENROLLMENT) if 
        enrollment is desired. 
         
        If the RECOGNIZE request contains a Content-Id header field, then 
        the resulting grammar (which includes the personal grammar as a 
        sub-grammar) can be referenced from elsewhere by using the 
        "session:" scheme with the Content-Id value, for example 
        "session:my-grammar". 
         
        Example: 
        C->S:  RECOGNIZE 543259 MRCP/1.0 
               Content-Type: application/grammar+xml 
               Content-Id: my-grammar 
               Content-Length: 123 
         
               <?xml version="1.0"?>  
         
               <!-- the default grammar language is US English -->  
               <grammar xml:lang="en-US" version="1.0">  
                
               <!-- example command: Help -->  
               <rule id="UniversalCommand" scope="public">  
                    <one-of> 
                         <item> help </item> 
                         <item> cancel </item> 
                    </one-of> 
               </rule>  
               </grammar> 
         
        S->C:  MRCP/1.0 543259 200 IN-PROGRESS 
         
        S->C:  START-OF-SPEECH 543259 IN-PROGRESS MRCP/1.0 
         
         
     5.6.3. STOP 
         
        The STOP method from the client to the server may only be called 
        during an ongoing RECOGNIZE operation and is used to abort that 
        recognition. No RECOGNITION-COMPLETE event will follow.   
         
      

        There is no difference in behavior for regular recognition versus an 
        enrollment.  It is included here for completeness. 
      
        Example: 
        C->S:  STOP 543260 MRCP/1.0 
         
        S->C:  MRCP/1.0 543260 200 COMPLETE 
               Active-Request-Id-List: 543259 
      
         
         
     5.6.4. ENROLLMENT-ROLLBACK 
      
        The ENROLLMENT-ROLLBACK method discards the live utterance(s) 
        collected by the last RECOGNIZE operation.  This method should be 
        invoked when the caller provides undesirable input such as 
        non-speech noises, side-speech, commands, an utterance from the 
        RECOGNIZE grammar, etc.  Note that this method does not provide a 
        stack of rollback states.  If ENROLLMENT-ROLLBACK is executed twice 
        in succession without an intervening recognition operation, the 
        second invocation has no effect. 
         
        Example: 
        C->S:  ENROLLMENT-ROLLBACK 543261 MRCP/1.0 
         
        S->C:  MRCP/1.0 543261 200 COMPLETE 
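         
        The "no stack of rollback states" rule can be pictured with a small 
        model.  The class below is a hypothetical sketch of the semantics 
        described above, not of any server implementation. 

```python
class EnrollmentSession:
    """Single-level rollback: only the latest recognition can be undone."""

    def __init__(self):
        self.accepted = []      # utterances kept so far
        self._undoable = False  # True only right after a recognition

    def record_utterance(self, utterance):
        self.accepted.append(utterance)
        self._undoable = True

    def rollback(self):
        """Discard the latest utterance; return True if one was dropped."""
        if not self._undoable:
            return False  # second rollback in a row: no effect
        self.accepted.pop()
        self._undoable = False
        return True
```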
         
     5.6.5. END-PHRASE-ENROLLMENT  
          
        The END-PHRASE-ENROLLMENT method can only be called during an active 
        phrase enrollment session, which was started by calling the method 
        START-PHRASE-ENROLLMENT.  It may NOT be called during an ongoing 
        RECOGNIZE operation.  To commit the new phrase to the grammar, it 
        should be called once successive calls to RECOGNIZE have succeeded 
        and Num-Repetitions-Still-Needed has been returned as 0 in the 
        RECOGNITION-COMPLETE event.  Alternatively, it can be called with 
        the Abort-Phrase-Enrollment header to abort the phrase enrollment 
        session. 
         
        If the client has specified Save-Best-Waveform as true in the START-
        PHRASE-ENROLLMENT request, then the response should contain the 
        location/URL of a recording of the best repetition of the learned 
        phrase. 
      
        Example: 
        C->S:  END-PHRASE-ENROLLMENT 543262 MRCP/1.0  
           
         
        S->C:  MRCP/1.0 543262 200 COMPLETE 
               Waveform-URL: <waveform url> 
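         
        The point at which a client commits the phrase can be sketched as a 
        small decision function.  The function name and the "WAIT" label 
        are hypothetical; only the two method names come from this 
        document. 

```python
def next_enrollment_step(num_repetitions_still_needed,
                         recognize_in_progress=False):
    """Suggest the next client action in a phrase enrollment session."""
    if recognize_in_progress:
        # END-PHRASE-ENROLLMENT may not be sent during a RECOGNIZE.
        return "WAIT"
    if num_repetitions_still_needed == 0:
        return "END-PHRASE-ENROLLMENT"
    return "RECOGNIZE"
```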
      
         
         

      

     5.6.6. MODIFY-PHRASE 
         
        The MODIFY-PHRASE method sent from the client to the server is used 
        to change the phrase ID, NL phrase and/or weight for a given phrase 
        in a personal grammar. 
         
        If no fields are supplied then calling this method has no effect and 
        it is silently ignored. 
         
        Example: 
        C->S:  MODIFY-PHRASE 543265 MRCP/1.0  
               Personal-Grammar-URI: <personal grammar uri> 
               Phrase-Id: <phrase id> 
               New-Phrase-Id: <new phrase id> 
               Phrase-NL: <NL phrase> 
               Weight: 1 
      
        S->C:  MRCP/1.0 543265 200 COMPLETE  
      
         
     5.6.7. DELETE-PHRASE 
         
        The DELETE-PHRASE method sent from the client to the server is used 
        to delete a phrase in a personal grammar added through voice 
        enrollment or text enrollment.  If the specified phrase does not 
        exist, this method has no effect and it is silently ignored. 
         
        Example: 
        C->S:  DELETE-PHRASE 543266 MRCP/1.0  
               Personal-Grammar-URI: <personal grammar uri> 
               Phrase-Id: <phrase id> 
         
        S->C:  MRCP/1.0 543266 200 COMPLETE 
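         
        The "silently ignored" behavior of MODIFY-PHRASE and DELETE-PHRASE 
        can be modeled with an in-memory grammar.  This is a sketch with 
        hypothetical names; ignoring a MODIFY-PHRASE for an unknown phrase 
        id is an assumption, since the text only covers the case where no 
        fields are supplied. 

```python
class PersonalGrammar:
    """In-memory stand-in for a personal (speaker-trained) grammar."""

    def __init__(self):
        self.phrases = {}  # phrase id -> {"nl": ..., "weight": ...}

    def add(self, phrase_id, nl, weight=1.0):
        self.phrases[phrase_id] = {"nl": nl, "weight": weight}

    def modify_phrase(self, phrase_id, new_phrase_id=None,
                      nl=None, weight=None):
        entry = self.phrases.get(phrase_id)
        if entry is None:
            return  # assumption: unknown phrase ids are ignored
        if nl is not None:
            entry["nl"] = nl
        if weight is not None:
            entry["weight"] = weight
        if new_phrase_id is not None and new_phrase_id != phrase_id:
            self.phrases[new_phrase_id] = self.phrases.pop(phrase_id)

    def delete_phrase(self, phrase_id):
        # A missing phrase is silently ignored, as described above.
        self.phrases.pop(phrase_id, None)
```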
         
         
     5.6.8. RECOGNITION-COMPLETE 
         
        The RECOGNITION-COMPLETE event follows a method call to RECOGNIZE 
        and is used to communicate to the client the results of the 
        enrollment.  Note that the event can contain recognition or 
        enrollment results depending on what was spoken. 
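         
        A client might pull the result elements of Section 5.5 out of the 
        NLSML body roughly as follows.  The function name and the returned 
        dictionary keys are hypothetical, and the sketch leaves out 
        confusable-phrases, whose item layout is vendor dependent. 

```python
import xml.etree.ElementTree as ET


def parse_enrollment_result(nlsml):
    """Return enrollment result fields, or None for a regular result."""
    root = ET.fromstring(nlsml)
    node = root.find(".//enrollment-result")
    if node is None:
        return None  # regular recognition result, no enrollment data

    def text(tag):
        child = node.find(tag)
        if child is None or child.text is None:
            return None
        return child.text.strip()

    def number(tag):
        value = text(tag)
        return int(value) if value is not None else None

    def items(tag):
        parent = node.find(tag)
        if parent is None:
            return []  # element absent, e.g. no clashes
        return [(item.text or "").strip() for item in parent.findall("item")]

    return {
        "num_clashes": number("num-clashes"),
        "num_good_repetitions": number("num-good-repetitions"),
        "num_repetitions_still_needed": number("num-repetitions-still-needed"),
        "consistency_status": text("consistency-status"),
        "clash_phrase_ids": items("clash-phrase-ids"),
        "transcriptions": items("transcriptions"),
    }
```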
         
        Example: 
        S->C:  RECOGNITION-COMPLETE 543259 COMPLETE MRCP/1.0 
               Completion-Cause: 000 success 
               Content-Type: application/x-nlsml 
               Content-Length: 123 
         
               <?xml version="1.0"?> 
               <result grammar="Personal-Grammar-URI"> 
               <extensions> 
                  <result-type type="ENROLLMENT" /> 
                  <enrollment-result> 
                    <num-clashes> 2 </num-clashes> 
                    <num-good-repetitions> 1 </num-good-repetitions> 
                    <num-repetitions-still-needed> 1 
                    </num-repetitions-still-needed> 
                    <consistency-status> CONSISTENT </consistency-status> 
                    <clash-phrase-ids>  
                         <item> Jeff </item> <item> Andre </item>  
                    </clash-phrase-ids> 
                    <transcriptions> 
                         <item> m ay b r ow k er </item>  
                         <item> m ax r aa k ah </item> 
                    </transcriptions> 
                    <confusable-phrases> 
                         <item> 
                              <phrase> call </phrase> 
                              <confusion-level> 10 </confusion-level> 
                         </item> 
                    </confusable-phrases> 
                  </enrollment-result> 
               </extensions> 
               </result> 
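
        A client might read such an enrollment result as sketched below.  
        This is an illustration under assumptions, not a normative parser: 
        it uses Python's standard xml.etree.ElementTree module, the function 
        name parse_enrollment_result is hypothetical, and the sample body is 
        an abridged copy of the example above. 

```python
import xml.etree.ElementTree as ET

# Abridged copy of the NLSML body from the example above.
SAMPLE = """<?xml version="1.0"?>
<result grammar="Personal-Grammar-URI">
<extensions>
  <result-type type="ENROLLMENT" />
  <enrollment-result>
    <num-clashes> 2 </num-clashes>
    <num-good-repetitions> 1 </num-good-repetitions>
    <num-repetitions-still-needed> 1 </num-repetitions-still-needed>
    <consistency-status> consistent </consistency-status>
    <clash-phrase-ids>
      <item> Jeff </item> <item> Andre </item>
    </clash-phrase-ids>
    <confusable-phrases>
      <item>
        <phrase> call </phrase>
        <confusion-level> 10 </confusion-level>
      </item>
    </confusable-phrases>
  </enrollment-result>
</extensions>
</result>"""


def parse_enrollment_result(nlsml):
    """Return the enrollment fields as a dict, or None if not an enrollment."""
    root = ET.fromstring(nlsml)
    result_type = root.find("extensions/result-type")
    if result_type is None or result_type.get("type") != "ENROLLMENT":
        return None
    er = root.find("extensions/enrollment-result")

    def text(tag):
        return er.findtext(tag).strip()

    return {
        "num_clashes": int(text("num-clashes")),
        "num_good_repetitions": int(text("num-good-repetitions")),
        "num_repetitions_still_needed": int(text("num-repetitions-still-needed")),
        "consistency_status": text("consistency-status"),
        "clash_phrase_ids": [
            item.text.strip() for item in er.findall("clash-phrase-ids/item")
        ],
        "confusable_phrases": [
            (item.findtext("phrase").strip(), int(item.findtext("confusion-level")))
            for item in er.findall("confusable-phrases/item")
        ],
    }


enrollment = parse_enrollment_result(SAMPLE)
```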
                
         
         
     6.   Speaker Verification and Identification 
         
        This document captures the extensions required to implement Voice 
        Enrollment, Speaker Verification / Identification and Hotword 
        recognition using MRCP.  This section describes the methods, 
        responses and events needed for doing Speaker Verification / 
        Identification. 
         
     6.1. Speaker Verification/Identification Resource 
         
        Speaker verification is a voice authentication feature that can be 
        used to identify the speaker in order to grant the user access to 
        sensitive information and transactions.  To do this, a recorded 
        utterance is compared to a voiceprint previously stored for that 
        user.  Verification consists of two phases: a designation phase to 
        establish the claimed identity of the caller and an execution phase 
        in which a voiceprint is either created (training) or used to 
        authenticate the claimed identity (verification). 
         
        Speaker identification identifies the speaker from a set of valid 
        users, such as family members.  Identification can be performed on a 
        small set of users or for a large population.  This feature is 
        useful for applications where multiple users share the same account 
        number, but where the individual speaker must be uniquely identified 
        from the group.  Speaker identification is also done in two phases, 
        a designation phase and an execution phase. 
         
        A speaker verification resource can share a session with an 
        existing recognizer resource, or a speaker verification session can 
        be set up to operate in standalone mode, without a recognizer 
        resource sharing the same session.  To share a session, the SETUP 
        message for the verification resource should include the RTSP 
        session identifier of the recognizer resource whose session it 
        wishes to share.  If no session identifier is specified, an 
        independent verification resource, running on the same physical 
        server or a separate one, will be set up. 
         
        Some of the speaker verification methods, described below, apply 
        only to a specific mode of operation. 
         
        The verification resource supports some buffering methods that allow 
        the user to buffer the verification data from one or more utterances 
        and then process this set of utterances as a single entity.  This is 
        different from collecting waveforms and processing them using the 
        verification methods that operate directly on the incoming audio 
        stream because the buffering mechanism does not simply accumulate 
        utterance data to a buffer.  In particular, when both the 
        recognition and verification resources share the same session, 
        additional information gathered by the recognition resource is saved 
        with these buffers to improve verification performance. 
         



      
     6.2. SETUP Verification/Identification Resource 
      
        The SETUP method from the client to the server is used to open a 
        verification/identification resource on a media server. If the 
        Session header field is specified in the SETUP method, the 
        verification/identification resource shares that session with the 
        other resources already in it. Otherwise, a new session is created 
        for the verification/identification resource. The resource name is 
        'verification-resource'. 
         
        Example: 
        This example assumes the verification resource would share a session 
        that is already created. 
         
        C->S:  SETUP rtsp://media.server.com/media/verification-resource 
               RTSP/1.0  
               CSeq: 1  
               Transport: RTP/AVP;unicast;client_port=46456-46457 
               Session: 0a030258_00003815_3bc4873a_0001_0000  
                
        S->C:  RTSP/1.0 200 OK  
               CSeq: 1  
               Transport: RTP/AVP;unicast;client_port=46456-46457;  
                          server_port=46460-46461  
               Session: 0a030258_00003815_3bc4873a_0001_0000  
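
        The choice between a shared and a standalone session can be 
        sketched as follows.  The function name build_setup and its 
        parameters are illustrative assumptions; the URL, ports and session 
        identifier are the values from the example above. 

```python
def build_setup(url, cseq, client_ports, session_id=None):
    """Build an RTSP SETUP request for the verification resource.

    With session_id, the request asks to join that existing session;
    without it, the server is expected to create a new session.
    """
    first_port, second_port = client_ports
    lines = [
        f"SETUP {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Transport: RTP/AVP;unicast;client_port={first_port}-{second_port}",
    ]
    if session_id is not None:
        lines.append(f"Session: {session_id}")
    return "\r\n".join(lines) + "\r\n\r\n"


URL = "rtsp://media.server.com/media/verification-resource"

# Shares the session already opened for the recognizer resource.
shared = build_setup(URL, 1, (46456, 46457),
                     session_id="0a030258_00003815_3bc4873a_0001_0000")

# Standalone mode: no Session header is sent.
standalone = build_setup(URL, 1, (46456, 46457))
```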
      
     6.3. Speaker Verification State Machine  
         
        Speaker Verification has the concept of training, verification and 
        buffering sessions.  Starting one of these sessions does not change 
        the state of the verification resource, i.e. it remains IDLE.  Once 
        a verification or training session is started, utterances are 
        trained or verified by calling the VERIFY or VER-FROM-BUFFER method.  
        The state of the Speaker Verification resource goes from IDLE to 
        VERIFYING each time VERIFY or VER-FROM-BUFFER is called. 
         
     6.4. Speaker Verification Methods 
         
        Speaker Verification supports the following methods. 
          verification-method  = "VER-START-SESSION" 
                              | "VER-END-SESSION" 
                              | "VER-SET-VOICEPRINT" 
                              | "VER-DELETE-VOICEPRINT" 
                              | "VERIFY" 
                              | "VER-FROM-BUFFER" 
                              | "VER-ROLLBACK" 
                              | "VER-STOP" 
                              | "VER-START-TIMERS" 
                              | "SET-PARAMS" 
                              | "GET-PARAMS" 
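
        The state behaviour of Section 6.3, combined with the method names 
        above, can be sketched as a small state tracker.  The class and 
        function names are hypothetical.  The return to IDLE when a 
        VERIFICATION-COMPLETE event is generated is an assumption of this 
        sketch; the text above only states the IDLE to VERIFYING 
        transition. 

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    VERIFYING = "VERIFYING"


METHODS = {
    "VER-START-SESSION", "VER-END-SESSION", "VER-SET-VOICEPRINT",
    "VER-DELETE-VOICEPRINT", "VERIFY", "VER-FROM-BUFFER", "VER-ROLLBACK",
    "VER-STOP", "VER-START-TIMERS", "SET-PARAMS", "GET-PARAMS",
}

# Only these two methods move the resource out of IDLE (Section 6.3).
ACTIVE_METHODS = {"VERIFY", "VER-FROM-BUFFER"}


class VerificationResource:
    def __init__(self):
        self.state = State.IDLE

    def on_method(self, method):
        """Record a method call and return the resulting state."""
        if method not in METHODS:
            raise ValueError(f"unknown verification method: {method}")
        if method in ACTIVE_METHODS:
            self.state = State.VERIFYING
        return self.state

    def on_verification_complete(self):
        # Assumption: the resource is idle again once the event is sent.
        self.state = State.IDLE
        return self.state


resource = VerificationResource()
after_session_start = resource.on_method("VER-START-SESSION")
after_verify = resource.on_method("VERIFY")
after_event = resource.on_verification_complete()
```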
           
     6.5. Verification Events 
         
        Speaker Verification may generate the following events. 
      
          verification-event   =  "VERIFICATION-COMPLETE" 
                              |   "START-OF-SPEECH" 
         
     6.6. Verification Header Fields 
         
        A Speaker Verification message may contain header fields that carry 
        request options and information to augment the request, response or 
        event message with which they are associated.  
         
        The verification result elements will be returned in a 
        VERIFICATION-COMPLETE event containing an NLSML document [4], with 
        the MIME type application/x-nlsml. 
         
        verification-header  = 
                                voiceprint-uri              ; Section 6.6.1 
                              | voiceprint-identifier       ; Section 6.6.2 
                              | voiceprint-group            ; Section 6.6.3 
                              | verification-mode           ; Section 6.6.4 
                              | adapt-model                 ; Section 6.6.5 
                              | abort-model                 ; Section 6.6.6 
                              | security-level              ; Section 6.6.7 
                              | num-min-verification-phrases; Section 6.6.8 
                              | num-max-verification-phrases; Section 6.6.9 
                              | no-input-timeout            ; Section 6.6.10 
                              | save-waveform               ; Section 6.6.11 
                              | waveform-url                ; Section 6.6.12 
                              | vendor-specific             ; Section 6.6.13 
                              | voiceprint-exists           ; Section 6.6.14 
                              | ver-buffer-utterance        ; Section 6.6.15 
                              | input-waveform-url          ; Section 6.6.16 
                              | verification-type           ; Section 6.6.17 
                              | digit-sequence              ; Section 6.6.18 
                              | completion-cause            ; Section 6.6.19 
                                     
                               
                               
        Parameter              Support   Methods/Events  
        voiceprint-uri         MANDATORY VER-SET-VOICEPRINT,  
                                         VER-DELETE-VOICEPRINT 
        voiceprint-identifier  MANDATORY VER-SET-VOICEPRINT,  
                                         VER-DELETE-VOICEPRINT 
        voiceprint-group       OPTIONAL  VER-SET-VOICEPRINT,  
                                         VER-DELETE-VOICEPRINT 
        verification-mode      MANDATORY SET-PARAMS, GET-PARAMS, 
                                         VERIFY, VER-FROM-BUFFER 
        adapt-model            OPTIONAL  VER-START-SESSION 
        abort-model            OPTIONAL  VER-END-SESSION 
        security-level         OPTIONAL  SET-PARAMS, GET-PARAMS, 
                                         VERIFY, VER-FROM-BUFFER 
        num-min-verification   OPTIONAL  SET-PARAMS, GET-PARAMS, 
         -phrases                        VERIFY, VER-FROM-BUFFER 
        num-max-verification   OPTIONAL  SET-PARAMS, GET-PARAMS, 
         -phrases                        VERIFY, VER-FROM-BUFFER 
        no-input-timeout       MANDATORY SET-PARAMS, GET-PARAMS,  
                                         VERIFY 
        save-waveform          MANDATORY SET-PARAMS, GET-PARAMS,  
                                         VERIFY 
        waveform-url           MANDATORY VERIFICATION-COMPLETE 
        vendor-specific        MANDATORY SET-PARAMS, GET-PARAMS 
        voiceprint-exists      MANDATORY VER-SET-VOICEPRINT,  
                                         VER-DELETE-VOICEPRINT 
        ver-buffer-utterance   OPTIONAL  RECOGNIZE, VERIFY, RECORD 
        input-waveform-url     OPTIONAL  VERIFY 
        verification-type      OPTIONAL  VERIFY 
        digit-sequence         OPTIONAL  VERIFY 
        completion-cause       MANDATORY VERIFICATION-COMPLETE, 
                                         VER-SET-VOICEPRINT,  
                                         VER-DELETE-VOICEPRINT 
         
         
         
      
     6.6.1. Voiceprint-URI  
         
        This header field specifies the voiceprint repository to be used or 
        referenced during speaker verification or identification operations.  
        It is required in the VER-SET-VOICEPRINT and VER-DELETE-VOICEPRINT 
        methods. If this header field is set through the SET-PARAMS method, 
        it can be silently ignored. 
         
          voiceprint-uri = "Voiceprint-URI" ":" Url CRLF 
         
     6.6.2. Voiceprint-Identifier 
      
        This header field specifies the claimed identity for voice 
        verification applications.  The claimed identity may be used to 
        specify an existing voiceprint or to establish a new voiceprint. 
        This header field is required in the VER-SET-VOICEPRINT and 
        VER-DELETE-VOICEPRINT methods when preparing for verification 
        operations. The Voiceprint-Identifier is not required for 
        identification applications, except in the VER-DELETE-VOICEPRINT 
        method when the client needs to remove an identity from a 
        voiceprint group.  
         
          voiceprint-identifier = "Voiceprint-Identifier" ":" 1*ALPHA CRLF 
         
     6.6.3. Voiceprint-Group 
      
        This header field specifies the voiceprint group for speaker 
        identification operations.  The voiceprint group narrows the 
        potential voiceprint identification candidates to a subset of the 
        voiceprints in the repository. This header field may appear in the 
        VER-SET-VOICEPRINT and VER-DELETE-VOICEPRINT methods for speaker 
        identification operations. If this header field is absent, 
        verification, not identification, operations will be executed. 
         
          voiceprint-group = "Voiceprint-Group" ":" 1*ALPHA CRLF 
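
        As an illustration of Sections 6.6.1 to 6.6.3, the sketch below 
        assembles the voiceprint header fields for a VER-SET-VOICEPRINT 
        request and derives whether verification or identification will be 
        performed.  The function names and the example URL are hypothetical; 
        the 1*ALPHA check mirrors the grammar given above. 

```python
import re

_ALPHA = re.compile(r"[A-Za-z]+")  # 1*ALPHA, as in the grammar above


def voiceprint_headers(uri, identifier=None, group=None):
    """Return the voiceprint header fields for VER-SET-VOICEPRINT."""
    headers = {"Voiceprint-URI": uri}
    for name, value in (("Voiceprint-Identifier", identifier),
                        ("Voiceprint-Group", group)):
        if value is None:
            continue
        if not _ALPHA.fullmatch(value):
            raise ValueError(f"{name} must match 1*ALPHA: {value!r}")
        headers[name] = value
    return headers


def operation_kind(headers):
    """Without Voiceprint-Group, verification is performed (Section 6.6.3)."""
    return "identification" if "Voiceprint-Group" in headers else "verification"


# Hypothetical repository URL and identities, for illustration only.
verify_headers = voiceprint_headers("http://example.com/voiceprints",
                                    identifier="johnsmith")
identify_headers = voiceprint_headers("http://example.com/voiceprints",
                                      group="family")
```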
      
         
     6.6.4. Verification-Mode 
         
        This header field specifies the mode of the verification resource in 
        a VERIFY or VER-FROM-BUFFER method execution. Acceptable values 
        indicate whether the verification session should train a voiceprint 
        ("train") or verify/identify using an existing voiceprint 
        ("verify").  
         
        Setting this header field to "train" or "verify" requires that the 
        voiceprint or voiceprint group identifier attributes have already 
        been set through the VER-SET-VOICEPRINT method.  
         
        Training and verification sessions both require the voiceprint URI 
        to be specified at the start of the session.  In many usage 
        scenarios, however, the system cannot know the speaker’s claimed 
        identity until the speaker says, for example, their account number.  
        In order to allow the first few utterances of a dialog to be both 
        recognized and verified, the verification resource on the MRCP 
        server retains an audio buffer. In this audio buffer, the MRCP 
        server will accumulate recognized utterances in memory.  The 
        application can later execute a verification method and apply the 
        buffered utterances to the current verification session. The 
        buffering methods are used for this purpose. When buffering is used, 
        subsequent input utterances are added to the audio buffer for later 
        analysis. 
         
        Some voice user interfaces may require additional user input that 
        should not be analyzed for verification. For example, the user’s 
        input may have been recognized with low confidence and thus require 
        a confirmation cycle. In such cases, the client should not execute 
        the VERIFY or VER-FROM-BUFFER methods to collect and analyze the 
        caller’s input. A separate recognizer resource can analyze the 
        caller’s response without any participation by the verification 
        resource.  
         
        Once the following conditions have been met:  
        1. Voiceprint identity has been successfully established through the 
           voiceprint identifier header fields of the VER-SET-VOICEPRINT 
           method, and 
        2. the verification mode has been set to one of "train" or "verify", 
        the verification resource may begin providing verification 
        information during verification operations. The verification 
        resource MUST reach one of the two major states ("train" or 
        "verify") if the above two conditions hold, or it MUST report an 
        error condition in the MRCP status code to indicate why the 
        verification resource is not ready for action. 
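
        The two conditions above can be sketched as a readiness check.  The 
        names NotReady and resolve_major_state are hypothetical, and the 
        exception merely stands in for the error that the server would 
        report through the MRCP status code; no specific status code is 
        implied. 

```python
class NotReady(Exception):
    """Stands in for the error reported in the MRCP status code."""


def resolve_major_state(voiceprint_established, verification_mode):
    """Return the major state ("train" or "verify") or raise NotReady."""
    if not voiceprint_established:
        raise NotReady(
            "voiceprint identity not established through VER-SET-VOICEPRINT")
    if verification_mode not in ("train", "verify"):
        raise NotReady('Verification-Mode must be "train" or "verify"')
    return verification_mode


state = resolve_major_state(True, "train")
```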
         
        The value of verification-mode is persistent within a verification 
        session. Changing the mode to a value different from the previous 
        setting causes the verification resource to report an error if the 
        previous setting was either "train" or "verify". If the mode is 
        changed back to its previous value, the operation may continue.  
         
          verification-mode = "Verification-Mode" ":"  
                              verification-mode-string CRLF 
          verification-mode-string = "train" 
                                   | "verify" 
      
         
     6.6.5. Adapt-Model 
         
        This header field indicates the desired behavior of the verification 
        resource after a successful verification execution. If the value of 
        this parameter is "true", the audio collected during the 
        verification session is used to update the voiceprint to account for 
        ongoing changes in a speaker’s incoming speech characteristics. If 
        the value is "false" (the default), the voiceprint is not updated 
        with the latest audio. This header field MAY only occur in the 
        VER-START-SESSION method.  
      
          adapt-model = "Adapt-Model" ":" Boolean-value CRLF 
         
         
     6.6.6. Abort-Model 
         
        The Abort-Model header field indicates the desired behavior of the 
        verification resource upon session termination. If the value of this 
        parameter is "true", the pending changes to a voiceprint due to 
        verification training or verification adaptation are discarded. If 
        the value is "false" (the default), the pending changes for a 
        training session or a successful verification session are committed 
        to the voiceprint repository. A value of "true" for Abort-Model 
        overrides a value of "true" for the Adapt-Model header field. This 
        header field MAY only occur in the VER-END-SESSION method.  
      
          abort-model = "Abort-Model" ":" Boolean-value CRLF 
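
        The interaction between Adapt-Model and Abort-Model can be sketched 
        as the decision below.  The function name is hypothetical, and the 
        sketch assumes that a "successful verification session" is one that 
        ended with a decision of "accepted"; the text above does not define 
        success more precisely. 

```python
def commit_voiceprint_changes(mode, decision=None,
                              adapt_model=False, abort_model=False):
    """Return True if pending voiceprint changes are committed at session end."""
    if abort_model:
        # Abort-Model "true" discards pending changes and overrides Adapt-Model.
        return False
    if mode == "train":
        # Pending training changes are committed by default.
        return True
    # Verification: changes exist only when adaptation was requested.
    # Assumption: "successful" means the decision was "accepted".
    return adapt_model and decision == "accepted"
```

        For example, a training session that ends with Abort-Model set to 
        "true" leaves the voiceprint repository unchanged. 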
         
         
         
     6.6.7. Security-Level 
      
        The Security-Level header field determines the range of verification 
        scores in which a decision of 'accepted' may be declared. This 
        header field MAY occur in the SET-PARAMS, GET-PARAMS, VERIFY and 
        VER-FROM-BUFFER methods. Its value can be "high" (highest security 
        level), "medium-high", "medium" (normal security level), 
        "medium-low", or "low" (low security level). The default value is 
        platform specific. 
         
          security-level = "Security-Level" ":" security-level-string CRLF 
          security-level-string = "high" | 
                "medium-high" | 
                "medium" |  
                "medium-low" | 
                "low" 




      
      
      
     6.6.8. Num-Min-Verification-Phrases 
      
        The Num-Min-Verification-Phrases header field is used to specify the 
        minimum number of valid utterances required before a positive 
        decision is given for verification. The value of this header field 
        is an integer; the minimum value is 1 and the default value is 1. 
        The verification resource should not announce a decision of 
        'accepted' unless at least Num-Min-Verification-Phrases utterances 
        are available. 
         
          num-min-verification-phrases = "Num-Min-Verification-Phrases" ":"  
                                          1*DIGIT CRLF 
         
         
     6.6.9. Num-Max-Verification-Phrases 
      
        The Num-Max-Verification-Phrases header field is used to specify the 
        number of valid utterances required before a decision is forced for 
        verification. The verification resource MUST NOT return a decision 
        of 'undecided' once Num-Max-Verification-Phrases utterances have 
        been collected and used to determine a verification score. The 
        value of this header field is an integer and the minimum value is 1.  
         
          num-max-verification-phrases = "Num-Max-Verification-Phrases" ":"  
                                          1*DIGIT CRLF 
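
        The constraints that Sections 6.6.8 and 6.6.9 place on the reported 
        decision can be sketched as a simple check.  The function name and 
        parameters are hypothetical; the sketch only tests whether a 
        decision is permitted and does not say which decision a server 
        should choose instead. 

```python
def decision_permitted(decision, valid_utterances, num_min=1, num_max=None):
    """Check a decision against the min/max verification phrase limits."""
    if decision == "accepted" and valid_utterances < num_min:
        # Too few valid utterances for a positive decision.
        return False
    if (decision == "undecided" and num_max is not None
            and valid_utterances >= num_max):
        # A decision is forced once the maximum has been collected.
        return False
    return True
```

        With Num-Min-Verification-Phrases set to 2 and 
        Num-Max-Verification-Phrases set to 3, a single utterance cannot 
        yield "accepted", and a third utterance cannot yield "undecided". 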
      
         
     6.6.10. No-Input-Timeout 
      
        The No-Input-Timeout header field sets the length of time from the 
        start of the verification timers (see VER-START-TIMERS) until the 
        declaration of a no-input event in the VERIFICATION-COMPLETE server 
        event message. The value is in milliseconds. This header field MAY 
        occur in VERIFY, SET-PARAMS or GET-PARAMS. The value for this field 
        ranges from 0 to MAXTIMEOUT, where MAXTIMEOUT is platform specific. 
        The default value for this field is platform specific.  
              
          no-input-timeout = "No-Input-Timeout" ":" 1*DIGIT CRLF 
      
      
     6.6.11. Save-Waveform 
      
        This header field allows the client to indicate to the verification 
        resource that it MUST save the audio stream that was used for 
        verification/identification. The verification resource MUST then 
        record the audio and make it available to the client in the form of 
        a URL returned in the Waveform-URL header field in the  
        VERIFICATION-COMPLETE event. If there was an error in recording the 
        stream or the audio clip is otherwise not available, the 
        verification resource MUST return an empty Waveform-URL header 
        field. The default value for this field is "false". This header 
        field MAY appear in the VERIFY method, but NOT in the 
        VER-FROM-BUFFER method, since it controls whether or not to save 
        the waveform for live verification / identification operations 
        only. 
          
          save-waveform = "Save-Waveform" ":" Boolean-value CRLF  
      
      
     6.6.12. Waveform-URL 
      
        If the save-waveform header field is set to true, the verification 
        resource MUST record the incoming audio stream of the verification 
        into a file and provide a URL for the client to access it. This 
        header MUST be present in the VERIFICATION-COMPLETE event if the 
        save-waveform header field is set to true. The URL value of the 
        header MUST be NULL if there was some error condition preventing the 
        server from recording. Otherwise, the URL generated by the server 
        SHOULD be globally unique across the server and all its verification 
        sessions. The URL SHOULD be available until the session is torn 
        down. Since the save-waveform header field applies only to live 
        verification / identification operations, the waveform-url will only 
        be returned in the VERIFICATION-COMPLETE event for live verification 
        / identification operations. 
              
           waveform-url = "Waveform-URL" ":" Url CRLF 
      
      
     6.6.13. Vendor-Specific 
      
        This set of headers allows the client to set Vendor Specific 
        parameters. 
         
           vendor-specific = "Vendor-Specific-Parameters" ":"  
                             vendor-specific-av-pair   
                             *[";" vendor-specific-av-pair] CRLF   
           vendor-specific-av-pair = vendor-av-pair-name "="   
                                     vendor-av-pair-value  
         
        This header can be sent in the SET-PARAMS method and is used to set  
        vendor-specific parameters on the server. The vendor-av-pair-name    
        can be any vendor-specific field name and conforms to the XML  
        vendor-specific attribute naming convention. The vendor-av-pair-
        value is the value to set the attribute to, and needs to be quoted.  
         
        When asking the server to get the current value of these parameters,  
        this header can be sent in the GET-PARAMS method with the list of  
        vendor-specific attribute names to get separated by a semicolon.     
        This header field MAY occur in SET-PARAMS or GET-PARAMS. 
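
        A value of this header field can be parsed and produced as sketched 
        below.  The function names and the parameter names in the example 
        (com.example.*) are hypothetical.  The sketch assumes that every 
        value is enclosed in double quotes, as required above, and that a 
        quoted value contains no double quote itself. 

```python
import re

# One av-pair: name "=" quoted-value, followed by ";" or end of input.
_AV_PAIR = re.compile(r'\s*([^=;\s]+)\s*=\s*"([^"]*)"\s*(?:;|$)')


def parse_vendor_specific(value):
    """Parse 'name="value";name="value"' into a dict."""
    pairs = {}
    pos = 0
    while pos < len(value):
        match = _AV_PAIR.match(value, pos)
        if match is None:
            raise ValueError(f"malformed av-pair at offset {pos}")
        pairs[match.group(1)] = match.group(2)
        pos = match.end()
    return pairs


def format_vendor_specific(pairs):
    """Format a dict as a Vendor-Specific-Parameters header value."""
    return ";".join(f'{name}="{value}"' for name, value in pairs.items())


# Hypothetical vendor parameter names, for illustration only.
params = parse_vendor_specific('com.example.level="3";com.example.tags="a;b"')
header_value = format_vendor_specific(params)
```

        Because values are quoted, a semicolon inside a value does not end 
        the av-pair, which is why the sketch matches quoted strings rather 
        than splitting on ";". 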
         
         
     6.6.14. Voiceprint-Exists 
         
        This header field is returned in a VER-SET-VOICEPRINT or 
        VER-DELETE-VOICEPRINT response.  For VER-SET-VOICEPRINT, it gives 
        the status of the voiceprint specified in the method. For the 
        VER-DELETE-VOICEPRINT method, it indicates the status of the 
        voiceprint when the method execution started. 
         
          voiceprint-exists    = "Voiceprint-Exists" ":" Boolean-value CRLF 
         
         
     6.6.15. Ver-Buffer-Utterance  
          
        This header field is used to indicate that this utterance should be 
        considered for Speaker Verification.  This way, an application can 
        buffer utterances while doing regular recognition or verification 
        activities and speaker verification can later be requested on the 
        buffered utterances.  This header field is OPTIONAL in the 
        RECOGNIZE, VERIFY or RECORD method.  
          
          ver-buffer-utterance = "Ver-Buffer-Utterance" ":" Boolean-value CRLF  
         
         
     6.6.16. Input-Waveform-Url 
         
        This optional header field specifies an audio file that is to be 
        processed according to the current verification mode, either to 
        train the voiceprint or to verify the user.  This enables the 
        client to implement the buffering use case even when the recognizer 
        and verification resources reside in two separate sessions.  It MAY 
        be part of the VERIFY method. 
         
          input-waveform-url    = "Input-Waveform-URL" ":" Url CRLF 
         
         
     6.6.17. Verification-Type 
         
        This optional header field specifies whether this is 
        text-independent, text-dependent or digit-string-based verification.  
        It MAY be part of the VERIFY method.  The default for this field is 
        "text-independent". 
         
          verification-type = "Verification-Type" ":"  
                              verification-type-string CRLF 
          verification-type-string = "text-independent" 
                                   | "text-dependent" 
                                   | "digits" 
         
     6.6.18. Digit-Sequence 
         
        This optional header field specifies the digit sequence to use for 
        verification if the verification type is "digits".  It MAY be part 
        of the VERIFY method. 
         
          digit-sequence = "Digit-Sequence" ":" 1*DIGIT CRLF 
         
     6.6.19. Completion-Cause 
         
        This header field MUST be part of a VERIFICATION-COMPLETE event   
        coming from the verification resource to the client. It indicates 
        the reason behind the VERIFY or VER-FROM-BUFFER method completion. 
        This header field MUST also be sent in the VERIFY, VER-FROM-BUFFER 
        and VER-SET-VOICEPRINT responses if they return with a failure 
        status and a COMPLETE state. 
              
          completion-cause = "Completion-Cause" ":" 1*DIGIT SP  
                             1*ALPHA CRLF  
              
          Cause-Code  Cause-Name         Description  
            000       success            VERIFY or VER-FROM-BUFFER request 
                                         completed successfully. The verify 
                                         decision can be "accepted", 
                                         "rejected", or "undecided". 
            001       error              VERIFY or VER-FROM-BUFFER request 
                                         terminated prematurely due to a  
                                         verification resource or system 
                                         error.  
            002       no-input-timeout   VERIFY request completed with no 
                                         result due to a no-input-timeout. 
            003       too-much-speech-timeout 
                                         VERIFY request completed with no 
                                         result due to too much speech. 
            004       speech-too-early   VERIFY request completed with no   
                                         result because the caller spoke 
                                         too soon. 
            005       buffer-empty       VER-FROM-BUFFER request completed  
                                         with no result due to empty buffer. 
            006       out-of-sequence    Verification operation failed due  
                                         to out-of-sequence method 
                                         invocations, for example calling 
                                         VERIFY before VER-SET-VOICEPRINT. 
            007       voiceprint-uri-failure 
                                         Failure accessing voiceprint URI. 
            008       voiceprint-uri-missing 
                                         Voiceprint-URI is not specified. 
            009       voiceprint-id-missing 
                                         Voiceprint-Identifier is not  
                                         specified. 
            010       voiceprint-id-not-exist 
                                         Voiceprint-Identifier doesn’t  
                                         exist in the voiceprint repository. 
            011       voiceprint-group-not-exist 
                                         Voiceprint-group doesn’t exist. 
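
        A client might split this header field into its numeric code and 
        cause name as sketched below.  The function name is hypothetical.  
        Note that the grammar above gives the cause name as 1*ALPHA while 
        the cause names in the table contain hyphens; this sketch assumes 
        that hyphens are acceptable in the name. 

```python
import re

# Assumption: cause names may contain hyphens, as in the table above.
_COMPLETION_CAUSE = re.compile(
    r"^Completion-Cause:\s*(\d+) ([A-Za-z-]+)\s*$", re.IGNORECASE)


def parse_completion_cause(header_line):
    """Return (cause_code, cause_name) from a Completion-Cause header line."""
    match = _COMPLETION_CAUSE.match(header_line)
    if match is None:
        raise ValueError(f"not a Completion-Cause header: {header_line!r}")
    return int(match.group(1)), match.group(2)


success = parse_completion_cause("Completion-Cause: 000 success")
timeout = parse_completion_cause("Completion-Cause: 002 no-input-timeout")
```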
         
     6.7. Verification Result Elements 
         
        Verification results can contain the following elements: 
         
        verification-result-elements = 
                                decision                    ; Section 6.7.1 
                              | num-frames                  ; Section 6.7.2 
                              | device                      ; Section 6.7.3 
                              | gender                      ; Section 6.7.4 
                              | matched                     ; Section 6.7.5 
                              | adapted                     ; Section 6.7.6 
                              | verification-score          ; Section 6.7.7 
                              | group-name                  ; Section 6.7.8 
                              | member                      ; Section 6.7.9 
                              | score                       ; Section 6.7.10 
                              | vendor-specific-results     ; Section 6.7.11 
         
         
     6.7.1. Decision 
         
        This is not a header field, but part of the verification results. It 
        is returned in a VERIFICATION-COMPLETE event.  Its value indicates 
        the decision as determined by verification.  It can have the values 
        of accepted, rejected or undecided. 
         
          decision-string = "accepted" | "rejected" | "undecided" 
          decision        = "<decision>" decision-string "</decision>" CRLF 
         
     6.7.2. Num-Frames 
         
        This is not a header field, but part of the verification results. It 
        is returned in a VERIFICATION-COMPLETE event.  Its value indicates 
        the number of 10 millisecond speech frames in the last utterance or 
        in the cumulated set of utterances. 
         
          num-frames          = "<num-frames>" 1*DIGIT "</num-frames>" CRLF 
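         
        As a worked illustration of the frame arithmetic, the following
        sketch converts a num-frames value into a duration, assuming the
        fixed 10 millisecond frame length stated above. The function name
        frames_to_milliseconds is hypothetical and is not part of MRCP.

```python
FRAME_MS = 10  # frame length given in the text above


def frames_to_milliseconds(num_frames: int) -> int:
    """Return the speech duration in milliseconds for a num-frames value."""
    if num_frames < 0:
        raise ValueError("num-frames cannot be negative")
    return num_frames * FRAME_MS


print(frames_to_milliseconds(50))  # 50 frames * 10 ms = 500 ms
```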
         
     6.7.3. Device 
         
        This is not a header field, but part of the verification results. It 
        is returned in a VERIFICATION-COMPLETE event.  Its value indicates
        the apparent type of device used by the caller as determined by 
        verification.  It can have the values of cellular-phone, electret-
        phone, carbon-button-phone and unknown. 
         
          device-string = "cellular-phone" | "electret-phone"  
                          | "carbon-button-phone" | "unknown" 
          device        = "<device>" device-string "</device>" CRLF 
         
     6.7.4. Gender 
         
        This is not a header field, but part of the verification results. It 
        is returned in a VERIFICATION-COMPLETE event.  Its value indicates 
        the apparent gender of the speaker as determined by verification. It 
        can have the values of male, female or unknown. 
         
          gender-string = "male" | "female" | "unknown"  
          gender        = "<gender>" gender-string "</gender>" CRLF 
         
     6.7.5. Matched 
         
        This is not a header field, but part of the verification results. It 
        is returned in a VERIFICATION-COMPLETE event.  When verification is 
        trying to confirm the voiceprint, this indicates if the last
        utterance and the voiceprints are of the same gender and used the 
        same type of device.  It is not returned during verification 
        training. The value can be TRUE or FALSE. 
         
          matched              = "<matched>" Boolean-value "</matched>" CRLF 
         
     6.7.6. Adapted 
         
        This is not a header field, but part of the verification results. It 
        is returned in a VERIFICATION-COMPLETE event.  When verification is 
        trying to confirm the voiceprint, this indicates if the voiceprint 
        has been adapted as a consequence of analyzing the source 
        utterances.  It is not returned during verification training. The 
        value can be TRUE or FALSE. 
         
          adapted              = "<adapted>" Boolean-value "</adapted>" CRLF 
         
     6.7.7. Verification-Score 
         
        This is not a header field, but part of the verification results. It 
        is returned in a VERIFICATION-COMPLETE event.  Its value indicates 
        the score of the last utterance as determined by verification.   
         
        During verification, the higher the score the more likely it is that 
        the speaker is the same one as the one who spoke the voiceprint 
        utterances.  During training, the higher the score the more likely 
        the speaker is to have spoken all of the analyzed utterances.  If 
        there are no such utterances the score is -100.   
         
          verification-score   = "<verification-score>" FLOAT 
                                 "</verification-score>" CRLF 
         
     6.7.8. Group-Name 
         
        This is not a header field, but part of the verification results. It 
        is returned in a VERIFICATION-COMPLETE event.  Its value indicates 
        the name of the group used in speaker identification. 
         
          group-name           = "<group-name>" 1*ALPHA "</group-name>" CRLF 
         
     6.7.9. Member 
         
        This is not a header field, but part of the verification results. It 
        is returned in a VERIFICATION-COMPLETE event.  Its value indicates 
        the member in a group identified by its URI.  There is one URI for 
        each member in the group. 
         
          member              = "<member>" 1*ALPHA "</member>" CRLF 
         
     6.7.10. Score 
         
        This is not a header field, but part of the verification results. It 
        is returned in a VERIFICATION-COMPLETE event.  This is the score
        associated with the identified member of the group, as returned in 
        the member result. 
         
          score               = "<score>" FLOAT "</score>" CRLF
      
     6.7.11. Vendor-Specific-Results 
      
        This section describes how vendor-specific results are expressed
        using XML syntax. Vendor-specific elements and attributes MUST
        belong to the vendor's own namespace.  In the result structure,
        they must either be prefixed by a namespace prefix declared within
        the result or be children of an element identified as belonging to
        the vendor's namespace.  For details on how to use XML Namespaces,
        see [6].  Section 2 of [6] provides details on how to declare
        namespaces and namespace prefixes. Here is an example:
         
               <?xml version="1.0"?> 
               <result grammar="What-Grammar-URI" 
                       xmlns:xmpl="http://www.example.org/2003/12/mrcp1ext"> 
               <extensions> 
                  <result-type type="VERIFICATION" /> 
                  <verification-result> 
                    <incremental> 
                         <num-frames> 50 </num-frames> 
                         <device> cellular-phone </device> 
                         <gender> female </gender> 
                         <decision> rejected </decision> 
                         <verification-score> -50 </verification-score> 
                         <xmpl:raspiness> high </xmpl:raspiness> 
                         <xmpl:emotion> sadness </xmpl:emotion> 
                    </incremental> 
                    <cumulative> 
                         <num-frames> 50 </num-frames> 
                         <device> cellular-phone </device> 
                         <gender> female </gender> 
                         <decision> rejected </decision> 
                         <verification-score> -50 </verification-score> 
                    </cumulative> 
                  </verification-result> 
               </extensions> 
               </result> 
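         
        A client consuming such a result has to tell the standard elements
        apart from the vendor-specific ones. The following is a minimal
        sketch of one way to do that with Python's standard XML parser,
        using a shortened copy of the example above. The helper name
        split_elements is hypothetical, and stripping the whitespace
        around element values is an assumption based on how the examples
        in this document are written.

```python
import xml.etree.ElementTree as ET

# Vendor namespace URI taken from the example above.
XMPL_NS = "http://www.example.org/2003/12/mrcp1ext"

RESULT = """<?xml version="1.0"?>
<result grammar="What-Grammar-URI"
        xmlns:xmpl="http://www.example.org/2003/12/mrcp1ext">
<extensions>
  <result-type type="VERIFICATION" />
  <verification-result>
    <incremental>
      <num-frames> 50 </num-frames>
      <decision> rejected </decision>
      <verification-score> -50 </verification-score>
      <xmpl:raspiness> high </xmpl:raspiness>
    </incremental>
  </verification-result>
</extensions>
</result>"""


def split_elements(parent):
    """Separate un-namespaced child elements from namespaced (vendor) ones."""
    standard, vendor = {}, {}
    for child in parent:
        text = (child.text or "").strip()
        if child.tag.startswith("{"):
            # ElementTree spells namespaced tags as "{namespace-uri}local-name".
            namespace, local = child.tag[1:].split("}", 1)
            vendor[(namespace, local)] = text
        else:
            standard[child.tag] = text
    return standard, vendor


root = ET.fromstring(RESULT)
incremental = root.find("./extensions/verification-result/incremental")
standard, vendor = split_elements(incremental)
print(standard["decision"])
print(vendor[(XMPL_NS, "raspiness")])
```

        Keying vendor values by (namespace URI, local name) rather than by
        prefix keeps the lookup independent of whichever prefix the server
        happens to declare.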
         
      
      
     6.8. Verification Session Methods 
      
        These methods allow the client to control the mode and target of 
        verification or identification operations within the context of a 
        session. All the verification input cycles that occur within a 
        session may be used to create, update, or validate against the 
        voiceprint specified during the session. At the beginning of each 
        session the verification resource is reset to a known state.

        Verification/identification operations can be executed against live
        or buffered audio. The verification resource provides methods for
        collecting and evaluating live audio data, and methods for
        controlling the verification resource and adjusting its configured
        behavior.
         
        There are no specific methods for collecting buffered audio data.
        Instead, audio is buffered by calling RECOGNIZE or RECORD with the
        Ver-buffer-utterance header field.  When the VER-FROM-BUFFER method
        is subsequently called, verification is performed using the set of
        buffered audio.
         
          buffered-audio-method  =  "VER-FROM-BUFFER"
         
        The following methods provide controls for verification of live
        audio utterances:
         
          live-audio-method  =  "VERIFY" 
                             |  "VER-START-TIMERS" 
         
        The following methods provide controls for configuring the
        verification resource and for establishing resource states:
         
          live-or-buffered-audio-method  =  "VER-START-SESSION" 
                                         |  "VER-END-SESSION" 
                                         |  "VER-SET-VOICEPRINT" 
                                         |  "VER-DELETE-VOICEPRINT" 
                                         |  "VER-ROLLBACK" 
                                         |  "VER-STOP" 
                                         |  "SET-PARAMS" 
                                         |  "GET-PARAMS" 
         
         
     6.8.1. VER-START-SESSION 
         
        The VER-START-SESSION method starts a Speaker 
        Verification/Identification Session.  Execution of this method 
        forces the verification resource into a known initial state. If this 
        method is called during an ongoing verification session, the 
        previous session is implicitly aborted.  
         
        Upon completion of the VER-START-SESSION method, the verification 
        resource MUST terminate any ongoing verification sessions, and clear 
        any voiceprint designation.  
         
        The header field "Adapt-Model" may also be present in the start 
        session method to indicate whether or not to adapt a voiceprint with 
        data collected during the session (if the voiceprint verification 
        phase succeeds). By default the voiceprint model should NOT be 
        adapted with data from a verification session. 
      
        Before a verification/identification session is started, only the
        VER-ROLLBACK and generic SET-PARAMS and GET-PARAMS operations can
        be performed. The media server should return 402 (Method not valid
        in this state) for all other operations, such as VERIFY and
        VER-SET-VOICEPRINT.
      
        Only a single session can be active at any one time.
         
     Example: 
        C->S:  VER-START-SESSION 314161 MRCP/1.0   
               Adapt-Model: true 
         
        S->C:  MRCP/1.0 314161 200 COMPLETE 
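         
        The state rules above can be sketched as a small guard on the
        server side. This is an illustrative toy model only, not a real
        media server: the class name, its attributes, and the treatment of
        Adapt-Model as the strings "true" and "false" are assumptions.
        VER-END-SESSION is deliberately left out of the model.

```python
class VerificationResource:
    """Toy model of the session rules for VER-START-SESSION."""

    # Methods the text allows before any session has been started.
    ALLOWED_WITHOUT_SESSION = {"VER-ROLLBACK", "SET-PARAMS", "GET-PARAMS"}

    def __init__(self):
        self.session_active = False
        self.voiceprint = None
        self.adapt_model = False  # default: do NOT adapt the voiceprint

    def handle(self, method, headers=None):
        """Return the status code for a request in the current state."""
        headers = headers or {}
        if method == "VER-START-SESSION":
            # Any previous session is implicitly aborted and the
            # voiceprint designation is cleared.
            self.session_active = True
            self.voiceprint = None
            self.adapt_model = headers.get("Adapt-Model", "false").lower() == "true"
            return 200
        if not self.session_active and method not in self.ALLOWED_WITHOUT_SESSION:
            return 402  # Method not valid in this state
        if method == "VER-SET-VOICEPRINT":
            self.voiceprint = headers.get("Voiceprint-Identifier")
        return 200


resource = VerificationResource()
print(resource.handle("VERIFY"))             # rejected: no session yet
print(resource.handle("VER-START-SESSION"))  # starts the session
print(resource.handle("VERIFY"))             # now accepted
```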
           
     6.8.2. VER-END-SESSION 
         
        The VER-END-SESSION method terminates an ongoing verification 
        session and releases the verification voiceprint model in one of 
        three ways: 
        a. aborting - the voiceprint adaptation or creation may be aborted 
           so that the voiceprint remains unchanged (or is not created). 
        b. committing - when terminating a voiceprint training session, the 
           new voiceprint is committed to the repository. 
        c. adapting - an existing voiceprint is modified using a successful 
           verification. 
         
        The header field "Abort-Model" may be included in the VER-END-
        SESSION to control whether or not to abort any pending changes to 
        the voiceprint. The default behavior is to commit (not abort) any 
        pending changes to the designated voiceprint. 
         
        The VER-END-SESSION method may be safely executed multiple times 
        without first executing the VER-START-SESSION method. Any additional 
        executions of this method without an intervening use of the VER-
        START-SESSION method have no effect on the system. 
         
         
     Example: 
        This example assumes there is a training session or a verification
        session in progress.
         
        C->S:  VER-END-SESSION 314174 MRCP/1.0 
               Abort-Model: true 
           
        S->C:  MRCP/1.0 314174 200 COMPLETE 
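         
        The commit-by-default and repeat-safe behavior described above can
        be illustrated with the following sketch. It models the session
        and the repository as plain dictionaries; the function name
        end_session and the "pending_voiceprint" key are hypothetical and
        only stand in for whatever state a real server keeps.

```python
def end_session(session, repository, abort_model=False):
    """Apply VER-END-SESSION to a toy session; returns the status code."""
    if session.get("active"):
        pending = session.pop("pending_voiceprint", None)
        if pending is not None and not abort_model:
            identifier, model = pending
            repository[identifier] = model  # commit (or adapt) the voiceprint
        session["active"] = False
    # Without an active session the call has no effect but still succeeds.
    return 200


repository = {"caller-1": "model-v1"}
session = {"active": True, "pending_voiceprint": ("caller-1", "model-v2")}
end_session(session, repository, abort_model=True)
print(repository["caller-1"])  # unchanged, because the changes were aborted
```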
         
     6.8.3. VER-SET-VOICEPRINT 
         
       The VER-SET-VOICEPRINT method causes the verification resource to 
       establish the voiceprint to be used for verification, identification, 
       or training purposes. At this time the desired mode of the 
       verification resource is not yet known.  
      
     The VER-SET-VOICEPRINT method can also be used to query whether or not 
     a voiceprint exists. The response to the VER-SET-VOICEPRINT method 
     request will contain an indication of the status of the designated
     voiceprint in the "Voiceprint-Exists" header field, allowing the client 
     to determine whether to use the current voiceprint for verification, 
     train a new voiceprint, or choose a different voiceprint. 
      
     A Voiceprint location may be completely specified by providing the URI 
     of the voiceprint repository along with attributes to locate a single 
     voiceprint within the repository. The voiceprint repository is 
     specified through the "Voiceprint-URI" header field, in which a URI 
     describing the location of the voiceprint repository is given. The 
     attributes used to locate a specific record or records within the 
     repository depend on whether the client intends to use speaker 
     verification or speaker identification. 
      
     In the case of speaker verification, only a single attribute is 
     required to uniquely locate a voiceprint record within the repository. 
     The "Voiceprint-Identifier" header field MUST describe a unique
     voiceprint record within a given repository. 
      
     In the case of speaker identification, an attribute describing the set 
     or group of speakers from which to select a specific identity must be 
     supplied in the VER-SET-VOICEPRINT message. The header field 
     "Voiceprint-Group" specifies the group of voiceprints from which an 
     identity is determined. If a new voiceprint is to be added to an 
     existing voiceprint group, then both the voiceprint group and the new 
     voiceprint identifier must be supplied. 
      
     In most cases, the voiceprint operations VER-SET-VOICEPRINT and
     VER-DELETE-VOICEPRINT operate on the same voiceprint repository but
     use different voiceprint records or group names. For simplicity, the
     "Voiceprint-URI" header field can be omitted if it has already been
     set by a previous voiceprint operation. Note, however, that
     VER-START-SESSION clears any voiceprint designation, including the
     "Voiceprint-URI".
      
     Unlike the "Voiceprint-URI", the "Voiceprint-Identifier" header field
     MUST be specified in every voiceprint operation, and the
     "Voiceprint-Group" header field MUST be specified in every voiceprint
     operation used for identification.
         
     Example 1:
        This example assumes a verification session is in progress and the 
        voiceprint exists in the voiceprint repository. 
         
        C->S:  VER-SET-VOICEPRINT 314168 MRCP/1.0 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Identifier: <unique string> 
      
        S->C:  MRCP/1.0 314168 200 COMPLETE 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Identifier: <unique string> 
               Voiceprint-Exists: true 
                
     Example 2:
        This example assumes a verification session is in progress and the 
        voiceprint doesn’t exist in the voiceprint repository.
         
        C->S:  VER-SET-VOICEPRINT 314168 MRCP/1.0 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Identifier: <unique string> 
      
        S->C:  MRCP/1.0 314168 200 COMPLETE 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Identifier: <unique string> 
               Voiceprint-Exists: false 
                
     Example 3:
        This example assumes a verification session is in progress and the 
        ’Voiceprint-URI’ header field is a bad URI. 
         
        C->S:  VER-SET-VOICEPRINT 314168 MRCP/1.0 
               Voiceprint-URI: <bad voiceprint uri> 
               Voiceprint-Identifier: <unique string> 
      
        S->C:  MRCP/1.0 314168 405 COMPLETE 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Identifier: <unique string> 
               Completion-Cause: 006 voiceprint-uri-failure 
         
     Example 4: 
        This example assumes an identification session is in progress and 
        the group doesn’t exist in the voiceprint repository. 
         
        C->S:  VER-SET-VOICEPRINT 314168 MRCP/1.0 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Group: <unique string> 
      
        S->C:  MRCP/1.0 314168 200 COMPLETE 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Group: <unique string> 
               Completion-Cause: 009 voiceprint-group-not-exist
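         
        The exchanges in the examples above can be produced and consumed
        with very little code. The following sketch serializes a request
        and parses a response, following the line layout shown in the
        examples. The function names, the repository URI, and the
        identifier value are hypothetical, and lower-casing header names
        on receipt is a convenience assumed here, not a rule stated in
        this document.

```python
CRLF = "\r\n"


def build_request(method, request_id, headers):
    """Serialize a request: method, request-id, version, then header lines."""
    lines = [f"{method} {request_id} MRCP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF


def parse_response(text):
    """Split a response into (request_id, status_code, request_state, headers)."""
    head = text.split(CRLF + CRLF, 1)[0]
    start_line, *header_lines = head.split(CRLF)
    _version, request_id, status, state = start_line.split()
    headers = {}
    for line in header_lines:
        if line:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return int(request_id), int(status), state, headers


request = build_request("VER-SET-VOICEPRINT", 314168, {
    "Voiceprint-URI": "http://repository.example.com/voiceprints",  # hypothetical
    "Voiceprint-Identifier": "caller-0001",                          # hypothetical
})
reply = ("MRCP/1.0 314168 200 COMPLETE\r\n"
         "Voiceprint-Exists: false\r\n\r\n")
request_id, status, state, headers = parse_response(reply)

# A missing voiceprint suggests training a new one rather than verifying.
train_new = status == 200 and headers.get("voiceprint-exists") == "false"
print(train_new)
```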
                
     6.8.4. VER-DELETE-VOICEPRINT 
         
        The VER-DELETE-VOICEPRINT method removes a voiceprint from a 
        repository or speaker identification group. For removal of a speaker 
        identification voiceprint, three attributes describing the 
        voiceprint repository, group, and voiceprint identifier are 
        required. For removal of a speaker verification voiceprint, two 
        attributes describing the repository and the specific voiceprint are 
        needed. 
         
        If a single voiceprint record is specified with no group identifier 
        information, the voiceprint record is deleted.  
         
        If a group identifier is specified but no specific voiceprint within 
        the group, the group record is deleted, and all the voiceprints 
        associated with that group are deleted.

        If both a voiceprint record and a group identifier are specified, 
        that voiceprint is deleted, and the group identifier is updated to 
        no longer reference that voiceprint. If, after removing the 
        reference to that voiceprint, the group identifier is empty, the 
        group record is also removed.  
         
        If a voiceprint record or a voiceprint group does not exist, the
        VER-DELETE-VOICEPRINT method can silently ignore the message and
        still return a 200 status code.
      
     Example: 
        This example demonstrates a message to remove a specific voiceprint. 
         
        C->S:  VER-DELETE-VOICEPRINT 314168 MRCP/1.0 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Identifier: <unique string> 
      
        S->C:  MRCP/1.0 314168 200 COMPLETE 
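         
        The three deletion cases described above can be sketched against
        an in-memory repository as follows. The data layout (one
        dictionary of voiceprints and one dictionary of group membership
        sets) and the function name delete_voiceprint are assumptions made
        for illustration; a real repository is addressed through the
        Voiceprint-URI header field.

```python
def delete_voiceprint(voiceprints, groups, identifier=None, group=None):
    """Apply the VER-DELETE-VOICEPRINT rules to in-memory stores.

    voiceprints: dict mapping identifier -> voiceprint model
    groups:      dict mapping group name -> set of member identifiers
    """
    if group is not None and identifier is None:
        # Group only: delete the group and every voiceprint in it.
        for member in groups.pop(group, set()):
            voiceprints.pop(member, None)
    elif identifier is not None:
        # Single record, optionally also dropped from its group.
        voiceprints.pop(identifier, None)
        if group is not None and group in groups:
            groups[group].discard(identifier)
            if not groups[group]:
                del groups[group]  # an emptied group is removed as well
    return 200  # records that do not exist are silently ignored


voiceprints = {"ann": "m1", "bob": "m2", "cy": "m3"}
groups = {"family": {"ann", "bob"}}
delete_voiceprint(voiceprints, groups, identifier="ann", group="family")
print(sorted(voiceprints), groups)
```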
         
     6.8.5. VERIFY 
         
        The VERIFY method is used to send the utterance’s audio stream to 
        the verification resource, which will then process it according to 
        the current Verification-Mode, either to train the voiceprint or 
        verify the user. 
         
        When both a recognizer and a verification resource share the same
        session, the VERIFY method MUST be called prior to calling the
        RECOGNIZE method on the recognizer resource.  This ordering lets
        the media server know that verification must be enabled for the
        subsequent call to RECOGNIZE.
         
     Example: 
        C->S:  VERIFY 543260 MRCP/1.0 
         
        S->C:  MRCP/1.0 543260 200 IN-PROGRESS 
         
        When the VERIFY request is done, the MRCP server should send a 
        ’VERIFICATION-COMPLETE’ event to the client. 
         
      
     6.8.6. VER-FROM-BUFFER 
         
        The VER-FROM-BUFFER method begins an ongoing evaluation of the 
        currently buffered audio against the voiceprint established through 
        the VER-SET-VOICEPRINT method. Execution of this method without 
        first establishing the voiceprint repository and identifier 
        attributes produces an error response. Since a verification session 
        may only have a single voiceprint identity at any given time, this 
        method may not be started repeatedly without first receiving a 
        completion response or sending a VER-STOP message. 
         
        Embedded with the request for audio evaluation is a header field
        that describes the desired usage of the verification resource. The
        value of the "Verification-Mode" header field MUST be either
        "train" or "verify".
         
        The buffered audio is not consumed by this evaluation operation and 
        thus VER-FROM-BUFFER may be called repeatedly using different 
        voiceprints. Such usage is desirable to implement an n-best 
        processing strategy to determine a voiceprint identity. 
         
        The processing initiated under a VER-FROM-BUFFER method may be 
        terminated using the VER-STOP method. 
         
        For the VER-FROM-BUFFER method, the media server can optionally
        return an "IN-PROGRESS" response followed by the
        "VERIFICATION-COMPLETE" event.
         
     Example: 
        This example illustrates the usage of some buffering methods. In
        this scenario the client first performs a live verification, but
        the utterance is rejected. In the meantime, the utterance is also
        saved to the audio buffer. Then, another voiceprint is used to
        verify against the audio buffer, and the utterance is accepted.
        Here, we assume both "num-min-verification-phrases" and
        "num-max-verification-phrases" are 1.
      
        C->S:  VER-START-SESSION 314161 MRCP/1.0   
               Adapt-Model: true 
         
        S->C:  MRCP/1.0 314161 200 COMPLETE 
      
        C->S:  VER-SET-VOICEPRINT 314162 MRCP/1.0 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Identifier: <unique string> 
      
        S->C:  MRCP/1.0 314162 200 COMPLETE 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Identifier: <unique string> 
               Voiceprint-Exists: true 
                
         
        C->S:  VERIFY 314164 MRCP/1.0 
               Ver-buffer-utterance: true 
         
        S->C:  MRCP/1.0 314164 200 IN-PROGRESS 
         
        S->C:  VERIFICATION-COMPLETE 314164 COMPLETE MRCP/1.0 
               Completion-Cause: 000 success 
               Content-Type: application/x-nlsml 
               Content-Length: 123 
         
               <?xml version="1.0"?> 
               <result grammar="What-Grammar-URI"> 
               <extensions> 
                  <result-type type="VERIFICATION" /> 
                  <verification-result>
                    <incremental> 
                         <num-frames> 50 </num-frames> 
                         <device> cellular-phone </device> 
                         <gender> female </gender> 
                         <decision> rejected </decision> 
                         <verification-score> -50 </verification-score> 
                    </incremental> 
                    <cumulative> 
                         <num-frames> 50 </num-frames> 
                         <device> cellular-phone </device> 
                         <gender> female </gender> 
                         <decision> rejected </decision> 
                         <verification-score> -50 </verification-score> 
                    </cumulative> 
                  </verification-result> 
               </extensions> 
               </result> 
                
        C->S:  VER-SET-VOICEPRINT 314165 MRCP/1.0 
               Voiceprint-Identifier: <unique string2> 
                
        S->C:  MRCP/1.0 314165 200 COMPLETE 
               Voiceprint-URI: <voiceprint uri> 
               Voiceprint-Identifier: <unique string2> 
               Voiceprint-Exists: true 
                
        C->S:  VER-FROM-BUFFER 314166 MRCP/1.0     
               Verification-Mode: verify 
      
        S->C:  MRCP/1.0 314166 200 IN-PROGRESS 
         
        S->C:  VERIFICATION-COMPLETE 314166 COMPLETE MRCP/1.0 
               Completion-Cause: 000 success 
               Content-Type: application/x-nlsml 
               Content-Length: 123 
         
               <?xml version="1.0"?> 
               <result grammar="What-Grammar-URI"> 
               <extensions> 
                  <result-type type="VERIFICATION" /> 
                  <verification-result> 
                    <incremental> 
                         <num-frames> 50 </num-frames> 
                         <device> cellular-phone </device> 
                         <gender> female </gender> 
                         <decision> accepted </decision> 
                         <verification-score> 50 </verification-score> 
                    </incremental> 
                    <cumulative> 
                         <num-frames> 50 </num-frames> 
                         <device> cellular-phone </device> 
                         <gender> female </gender> 
                         <decision> accepted </decision> 
                         <verification-score> 50 </verification-score>
                    </cumulative> 
                  </verification-result> 
               </extensions> 
               </result> 
      
         
        C->S:  VER-END-SESSION 314168 MRCP/1.0     
      
        S->C:  MRCP/1.0 314168 200 COMPLETE 
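         
        Because the buffered audio is not consumed, a client can repeat
        the VER-SET-VOICEPRINT and VER-FROM-BUFFER pair shown above for
        several candidate voiceprints and keep the best match. The
        following sketch shows that n-best loop on the client side. The
        callable verify_from_buffer is hypothetical: it is assumed to set
        the voiceprint, issue VER-FROM-BUFFER, wait for
        VERIFICATION-COMPLETE, and return the decision and score from the
        result.

```python
def identify_from_buffer(candidates, verify_from_buffer):
    """Try each candidate voiceprint against the same buffered audio.

    Returns (identifier, score) for the accepted candidate with the
    highest score, or None when no candidate is accepted.
    """
    results = []
    for identifier in candidates:
        decision, score = verify_from_buffer(identifier)
        results.append((score, identifier, decision))
    results.sort(reverse=True)  # highest score first
    accepted = [(ident, score) for score, ident, dec in results if dec == "accepted"]
    return accepted[0] if accepted else None


# Stand-in for the network round trips, using made-up outcomes.
outcomes = {
    "voiceprint-1": ("rejected", -50.0),
    "voiceprint-2": ("accepted", 50.0),
}
print(identify_from_buffer(list(outcomes), outcomes.__getitem__))
```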
         
     6.8.7. VER-ROLLBACK 
         
        The VER-ROLLBACK method discards the last buffered utterance or
        the last live utterance (when the mode is "train" or "verify").
        This method should be invoked when the caller provides undesirable
        input such as non-speech noises, side-speech, out-of-grammar
        utterances, commands, etc. Note that this method does not provide
        a stack of rollback states. When VER-ROLLBACK is executed twice in
        succession without an intervening recognition operation, the
        second invocation has no effect.
         
     Example: 
        C->S:  VER-ROLLBACK 314165 MRCP/1.0   
      
        S->C:  MRCP/1.0 314165 200 COMPLETE 
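         
        The single-level rollback behavior can be sketched as follows. The
        class and attribute names are hypothetical; the point is only that
        one flag, rather than a stack, records whether the most recent
        utterance can still be discarded.

```python
class UtteranceStore:
    """Keeps collected utterances; only the newest one can be rolled back."""

    def __init__(self):
        self.utterances = []
        self.can_rollback = False

    def add(self, utterance):
        self.utterances.append(utterance)
        self.can_rollback = True  # the newest utterance is now undoable

    def rollback(self):
        if self.can_rollback:
            self.utterances.pop()
            self.can_rollback = False  # there is no stack of rollback states
        return 200  # a repeated rollback is a harmless no-op


store = UtteranceStore()
store.add("utterance-1")
store.add("utterance-2 (cough)")
store.rollback()
store.rollback()  # second call in succession has no effect
print(store.utterances)
```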
           
     6.8.8. VER-STOP 
         
        The VER-STOP method, from the client to the server, tells the
        verification resource to stop the VERIFY or VER-FROM-BUFFER request
        if one is active. If such a request is active and the VER-STOP
        request successfully terminates it, then the response contains an
        active-request-id-list header field containing the request-id of
        the VERIFY or VER-FROM-BUFFER request that was terminated. In this
        case, no VERIFICATION-COMPLETE event will be sent for the
        terminated request. If no VERIFY or VER-FROM-BUFFER request was
        active, then the response MUST NOT contain an
        active-request-id-list header field. Either way, the response MUST
        contain a status of 200 (Success).
         
        The VER-STOP method aborts an ongoing evaluation operation against 
        live audio or buffered audio. 
              
     Example: 
        This example assumes a voiceprint identity has already been 
        established. 
         
        C->S:  VERIFY 314177 MRCP/1.0 
               Verification-Mode: verify 
         
        S->C:  MRCP/1.0 314177 200 IN-PROGRESS  
              
        C->S:  VER-STOP 314178 MRCP/1.0  
         
        S->C:  MRCP/1.0 314178 200 COMPLETE
               Active-Request-Id-List: 314177 
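         
        The response rules for VER-STOP can be sketched on the server
        side as shown below. The resource is modeled as a dictionary, and
        the key names "active_request_id" and "suppress_completion_event"
        are hypothetical placeholders for a server's internal state.

```python
def handle_ver_stop(resource):
    """Return (status_code, headers) for a VER-STOP on a toy resource."""
    headers = {}
    active = resource.get("active_request_id")
    if active is not None:
        resource["active_request_id"] = None          # terminate the request
        resource["suppress_completion_event"] = True  # no VERIFICATION-COMPLETE
        headers["Active-Request-Id-List"] = str(active)
    # The status is 200 whether or not a request was active.
    return 200, headers


resource = {"active_request_id": 314177}
print(handle_ver_stop(resource))  # header lists the terminated request
print(handle_ver_stop(resource))  # nothing active: header is omitted
```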
         
     6.8.9. VER-START-TIMERS 
      
        This request is sent from the client to the verification resource to 
        start the no-input timer, usually once the audio prompts to the 
        caller have played to completion.  
         
     Example: 
        C->S:  VER-START-TIMERS 543260 MRCP/1.0 
      
        S->C:  MRCP/1.0 543260 200 COMPLETE 
      
     6.8.10. SET-PARAMS 
      
        The SET-PARAMS method, from the client to the server, tells the
        verification resource to set and modify its configuration
        parameters. If the server resource does not recognize an OPTIONAL
        parameter, it MUST ignore that field. Many of the parameters in the
        SET-PARAMS method can also be used in another method, such as
        VERIFY. The difference is scope: a value such as security-level set
        using SET-PARAMS applies to all future requests, whenever
        applicable, whereas a security-level passed in a VERIFY request
        applies only to that request.
              
     Example:  
        C->S:  SET-PARAMS 543256 MRCP/1.0  
               Security-Level: high 
               No-Input-Timeout: 5000 
              
        S->C:  MRCP/1.0 543256 200 COMPLETE 
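         
        The difference in scope between SET-PARAMS values and per-request
        header fields can be sketched as a simple overlay. The function
        name effective_params and the override value 3000 are chosen for
        illustration only; the session values are the ones set in the
        example above.

```python
from collections import ChainMap

# Values established by the SET-PARAMS example above.
session_params = {"Security-Level": "high", "No-Input-Timeout": "5000"}


def effective_params(request_headers):
    """Per-request header fields take precedence over SET-PARAMS values."""
    return ChainMap(request_headers, session_params)


# A VERIFY request overriding the timeout for this one request only.
params = effective_params({"No-Input-Timeout": "3000"})
print(params["No-Input-Timeout"], params["Security-Level"])

# The next request without the header falls back to the session value.
print(effective_params({})["No-Input-Timeout"])
```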
         
     6.8.11. GET-PARAMS 
         
        The GET-PARAMS method, from the client to the server, asks the
        verification resource for its current values for parameters in the
        request. The client can request specific parameters from the server
        by sending it one or more empty parameter headers with no values.
        The server should then return the settings for those specific
        parameters only. When the client does not send a specific list of
        empty parameter headers, the verification resource should return the
        settings for all parameters. This wildcard use can be expensive, as
        the number of settable parameters can be large depending on the
        vendor.  Hence it is RECOMMENDED that the client not use the
        wildcard GET-PARAMS operation very often.
              
     Example:  
        C->S:  GET-PARAMS 543256 MRCP/1.0  
               Security-Level: 
               No-Input-Timeout: 
              
        S->C:  MRCP/1.0 543256 200 COMPLETE  
               Security-Level: high
               No-Input-Timeout: 5000 
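         
        The selection rule for GET-PARAMS (named parameters when empty
        headers are sent, every parameter otherwise) can be sketched as
        follows. The function name get_params and the "Adapt-Model" entry
        in the sample settings are illustrative assumptions; omitting
        unknown parameter names from the reply is also an assumption, as
        this document does not state what the server returns for them.

```python
def get_params(current, request_headers):
    """Return the settings named by empty request headers, or all of them."""
    requested = [name for name, value in request_headers.items() if value == ""]
    if not requested:
        return dict(current)  # wildcard: every parameter
    return {name: current[name] for name in requested if name in current}


current = {"Security-Level": "high", "No-Input-Timeout": "5000",
           "Adapt-Model": "false"}
print(get_params(current, {"Security-Level": "", "No-Input-Timeout": ""}))
print(len(get_params(current, {})))  # wildcard returns all settings
```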
         
         
     6.9. Verification Session Events 
         
     6.9.1. VERIFICATION-COMPLETE 
         
        The VERIFICATION-COMPLETE event follows a call to VERIFY or VER-
        FROM-BUFFER and is used to communicate to the client the 
        verification results.  This event will contain only verification 
        results. 
         
        Example: 
        S->C:  VERIFICATION-COMPLETE 543259 COMPLETE MRCP/1.0 
               Completion-Cause: 000 success 
               Content-Type: application/x-nlsml 
               Content-Length: 123 
         
               <?xml version="1.0"?> 
               <result grammar="What-Grammar-URI"> 
               <extensions> 
                  <result-type type="VERIFICATION" /> 
                  <verification-result> 
                    <incremental> 
                         <num-frames> 50 </num-frames> 
                         <device> cellular-phone </device> 
                         <gender> female </gender> 
                         <decision> accepted </decision> 
                         <verification-score> 50 </verification-score> 
                    </incremental> 
                    <cumulative> 
                         <num-frames> 150 </num-frames> 
                         <device> cellular-phone </device> 
                         <gender> female </gender> 
                         <decision> accepted </decision> 
                         <verification-score> 25 </verification-score> 
                    </cumulative> 
                  </verification-result> 
                  <identification-result> 
                    <group-name> 123456 </group-name> 
                    <member> Martha-smith </member> 
                    <score> 75 </score> 
                  </identification-result> 
               </extensions> 
               </result> 
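
        A client might read the verification portion of such a result as 
        sketched below.  This is an illustration under stated assumptions, 
        not a normative parser: the function name read_verification and 
        the returned dictionary layout are hypothetical, the sample body 
        is abbreviated, and scores are assumed to be integers as in the 
        example above.  

```python
import xml.etree.ElementTree as ET

# Abbreviated, well-formed sample modelled on the example above.
SAMPLE = """<?xml version="1.0"?>
<result grammar="What-Grammar-URI">
  <extensions>
    <result-type type="VERIFICATION" />
    <verification-result>
      <incremental>
        <decision> accepted </decision>
        <verification-score> 50 </verification-score>
      </incremental>
      <cumulative>
        <decision> accepted </decision>
        <verification-score> 25 </verification-score>
      </cumulative>
    </verification-result>
  </extensions>
</result>"""


def read_verification(xml_text):
    """Return decision and score for each scope present in the result."""
    root = ET.fromstring(xml_text)
    out = {}
    for scope in ("incremental", "cumulative"):
        node = root.find(f"./extensions/verification-result/{scope}")
        if node is None:
            continue  # this scope was not reported
        out[scope] = {
            "decision": node.findtext("decision", "").strip(),
            "score": int(node.findtext("verification-score", "0")),
        }
    return out
```

        An application would typically act on the cumulative decision 
        and keep the incremental values for diagnostics; that choice is a 
        design assumption, not something this document mandates.  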
         
     6.9.2. START-OF-SPEECH 
         
        The START-OF-SPEECH event is returned from the server to the client 
        once the server has detected speech.  The verification resource 
        always returns this event when speech has been detected, regardless 
        of whether the recognizer and verification resources share the same 
        session. 
         
      
        Example: 
        S->C:  START-OF-SPEECH 543259 IN-PROGRESS MRCP/1.0 
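
        The event line layout used in these examples (event name, 
        request id, request state, protocol version) can be parsed as 
        sketched below.  The names MrcpEvent and parse_event_line are 
        hypothetical, and the set of request states is assumed from the 
        base MRCP specification rather than defined here.  

```python
from typing import NamedTuple

# Request states assumed from the base MRCP specification.
REQUEST_STATES = ("PENDING", "IN-PROGRESS", "COMPLETE")


class MrcpEvent(NamedTuple):
    name: str
    request_id: int
    request_state: str
    version: str


def parse_event_line(line):
    """Split an MRCP event line into its four fields."""
    # Unpacking raises ValueError if the line does not have exactly four fields.
    name, request_id, state, version = line.split()
    if state not in REQUEST_STATES:
        raise ValueError(f"unexpected request state: {state!r}")
    return MrcpEvent(name, int(request_id), state, version)
```

        Rejecting an unknown request state early makes it easier to 
        catch an event line that carries a status code in that position.  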
         
     7.   Hotword Recognition 
         
        This document captures the extensions required to implement Voice 
        Enrollment, Speaker Verification and Hotword recognition using MRCP.  
        This section describes the methods, responses and events needed for 
        doing Hotword recognition.   
         
        A new type of Speech Recognizer resource is presented that can be 
        used for Hotword recognition.  Unlike the primary recognizer 
        resource, which is driven by the client for each recognition 
        request, the secondary Hotword recognition resource is attached to 
        the session and listens continuously until a particular command 
        phrase is spoken. 
      
        The Hotword recognition resource can be the only recognition 
        resource in a session or it can be attached to the same session as a 
        primary recognizer resource, and consequently connected to the same 
        audio stream.  When a client sends a SETUP request to add a Hotword 
        recognizer resource to an existing session, then the MRCP server 
        attaches the Hotword recognition resource in eavesdropping mode on 
        the RTP stream already established by the primary resource. 
         
         
     7.1. Hotword State Machine  
         
        The difference between a Hotword recognition resource and the 
        primary recognition resource is minor.  The RECOGNIZE and STOP 
        methods are the only methods allowed on a Hotword recognition 
        resource.  The only event generated is RECOGNITION-COMPLETE.  The 
        resource goes from IDLE to RECOGNIZING and back to IDLE just like a 
        regular recognizer resource. 
         
        A Hotword recognition resource, unlike a normal recognizer resource, 
        will not send a START-OF-SPEECH event while it is trying to locate a 
        Hotword.  The first event that will be returned once the Hotword is 
        detected is a RECOGNITION-COMPLETE event. 
      
        After a RECOGNITION-COMPLETE event is reported, the Hotword 
        recognition resource must be primed once again by sending another 
        RECOGNIZE request. 
         
        The Hotword recognition resource can also be stopped by calling the 
        STOP method. 
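
        The state machine described in this section can be sketched as 
        follows.  The class and method names are hypothetical.  Two 
        behaviours are assumptions not stated in the text: a RECOGNIZE 
        received while already RECOGNIZING is rejected, and a STOP 
        received while IDLE is accepted without effect.  

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    RECOGNIZING = "RECOGNIZING"


class HotwordResource:
    """Two-state model: IDLE -> RECOGNIZING -> IDLE."""

    def __init__(self):
        self.state = State.IDLE

    def on_method(self, method):
        # RECOGNIZE and STOP are the only methods allowed on this resource.
        if method == "RECOGNIZE" and self.state is State.IDLE:
            self.state = State.RECOGNIZING
        elif method == "STOP":
            self.state = State.IDLE
        else:
            raise ValueError(f"{method} not allowed in state {self.state.name}")

    def on_hotword_detected(self):
        """Return the event to send, or None if the resource is not primed."""
        if self.state is not State.RECOGNIZING:
            return None
        # No START-OF-SPEECH is sent; RECOGNITION-COMPLETE is the first event.
        self.state = State.IDLE
        return "RECOGNITION-COMPLETE"
```

        After RECOGNITION-COMPLETE the model is back in IDLE, so the 
        client has to issue RECOGNIZE again before another Hotword can 
        be reported.  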
         
     7.1.1. Addressing Resources  
         
        To request that a Hotword recognition resource be added to a 
        session, the client must specify a different URI in the SETUP 
        message.  The same rules apply as for other resources.  That is, if 
        no session is specified in the SETUP message, the resource is 
        considered to be the first resource added to a new session.  For 
        subsequent SETUP requests, the MRCP client should indicate to the 
        server that the resources belong to the same session by including 
        the same session id in the SETUP request message.   

      
        There is no special order required when requesting synthesizer, 
        recognizer or Hotword-recognizer resources. 
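
        The session rule above (no Session header on the first SETUP, 
        the same session id on every later SETUP) can be sketched as 
        follows.  The function name build_setup and its parameters are 
        hypothetical, and the Transport parameters simply mirror those 
        used in the examples of this document.  

```python
def build_setup(resource_url, cseq, client_ports, session_id=None):
    """Build an RTSP SETUP request for one MRCP resource."""
    first_port, second_port = client_ports
    lines = [
        f"SETUP {resource_url} RTSP/1.0",
        f"CSeq: {cseq}",
        "Transport: RTP/AVP;unicast;"
        f"client_port={first_port}-{second_port};mode=record",
    ]
    # Omit Session for the first resource; include it to join an existing session.
    if session_id is not None:
        lines.append(f"Session: {session_id}")
    return "\r\n".join(lines) + "\r\n\r\n"
```

        A client would send the first request without a session id, 
        read the Session header from the response, and pass that value 
        when adding the Hotword recognizer.  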
      
         
     7.2. Hotword Header Fields 
         
        Hotword recognition requests may contain the following header 
        fields. 
         
          Hotword-header  =     Hotword-Max-Duration         ; Section 7.2.1 
                              | Hotword-Min-Duration         ; Section 7.2.2 
         
         
     7.2.1. Hotword-Max-Duration 
         
        This parameter MAY be sent in a RECOGNIZE request to enable Hotword 
        listening.  It specifies the maximum length of an utterance that 
        should be considered for Hotword recognition.  This parameter, 
        along with Hotword-Min-Duration, can be used to tune performance by 
        preventing the recognizer from evaluating utterances that are too 
        short or too long to be the Hotword.  The value of this field is in 
        milliseconds.  The default is 1700 milliseconds. 
         
          hotword-max-duration     = "Hotword-Max-Duration" ":" 1*DIGIT CRLF 
         
         
     7.2.2. Hotword-Min-Duration 
         
        This parameter MAY be sent in a RECOGNIZE request to enable Hotword 
        listening.  It specifies the minimum length of an utterance that can 
        be considered for Hotword recognition.  This parameter, along with 
        Hotword-Max-Duration, can be used to tune performance by preventing 
        the recognizer from evaluating utterances that are too short or too 
        long to be the Hotword.  The value of this field is in 
        milliseconds.  The default is 300 milliseconds. 
         
          hotword-min-duration     = "Hotword-Min-Duration" ":" 1*DIGIT CRLF 
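
        The two duration headers can be handled as sketched below.  The 
        function names are hypothetical.  The sketch applies the defaults 
        stated above, accepts only values matching 1*DIGIT, and treats 
        both bounds as inclusive, which is an assumption since the text 
        does not say whether the limits themselves are accepted.  

```python
import re

# Defaults in milliseconds, as stated in Sections 7.2.1 and 7.2.2.
DEFAULTS_MS = {"Hotword-Min-Duration": 300, "Hotword-Max-Duration": 1700}


def hotword_window(header_lines):
    """Return (min_ms, max_ms) from header lines, applying the defaults."""
    window = dict(DEFAULTS_MS)
    for line in header_lines:
        name, _, value = line.partition(":")
        name = name.strip()
        if name not in window:
            continue  # not a Hotword duration header
        value = value.strip()
        # The ABNF allows digits only, so "1.7" or "300ms" is rejected.
        if not re.fullmatch(r"[0-9]+", value):
            raise ValueError(f"{name} must be an integer number of milliseconds")
        window[name] = int(value)
    return window["Hotword-Min-Duration"], window["Hotword-Max-Duration"]


def is_candidate(duration_ms, window):
    """True if an utterance of this length should be evaluated (inclusive)."""
    shortest, longest = window
    return shortest <= duration_ms <= longest
```

        With the defaults, a 900 ms utterance would be evaluated and a 
        200 ms utterance would be skipped.  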
         
         
     7.3. Hotword Methods 
      
     7.3.1. SETUP 
         
        The SETUP method from the client to the server is used to attach a 
        Hotword recognizer resource to the session. 
         
        Example: 
        C->S:  SETUP rtsp://media.server.com/media/hotword-asr RTSP/1.0 
               CSeq: 3  
               Transport: RTP/AVP;unicast;client_port=8000-8001; mode=record 
               Session: 12345678  
                  
        S->C:  RTSP/1.0 200 OK 
               CSeq: 3  
               Transport: RTP/AVP;unicast;client_port=8000-8001;  
                                 server_port=9000-9001;mode=record  
               Session: 12345678 
      
      
         
     7.3.2. RECOGNIZE 
         
        The RECOGNIZE method from the client to the server starts an ongoing 
        Hotword recognition.  This operation can be stopped using the STOP 
        method.  Otherwise, the RECOGNITION-COMPLETE event will be returned 
        when the Hotword has been recognized.   
         
        The client must call RECOGNIZE once again to re-start Hotword 
        recognition. 
         
        Example: 
        C->S:  ANNOUNCE rtsp://media.server.com/media/hotword-asr RTSP/1.0 
               Cseq: 314 
               Session: 12345678 
               Content-Type: application/mrcp 
               Content-Length: 276 
         
               RECOGNIZE 543259 MRCP/1.0 
               Hotword-Min-Duration: 300 
               Hotword-Max-Duration: 1700 
               Content-Type: application/grammar+xml 
               Content-Length: 123 
                
               <hotword grammar> 
                
        S->C:  RTSP/1.0 200 OK 
               Cseq: 314 
               Content-Type: application/mrcp 
               Content-Length: 67 
         
               MRCP/1.0 543259 200 IN-PROGRESS 
         
        S->C:  ANNOUNCE rtsp://media.server.com/media/hotword-asr RTSP/1.0 
               Cseq: 315 
               Session: 12345678 
               Content-Type: application/mrcp 
               Content-Length: 123 
         
               RECOGNITION-COMPLETE 543259 COMPLETE MRCP/1.0 
               Completion-Cause: 000 success 
               Content-Type: application/x-nlsml 
               Content-Length: 76 
         
               <?xml version="1.0"?> 
               <result grammar="hotword-grammar"> 
                    <interpretation confidence="75"> 
                         <instance> 
                              <command confidence="75"> Wakeup </command> 
                         </instance> 
                    </interpretation> 
                    <input mode="speech" confidence="75"> Wakeup </input> 
                    <extensions>  
                    </extensions> 
               </result> 
         
        C->S:  RTSP/1.0 200 OK 
               Cseq: 315 
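
        Each MRCP message in this exchange is carried as the body of an 
        RTSP ANNOUNCE request.  The framing can be sketched as follows.  
        The function name tunnel_in_announce is hypothetical, and the 
        sketch computes Content-Length from the encoded body rather than 
        reproducing the sample values shown in the examples.  

```python
def tunnel_in_announce(resource_url, cseq, session_id, mrcp_message):
    """Wrap an MRCP message in an RTSP ANNOUNCE request and return bytes."""
    body = mrcp_message.encode("utf-8")
    headers = [
        f"ANNOUNCE {resource_url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session_id}",
        "Content-Type: application/mrcp",
        # Content-Length counts the bytes of the enclosed MRCP message.
        f"Content-Length: {len(body)}",
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body
```

        Re-priming the Hotword recognizer after RECOGNITION-COMPLETE 
        then amounts to sending another RECOGNIZE message through the 
        same function with a new CSeq and request id.  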

     8.   RTSP-based Examples   
      
        This section contains examples of typical sessions between a client 
        and the server.   
         
     8.1. Enrollment 
         
        This example illustrates a typical enrollment session.  
      
        First, the client starts an enrollment session before enrolling new 
        phrases.   
        C->S:  ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0 
               Cseq: 406 
               Session: 12345678 
               Content-Type: application/mrcp 
               Content-Length: 123 
         
               START-PHRASE-ENROLLMENT 543258 MRCP/1.0 
               Num-Min-Consistent-Pronunciations: 2 
               Consistency-Threshold: 3000 
               Clash-Threshold: 1200 
               Personal-Grammar-URI: <personal grammar uri> 
               Phrase-Id: <phrase id> 
               Phrase-NL: <NL phrase> 
               Weight: 1 
               Save-Best-Waveform: true 
         
        S->C:  RTSP/1.0 200 OK 
               Cseq: 406 
               Content-Type: application/mrcp 
               Content-Length: 86 
         
               MRCP/1.0 543258 200 COMPLETE 
         
        Then, the application can proceed to enroll an utterance by 
        iterating over the following command.  
        C->S:  ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0 
               Cseq: 407 
               Session: 12345678 
               Content-Type: application/mrcp 
               Content-Length: 276 
         
               RECOGNIZE 543259 MRCP/1.0 
               Content-Type: application/grammar+xml 
               Content-Length: 123 
         
               <?xml version="1.0"?>  
         
               <!-- the default grammar language is US English -->  
               <grammar xml:lang="en-US" version="1.0">  
                
               <!-- example command rule: help or cancel -->  
               <rule id="dialCommand" scope="public">  
                    <one-of> 
                         <item> help </item> 
                         <item> cancel </item> 
                    </one-of> 
               </rule> 
               </grammar> 
                
        S->C:  RTSP/1.0 200 OK 
               Cseq: 407 
               Content-Type: application/mrcp 
               Content-Length: 67 
         
               MRCP/1.0 543259 200 IN-PROGRESS 
         
        S->C:  ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0 
               Cseq: 408 
               Session: 12345678 
               Content-Type: application/mrcp 
               Content-Length: 87 
         
               START-OF-SPEECH 543259 IN-PROGRESS MRCP/1.0 
         
        C->S:  RTSP/1.0 200 OK 
               Cseq: 408 
         
        The recognizer resource returns the enrollment status after each 
        attempt to enroll an utterance.  This repeats until the required 
        number of pronunciations are consistent and there are no clashes 
        with other pronunciations in the personal grammar. 
        S->C:  ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0 
               Cseq: 409 
               Session: 12345678 
               Content-Type: application/mrcp 
               Content-Length: 276 
         
               RECOGNITION-COMPLETE 543259 COMPLETE MRCP/1.0 
               Completion-Cause: 000 success 
               Content-Type: application/x-nlsml 
               Content-Length: 123 
         
               <?xml version="1.0"?> 
               <result grammar="Personal-Grammar-URI"> 
               <extensions> 
                  <result-type type="ENROLLMENT" /> 
                  <enrollment-result> 
                    <num-clashes> 2 </num-clashes> 
                    <num-good-repetitions> 1 </num-good-repetitions> 
                    <num-repetitions-still-needed> 1 </num-repetitions-still-needed> 
                    <consistency-status> consistent </consistency-status> 
                    <clash-phrase-ids>  
                         <item> Jeff </item> <item> Andre </item>  
                    </clash-phrase-ids> 
                  </enrollment-result> 
               </extensions> 
               </result> 
      
         
        C->S:  RTSP/1.0 200 OK 
               Cseq: 409 
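
        The decision to repeat the RECOGNIZE step can be sketched as 
        follows.  The function names enrollment_status and should_repeat 
        are hypothetical, and the stopping rule (no repetitions still 
        needed and no clashes) is one reading of the description above, 
        not a normative requirement.  

```python
import xml.etree.ElementTree as ET

# Well-formed sample modelled on the enrollment result above.
SAMPLE = """<?xml version="1.0"?>
<result grammar="Personal-Grammar-URI">
  <extensions>
    <result-type type="ENROLLMENT" />
    <enrollment-result>
      <num-clashes> 2 </num-clashes>
      <num-good-repetitions> 1 </num-good-repetitions>
      <num-repetitions-still-needed> 1 </num-repetitions-still-needed>
      <consistency-status> consistent </consistency-status>
      <clash-phrase-ids>
        <item> Jeff </item> <item> Andre </item>
      </clash-phrase-ids>
    </enrollment-result>
  </extensions>
</result>"""


def enrollment_status(nlsml_text):
    """Extract the fields needed to decide whether to enroll again."""
    root = ET.fromstring(nlsml_text)
    res = root.find("./extensions/enrollment-result")
    if res is None:
        raise ValueError("no enrollment-result element")
    items = res.findall("./clash-phrase-ids/item")
    return {
        "still_needed": int(res.findtext("num-repetitions-still-needed", "0")),
        "num_clashes": int(res.findtext("num-clashes", "0")),
        "clash_phrase_ids": [(item.text or "").strip() for item in items],
    }


def should_repeat(status):
    """True while more repetitions are needed or clashes remain."""
    return status["still_needed"] > 0 or status["num_clashes"] > 0
```

        An application may prefer to stop and report the clashing 
        phrase ids to the caller instead of looping, since repeating the 
        same phrase is unlikely to remove a clash.  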
         
        Finally, once the application is satisfied with the enrollment 
        results, it commits the enrollment to the personal grammar by 
        ending the enrollment session, as follows. 
        C->S:  ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0 
               Cseq: 410 
               Session: 12345678 
               Content-Type: application/mrcp 
               Content-Length: 123 
         
               END-PHRASE-ENROLLMENT 543260 MRCP/1.0 
                
         
        S->C:  RTSP/1.0 200 OK 
               Cseq: 410 
               Content-Type: application/mrcp 
               Content-Length: 67 
         
               MRCP/1.0 543260 200 COMPLETE 
               Waveform-URL: <waveform url> 
         
     8.2. Speaker Verification and Identification 
         
        This example illustrates a verification session.  Prompts are 
        assumed to be played elsewhere; the MRCP synthesizer resource is 
        omitted for simplicity.  
         
        Opening the recognizer. This is the first resource for this  
        session. The server and client agree on a single Session ID 12345678  
        and set of RTP/RTCP ports on both sides.  
              
        C->S:SETUP rtsp://media.server.com/media/recognizer RTSP/1.0  
             CSeq: 2  
             Transport:RTP/AVP;unicast;client_port=46456-46457  
             Content-Type: application/sdp  
             Content-Length: 190  
                     
             v=0  
             o=- 123 456 IN IP4 10.0.0.1  
             s=Media Server  
             p=+1-888-555-1212  
             c=IN IP4 0.0.0.0  
             t=0 0  
             m=audio 0 RTP/AVP 0 96  
             a=rtpmap:0 pcmu/8000  
             a=rtpmap:96 telephone-event/8000  
             a=fmtp:96 0-15  
              
        S->C:RTSP/1.0 200 OK  
             CSeq: 2  
             Transport:RTP/AVP;unicast;client_port=46456-46457;  
                       server_port=46460-46461  
             Session: 12345678  
             Content-Length: 190  
             Content-Type: application/sdp  
                     
             v=0  
             o=- 3211724219 3211724219 IN IP4 10.3.2.88  
             s=Media Server  
             c=IN IP4 0.0.0.0  
             t=0 0  
             m=audio 46460 RTP/AVP 0 96  
             a=rtpmap:0 pcmu/8000  
             a=rtpmap:96 telephone-event/8000  
             a=fmtp:96 0-15  
              
        Opening a verification resource. Uses the existing session ID and 
        ports.  
              
        C->S:SETUP rtsp://media.server.com/media/verification-resource 
        RTSP/1.0  
             CSeq: 3  
             Transport: RTP/AVP;unicast;client_port=46456-46457;  
                        mode=record;ttl=127  
             Session: 12345678  
           
        S->C:RTSP/1.0 200 OK  
             CSeq: 3  
             Transport: RTP/AVP;unicast;client_port=46456-46457;  
                        server_port=46460-46461;mode=record;ttl=127  
             Session: 12345678 
         
        Start a verification session. 
         
        C->S:ANNOUNCE rtsp://media.server.com/media/verification-resource 
        RTSP/1.0 
             Cseq: 4 
             Session: 12345678 
             Content-Type: application/mrcp 
             Content-Length: 53 
         
             VER-START-SESSION 314161 MRCP/1.0     
             Adapt-Model: true 
         
        S->C:RTSP/1.0 200 OK 
             CSeq: 4 
             Session: 12345678 
             Content-Length: 30 
             Content-Type: application/mrcp 
         
             MRCP/1.0 314161 200 COMPLETE 
         
        Start a recognition request, getting the account number for example. 
         
        C->S:ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0 
             CSeq: 6 
             Session: 12345678 
             Content-Type: application/mrcp 
             Content-Length: 188 
         
             RECOGNIZE 314163 MRCP/1.0 
             No-Input-Timeout: 7000 
             Recognizer-Start-Timers: false 
             Save-Waveform: true 
             Ver-Buffer-Utterance: true 
             N-Best-List-Length: 2 
             Content-Type: text/uri-list 
             Content-Length: 33 
         
             builtin:grammar/digits?length=5 
         
        S->C:RTSP/1.0 200 OK 
             CSeq: 6 
             Session: 12345678 
             Content-Length: 33 
             Content-Type: application/mrcp 
         
             MRCP/1.0 314163 200 IN-PROGRESS 
         
        S->C:ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0 
             CSeq: 1 
             Session: 12345678 
             Content-Length: 65 
             Content-Type: application/mrcp 
         
             START-OF-SPEECH 314163 IN-PROGRESS MRCP/1.0 
             Proxy-Sync-Id: 1 
         
        C->S:RTSP/1.0 200 OK 
             CSeq: 1 
         
        The recognition result contains 2 choices. 
         
        S->C:ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0 
             CSeq: 2 
             Session: 12345678 
             Content-Length: 3511 
             Content-Type: application/mrcp 
         
             RECOGNITION-COMPLETE 314163 COMPLETE MRCP/1.0 
             Completion-Cause: 000 success 
             Waveform-URL: http://media.server.com/waveforms/utt01.wav 
             Content-Type: application/x-nlsml 
             Content-Length: 3280 
         
             <?xml version="1.0" encoding="UTF-8"?> 
               <result grammar="builtin:grammar/digits?length=5"> 
                 <interpretation confidence="71"> 
                   <instance>13579</instance> 
                   <input mode="speech" confidence="72"> 
                      one three five seven nine 
                   </input> 
                 </interpretation> 
                 <interpretation confidence="69"> 
                    <instance>13479</instance> 
                    <input mode="speech" confidence="72"> 
                       one three four seven nine 
                    </input> 
                 </interpretation> 
               </result> 
         
        C->S:RTSP/1.0 200 OK 
             CSeq: 2 
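
        The N-best list in this result can be extracted as sketched 
        below.  The function name n_best is hypothetical, the sample body 
        is abbreviated, and ordering the choices by the confidence 
        attribute is an assumption about how a client would rank them.  

```python
import xml.etree.ElementTree as ET

# Abbreviated sample modelled on the recognition result above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<result grammar="builtin:grammar/digits?length=5">
  <interpretation confidence="71">
    <instance>13579</instance>
    <input mode="speech" confidence="72">one three five seven nine</input>
  </interpretation>
  <interpretation confidence="69">
    <instance>13479</instance>
    <input mode="speech" confidence="72">one three four seven nine</input>
  </interpretation>
</result>"""


def n_best(nlsml_bytes):
    """Return [(instance_text, confidence)] ordered best first."""
    root = ET.fromstring(nlsml_bytes)
    choices = []
    for interp in root.findall("interpretation"):
        text = interp.findtext("instance", "").strip()
        choices.append((text, int(interp.get("confidence", "0"))))
    # sorted() is stable, so equal confidences keep their document order.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)
```

        The resulting list gives the order in which the candidate 
        identifiers are checked against the voiceprint repository in the 
        steps that follow.  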
      
        Check whether the first choice from the N-best list exists in the  
        voiceprint repository. 
      
        C->S:ANNOUNCE rtsp://media.server.com/media/verification-resource 
        RTSP/1.0 
             CSeq: 7 
             Session: 12345678 
             Content-Type: application/mrcp 
             Content-Length: 119 
         
             VER-SET-VOICEPRINT 314164 MRCP/1.0 
             Voiceprint-URI: http://media.server.com/VoicePrints 
             Voiceprint-Identifier: 13579 
         
        Voiceprint ID 13579 does not exist.  
      
        S->C:RTSP/1.0 200 OK 
             CSeq: 7 
             Session: 12345678 
             Content-Length: 139 
             Content-Type: application/mrcp 
         
             MRCP/1.0 314164 200 COMPLETE 
             Voiceprint-URI: http://media.server.com/VoicePrints 
             Voiceprint-Identifier: 13579 
             Voiceprint-Exists: false 
                
        Check the second choice in the nbest list.  
                
        C->S:ANNOUNCE rtsp://media.server.com/media/verification-resource 
        RTSP/1.0 
             CSeq: 8 
             Session: 12345678 
             Content-Type: application/mrcp 
             Content-Length: 119 
         
             VER-SET-VOICEPRINT 314165 MRCP/1.0 
             Voiceprint-URI: http://media.server.com/VoicePrints 
             Voiceprint-Identifier: 13479 
      
        Voiceprint ID 13479 exists.  
         
        S->C:RTSP/1.0 200 OK 
             CSeq: 8 
             Session: 12345678 
             Content-Length: 138 
             Content-Type: application/mrcp 
         
             MRCP/1.0 314165 200 COMPLETE 
             Voiceprint-URI: http://media.server.com/VoicePrints 
             Voiceprint-Identifier: 13479 
             Voiceprint-Exists: true 
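
        The search through the N-best list shown in the last two 
        exchanges can be sketched as follows.  The names voiceprint_exists 
        and first_enrolled are hypothetical, and send_set_voiceprint 
        stands in for whatever transport code sends VER-SET-VOICEPRINT 
        and returns the MRCP response text.  Treating a missing 
        Voiceprint-Exists header as "false" is an assumption.  

```python
def voiceprint_exists(response_text):
    """Read the Voiceprint-Exists header from a VER-SET-VOICEPRINT response."""
    # Skip the response line, then scan the headers.
    for line in response_text.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "voiceprint-exists":
            return value.strip().lower() == "true"
    return False  # assumption: an absent header means the voiceprint is unknown


def first_enrolled(candidates, send_set_voiceprint):
    """Return the first candidate identifier that has a voiceprint, else None."""
    for identifier in candidates:
        if voiceprint_exists(send_set_voiceprint(identifier)):
            return identifier
    return None
```

        Stopping at the first match keeps the number of round trips 
        small, at the cost of never considering lower-ranked candidates 
        that may also be enrolled.  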
         
        Start verification against voiceprint 13479.  
         
        C->S:ANNOUNCE rtsp://media.server.com/media/verification-resource 
        RTSP/1.0 
             CSeq: 9 
             Session: 12345678 
             Content-Type: application/mrcp 
             Content-Length: 54 
         
             VER-FROM-BUFFER 314166 MRCP/1.0 
             Verify-Mode: verify 
         
        S->C:RTSP/1.0 200 OK 
             CSeq: 9 
             Session: 12345678 
             Content-Length: 33 
             Content-Type: application/mrcp 
         
             MRCP/1.0 314166 200 IN-PROGRESS 
         
        The caller is verified (assume num-min-verification-phrases and num-
        max-verification-phrases are 1). 
         
        S->C:ANNOUNCE rtsp://media.server.com/media/verification-resource 
        RTSP/1.0 
             CSeq: 3 
             Session: 12345678 
             Content-Type: application/mrcp 
             Content-Length: 183 
         
             VERIFICATION-COMPLETE 314166 COMPLETE MRCP/1.0 
             Completion-Cause: 000 success 
             Content-Type: application/x-nlsml 
             Content-Length: 123 
         
             <?xml version="1.0"?> 
               <result> 
               <extensions> 
                  <result-type type="VERIFICATION" /> 
                  <verification-result> 
                    <incremental> 
                         <num-frames> 50 </num-frames> 
                         <device> cellular-phone </device> 
                         <gender> female </gender> 
                         <decision> accepted </decision> 
                         <verification-score> 50 </verification-score> 
                    </incremental> 
                    <cumulative> 
                         <num-frames> 50 </num-frames> 
                         <device> cellular-phone </device> 
                         <gender> female </gender> 
                         <decision> accepted </decision> 
                         <verification-score> 50 </verification-score> 
                    </cumulative> 
                  </verification-result> 
               </extensions> 
               </result> 
         
        C->S:RTSP/1.0 200 OK 
             CSeq: 3 
                
        End the verification session. 
         
        C->S:ANNOUNCE rtsp://media.server.com/media/verification-resource 
        RTSP/1.0 
             CSeq: 11 
             Session: 12345678 
             Content-Type: application/mrcp 
             Content-Length: 33 
         
             VER-END-SESSION 314168 MRCP/1.0  
      
        S->C:RTSP/1.0 200 OK 
             CSeq: 11 
             Session: 12345678 
             Content-Length: 30 
             Content-Type: application/mrcp 
         
             MRCP/1.0 314168 200 COMPLETE 
         
        Tear down the recognizer and verification resources. 
         
        C->S:TEARDOWN rtsp://media.server.com/media/verification-resource 
        RTSP/1.0  
             CSeq: 12 
             Session: 12345678  
              
        S->C:RTSP/1.0 200 OK  
             CSeq: 12 
         
        C->S:TEARDOWN rtsp://media.server.com/media/recognizer RTSP/1.0  
             CSeq: 13 
             Session: 12345678  
              
      
        S->C:RTSP/1.0 200 OK  
             CSeq: 13 
         
     8.3. Hotword Recognition 
         
        To be provided in a later revision of this document. 
         
     9.   Security Considerations 
         
        The primary additional security considerations raised by the 
        extensions in this document concern the use of speaker 
        identification and verification as security functions.  One such 
        consideration is that individualized voiceprints are used to 
        identify or confirm the identity of a caller.  The privacy and 
        integrity of these voiceprints are of high importance.  Fortunately, 
        voiceprints are not transferred between client and server but are 
        instead maintained by the server using the server's own security 
        mechanisms. 
         
        Another consideration particular to these functions is the 
        consequence of manipulating the media (speech) stream.  Some 
        verification technologies in use today are susceptible to 
        impersonation or "replay" attacks, and all are susceptible to a 
        denial-of-access attack in which an otherwise acceptable media 
        stream is garbled.  We recommend that standard media-securing 
        protocols such as SRTP be used in these cases. 
         
     10.  Reference Documents 
           
        [1]   Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,  
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.  
      
        [2]   Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time 
              Streaming Protocol (RTSP)", RFC 2326, April 1998. 
           
        [3]   Shanmugham, S., et al., "A Media Resource Control Protocol 
              Developed by Cisco, Nuance, and Speechworks", Internet-Draft 
              draft-shanmugham-mrcp-04 (work in progress), May 1, 2003. 
         
        [4]   World Wide Web Consortium, "Natural Language Semantics Markup  
              Language (NLSML) for the Speech Interface Framework", W3C  
              Working Draft, 30 May 2001. 
         
        [5]   Bradner, S., "Key words for use in RFCs to Indicate 
              Requirement Levels", RFC 2119, March 1997. 
         
        [6]   T. Bray et al., "Namespaces in XML", W3C Recommendation, 14 
              January 1999. See http://www.w3.org/TR/1999/REC-xml-names-
              19990114. 
         
      
     Acknowledgements 
         

      
        The authors would like to thank the following additional individuals 
        for their contributions to this document: 
         
        Andre Gillet (Nuance Communications) 
        Klaus Reifenrath (Scansoft) 
        Saravanan Shanmugham (Cisco Systems, Inc.) 
         
     Full Copyright Statement 
         
        Copyright (C) The Internet Society (2003).  All Rights Reserved. 
         
        This document and translations of it may be copied and furnished to 
        others, and derivative works that comment on or otherwise explain it 
        or assist in its implementation may be prepared, copied, published 
        and distributed, in whole or in part, without restriction of any 
        kind, provided that the above copyright notice and this paragraph 
        are included on all such copies and derivative works.  However, this 
        document itself may not be modified in any way, such as by removing 
        the copyright notice or references to the Internet Society or other 
        Internet organizations, except as needed for the purpose of developing 
        Internet standards in which case the procedures for copyrights 
        defined in the Internet Standards process must be followed, or as 
        required to translate it into languages other than English. 
         
        The limited permissions granted above are perpetual and will not be 
        revoked by the Internet Society or its successors or assigns. 
         
        This document and the information contained herein is provided on an 
        "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET       
        ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, 
        INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 
        INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED       
        WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
      
     Authors’ Addresses 
         
        Daniel C. Burnett 
        Nuance Communications 
        1005 Hamilton Court 
        Menlo Park, CA 94025-1422 
        USA 
         
        Email:  burnett@nuance.com 
         
         
        Pierre Forgues 
        Nuance Communications Ltd. 
        111 Duke Street 
        Suite 4100 
        Montreal, Quebec 
        Canada H3C 2M1 
         
        Email:  forgues@nuance.com 
         
      
        Charles Galles 
        Intervoice, Inc. 
        17811 Waterview Parkway 
        Dallas, Texas 75252 
         
        Email:  charles.galles@intervoice.com 
         
         
         
         
        This document expires on June 24, 2004. 