Audio/Video Transport Working Group                         M. Espelien
Internet Draft: RTP Payload Common Format for                R. Gellens
                Vocoder Speech                            Qualcomm Inc.
Document: draft-espelien-avt-common-01.txt                 October 2001


             RTP Payload Common Format for Vocoder Speech 
    
    
Status of this Memo
    
    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026.  Internet-Drafts are
    working documents of the Internet Engineering Task Force (IETF), its
    areas, and its working groups.  Note that other groups may also
    distribute working documents as Internet-Drafts.
    
    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."
    
    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt.  The list of Internet-
    Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.
    
    
Copyright Notice
    
     Copyright (C) The Internet Society 2001. All Rights Reserved.

Espelien & Gellens              Expires April 2002              [Page 1]Internet Draft            Common Payload Format            October 2001

                           Table of Contents
     1  Abstract
     2  Conventions Used in this Document
     3  Changes from Previous Revision
     4  Introduction
     5  Background and Motivation for Common Format
     6  Common Characteristics
       6.1  PureVoice Characteristics
       6.2  EVRC Characteristics
       6.3  SMV Characteristics
     7  Common RTP Packet Format
       7.1  Normal Format
       7.2  TOC Entries
       7.3  Bundling Codec Data Frames
         7.3.1  Additional Bundling Restrictions on the Sender
       7.4  Interleaving Codec Data Frames
         7.4.1  Additional Interleaving Restrictions on the Sender
       7.5  Finding Interleave Group Boundaries
       7.6  Reconstructing Interleaved Speech
       7.7  Receiving Invalid Values
       7.8  Optimized Single Frame Format
       7.9  Detecting Which Format
       7.10  Codec Data Frame Format
         7.10.1  PureVoice Codec Data Frame Format
         7.10.2  EVRC or SMV Codec Data Frame Format
       7.11  Adding New Codecs
     8  Tardy Packets
     9  Lost Packets
    10  Implementation Issues
      10.1  Interleaving Length
    11  Security Considerations
    12  Real Time and Storage Mode
      12.1  RTP Mode
      12.2  Storage Mode
    13  IANA Considerations
      13.1  Registration of MIME Media Type
        13.1.1  audio/EVRC Media Type Registration
        13.1.2  audio/SMV Media Type Registration
        13.1.3  audio/qcelp-common Media Type Registration
      13.2  Optional Media Type Parameters
    14  Mapping to SDP Parameters
    15  Acknowledgements
    16  References
    17  Authors' Addresses
    
1 Abstract
    
    This document describes a common [RTP] payload format for speech
    encoded using wireless vocoders which share certain common
    characteristics (see section 6).
    
    This is expected to be especially useful in wireless systems.  For
    example, CDMA networks use one of three vocoders: [PureVoice]
    (Qcelp), [EVRC] (Enhanced Variable Rate Codec) and in the future
    [SMV] (Selectable Mode Vocoder).  All of these vocoders share a
    number of common characteristics (see section 6) and can be
    transmitted using the RTP payload format specified in this document.
    New vocoders with such characteristics can easily be added to this
    common format by following the steps in section 7.11.
    
    An interleaved format is included to reduce the effect of packet
    loss on speech quality, as well as a bundled format, and a format
    optimized for header compression.
    
2 Conventions Used in this Document
    
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    document are to be interpreted as described in RFC 2119 [KEYWORDS].
    
3 Changes from Previous Revision
    
    This is the second version.  Changes include:
        + make frame size more generic (previous version assumed 20 ms
        frame size)
        + clarify null and erasure frames
        + correct grammatical errors.
    
4 Introduction
    
    This document describes a generalized format for use as an [RTP]
    payload type.  Three CDMA vocoders are initially specified in this
    common format; more can be added by following the procedures in
    section 7.11.  The [PureVoice] Qcelp vocoder and the [EVRC] vocoder
    are already widely deployed in CDMA wireless networks. [SMV] is the
    codec of choice for next generation CDMA wireless networks and is
    likely to be widely deployed as next generation wireless networks
    are rolled out.
    
    Multiple codec data frames MAY be bundled together to reduce the
    per-frame transmission overhead.
    
    Codec data frames can be interleaved to reduce quality degradation
    due to lost packets.  The sender can choose various interleave
    settings based on the importance of low end-to-end delay versus
    greater tolerance for lost packets.
    
    A format optimized for header compression is provided (see section
    7.8).
    
5 Background and Motivation for Common Format
    
    The Electronic Industries Association (EIA) and Telecommunications
    Industry Association (TIA) have published three standards which
    define the speech compression algorithms for CDMA applications:
    PureVoice, EVRC and SMV.
    
    The [SMV] codec is the preferred speech codec standard for CDMA2000.
    The SMV will be deployed in third generation handsets, in addition
    to PureVoice and EVRC codecs.
    
    There are currently handsets that support two of these codecs, and
    in the future handsets might support all three codecs.  The
    PureVoice and EVRC codecs are currently deployed in millions of
    first and second generation CDMA handsets.
    
    The format of the three codec (PureVoice, EVRC and SMV) frames is
    very similar.
    
    The similarities suggest that a common specification for
    encapsulating these three wireless vocoders as well as potential
    future wireless vocoders is possible and worth pursuing.
    
    The environment (memory, processor speed, etc.) of wireless handsets
    is constrained.  A common RTP payload format for multiple vocoders
    allows the handset to support these vocoders with a single, smaller
    RTP implementation than would be needed for separate formats,
    reducing code size and complexity, and therefore shortening time to
    market, lowering costs, and improving quality.  It also permits
    saved handset resources to be spent on user features.
    
    Since an RTP format for [EVRC] and [SMV] has not yet been approved,
    a direct case can be made for a common format supporting at least
    these two (plus future) codecs.
    
    The situation with [PureVoice] is more complex.  An RTP format
    already exists [vnd.Qcelp] and is specified in [RFC2658]; therefore
    it would be ideal for a common format supporting PureVoice as well
    as EVRC and SMV to interoperate with existing implementations of
    [vnd.Qcelp].  However, if interoperability is sacrificed,
    significant benefits can be obtained by making better use of RTP
    packet bits; for example, allowing for table-of-contents entries as
    well as a frame count field, yet spending the same number of bits
    (or fewer) per packet on average.
    
    The common format specified here gives up interoperability with
    [vnd.Qcelp] in order to gain packet optimization benefits.
    
6 Common Characteristics
    
    The format of the three initial codec (PureVoice, EVRC and SMV)
    frames is very similar.  This specification is designed to transport
    data frames of vocoders that have the following characteristics:
    
        - are frame based
        - allow null and erasure frames
        - have fewer than 17 rates in total (so that every rate can be
          identified by a 4-bit TOC value)
        - have a maximum full rate frame that can be transported in a
          single RTP packet using this format.
    
    Vocoders with characteristics that can be expressed in format type,
    TOC entries and codec frames can easily be expressed in this common
    format.  New vocoders with such characteristics can be added to this
    common format by following the steps in section 7.11.
    
6.1 PureVoice Characteristics
    
    The Qcelp [PureVoice] codec compresses each 20 milliseconds of 8000
    Hz sampled input speech into one of four different size output
    frames:  Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4 (54 bits)
    or Rate 1/8 (20 bits).  In addition, there are two zero bit vocoder
    frame types (see PureVoice Table in section 7.2): null frames and
    erasure frames. (Erasure frames are never transmitted; they are
    substituted by the receiver for lost or damaged frames.  Null frames
    are produced as a result of the vocoder running at rate 0.  Null
    frames are zero bits long and are also not transmitted.)
    
6.2 EVRC Characteristics
    
    The [EVRC] codec compresses each 20 milliseconds of 8000 Hz sampled
    input speech into one of four different size output frames:  Rate 1
    (171 bits), Rate 1/2 (80 bits), Rate 1/4 (40 bits) or Rate 1/8 (16
    bits).  In addition, there are two zero bit vocoder frame types (see
    EVRC Table in section 7.2): null frames and erasure frames.
    (Erasure frames are never transmitted; they are substituted by the
    receiver for lost or damaged frames.  Null frames are produced as a
    result of the vocoder running at rate 0.  Null frames are zero bits
    long and are also not transmitted.)
    
6.3 SMV Characteristics
    
    Like the EVRC, the [SMV] codec also compresses each 20 milliseconds
    of 8000 Hz sampled input speech into one of four different size
    output frames:  Rate 1 (171 bits), Rate 1/2 (80 bits), Rate 1/4 (40
    bits) or Rate 1/8 (16 bits).  In addition, there are two zero bit
    vocoder frame types (see SMV Table in section 7.2): null frames and
    erasure frames. (Erasure frames are never transmitted; they are
    substituted by the receiver for lost or damaged frames.  Null frames
    are produced as a result of the vocoder running at rate 0.  Null
    frames are zero bits long and are also not transmitted.)
    
    The SMV is more bandwidth efficient than the EVRC vocoder.  The SMV
    achieves lower average data rates (ADR) by spending a different
    percentage of time at each rate in each mode, as shown in the table
    below.  The assumptions and details of noise levels and ADR are
    described in Chapter 4 of [SMV].  The EVRC is equivalent in
    performance to SMV mode 1.
    
    The SMV codec operates in one of four modes.  Each mode employs one
    of the vocoders operating at the rates mentioned above.  Each mode
    operates in all rates (full to 1/8) for varying percentages of time,
    based on desired average data rate specified, taking into account
    characteristics of the speech samples.
    
    [SMV] modes can be changed on a frame by frame basis.  Note that the
    [SMV] mode is not encapsulated in the RTP packet; only fields
    defined in section 7.1 or 7.8 are sent as RTP payload. [SMV] modes
    are included in this document for informational purposes only.
    
    While each [SMV] mode can operate in all rates (full to 1/8) for
    varying percentages of time, higher or lower average data rates are
    achieved for each mode.  This is shown in the table below:
    
                    Mode 0       Mode 1       Mode 2        Mode 3
      -------------------------------------------------------------
      Rate 1        68.90%       38.14%       15.43%        07.49%
      Rate 1/2      06.03%       15.82%       38.34%        46.28%
      Rate 1/4      00.00%       17.37%       16.38%        16.38%
      Rate 1/8      25.07%       28.67%       29.85%        29.85%
      -------------------------------------------------------------
      ADR          7205 bps     5182 bps     4073 bps      3692 bps
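    
    As a worked check of the table, the following sketch recomputes the
    ADR row as a weighted average of per-rate bit rates.  The gross
    channel rates used here (9600, 4800, 2400 and 1200 bits/second) are
    an assumption: they are not stated in this document, and were chosen
    because they reproduce the ADR values in the table.  The names
    CHANNEL_BPS, MODE_MIX and average_data_rate are illustrative only.

```python
# Assumed gross channel rates in bits/second for each frame rate.
# These values are NOT given in this document; they are an assumption
# that happens to reproduce the ADR row of the table.
CHANNEL_BPS = {"1": 9600, "1/2": 4800, "1/4": 2400, "1/8": 1200}

# Percentage of frames sent at each rate, copied from the table.
MODE_MIX = {
    0: {"1": 68.90, "1/2": 6.03, "1/4": 0.00, "1/8": 25.07},
    1: {"1": 38.14, "1/2": 15.82, "1/4": 17.37, "1/8": 28.67},
    2: {"1": 15.43, "1/2": 38.34, "1/4": 16.38, "1/8": 29.85},
    3: {"1": 7.49, "1/2": 46.28, "1/4": 16.38, "1/8": 29.85},
}


def average_data_rate(mode):
    """Weighted average bit rate (bits/second) for one SMV mode."""
    mix = MODE_MIX[mode]
    return sum(pct / 100.0 * CHANNEL_BPS[rate] for rate, pct in mix.items())


if __name__ == "__main__":
    for mode in sorted(MODE_MIX):
        print("Mode %d: %d bps" % (mode, round(average_data_rate(mode))))
```

    Under that assumption, the rounded results agree with the ADR row
    for all four modes.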
    
    The SMV codec chooses the output frame rate based on an analysis of
    the input speech and the current operating mode (either normal or
    one of three reduced rates).  For typical speech patterns, this
    results in an average output of 4.2k bits/second for normal mode and
    lower for reduced rate modes.
    
    
7 Common RTP Packet Format
    
    The RTP timestamp is in 1/8000 of a second units.  The RTP payload
    data for the common format is one of two types: normal (type 1) and
    optimized single frame (type 2).
    
7.1 Normal Format
    
    Normal packet format allows for multiple codec frames to be included
    in each RTP packet.  The sender chooses how many codec data frames
    to include in each RTP packet.  If more than one, the sender chooses
    to bundle or interleave the frames.  Bundling groups two or more
    consecutive data frames in a single RTP packet.  Interleaving groups
    two or more non-consecutive frames in a packet.  Interleaving can
    mitigate the listener's perception of data loss.
    
    The normal codec RTP payload data is formatted as follows:
    
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      RTP Header [RTP]                         |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |R|R| LLL | NNN |R|R|Frame Count|  TOC  |  ...  |  TOC  |padding|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |one or more codec data frames, one per TOC entry               |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    
    The RTP header has the expected values as described in [RTP].  The
    use of the marker bit in the RTP header is outside the scope of this
    document; it is defined by the application.
    
    When multiple codec data frames are present in a single RTP packet,
    the timestamp is, as always, that of the oldest data represented in
    the RTP packet.
    
    The assignment of an RTP payload type for this new packet format is
    outside the scope of this document, and will not be specified here.
    It is expected that the RTP profile for a particular class of
    applications will assign a payload type for this encoding, or if
    that is not done then a payload type in the dynamic range will be
    chosen. [SDP] can be used to signal out of band the RTP payload type
    (see example in section 14).
    
    The fields following the RTP header have the following meaning:
    
    1st bit:  Reserved (R): 1 bit
        MUST be set to zero by sender; SHOULD be ignored by receiver.
        
    2nd bit:  Reserved (R): 1 bit
        MUST be set to zero by sender; SHOULD be ignored by receiver.
        
    3rd-5th bit:  Interleave (LLL): 3 bits
        MUST be set to a value from 0 to 7.  If this field is non-zero,
        interleaving is enabled.  All receivers MUST support
        interleaving.  Senders MAY support interleaving.  Senders that
        do not support interleaving MUST set fields LLL and NNN to zero.
        
    6th-8th bit:  Interleave Index (NNN): 3 bits
        MUST have a value less than or equal to the value of LLL.
        Values of NNN greater than the value of LLL are invalid.
    
    More than one codec data frame MAY be included in a single RTP
    packet.  Multiple data frames are either bundled or interleaved.
    Bundling is described in detail in Section 7.3, and interleaving in
    Section 7.4.
    
    If only one codec data frame is included in an RTP packet, the LLL
    and NNN fields MUST be zero.
    
    9th bit:  Reserved (R): 1 bit
        MUST be set to zero by sender; SHOULD be ignored by receiver.
        
    10th bit:  Reserved (R): 1 bit
        MUST be set to zero by sender; SHOULD be ignored by receiver.
        
    11th-16th bit:  Frame Count (Count): 6 bits
        MUST be set by sender to the number of codec data frames minus
        one.  Valid values range from 0 to 63.  The frame count plus one
        indicates how many TOC entries (and codec data frames) are
        present in the RTP packet.  A value of zero indicates one frame.
        A value of 63 indicates 64 frames.
        
    TOC entries are described in section 7.2.  TOC entries provide
    information about the encoding rate and length of the respective
    codec frame.  Codec frames are speech data encoded at various rates
    (Full, 1/2, 1/4, or 1/8).  Null and erasure frames have zero length
    and are not played out; each has a corresponding TOC entry
    indicating the null or erasure frame type.
    
    17th-20th bit:  First Table of Contents (TOC): 4 bits
        MUST be set by sender as described in section 7.2.  There is one
        TOC entry for each codec frame.  The value can range from 0 to 5
        as shown in the three tables below.  Each value indicates to the
        receiver the length of the corresponding codec data frame.
        
    Padding (padding): 0 or 4 bits
        If the number of codec data frames (and therefore of TOC
        entries) is odd, then the sender MUST set 4 bits of padding
        following the last TOC entry and preceding the first codec data
        frame to zero.  If the number is even, then no padding is used;
        the first codec data frame immediately follows the last TOC
        entry.
        
    The receiver interprets the bits following the last TOC entry or
    padding as the first codec data frame.
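    
    The field layout above can be sketched as a small parser.  This is
    a minimal illustration, not a reference implementation.  It assumes
    that bits are numbered from the most significant bit of each octet
    (as in the packet diagram), and that the 4 bits of padding are
    present whenever the number of TOC entries is odd, so that the first
    codec data frame starts on an octet boundary.  The function name
    parse_normal_header is hypothetical.

```python
def parse_normal_header(payload):
    """Split a normal-format payload into its header fields.

    Returns (lll, nnn, toc_values, data_offset), where data_offset is
    the octet index of the first codec data frame.  Assumes MSB-first
    bit numbering within each octet.
    """
    if len(payload) < 2:
        raise ValueError("payload shorter than the 16-bit fixed header")
    first, second = payload[0], payload[1]
    # Reserved bits (1st, 2nd, 9th, 10th) are ignored on receipt.
    lll = (first >> 3) & 0x07          # 3rd-5th bits: Interleave
    nnn = first & 0x07                 # 6th-8th bits: Interleave Index
    if nnn > lll:
        raise ValueError("NNN greater than LLL is invalid")
    num_frames = (second & 0x3F) + 1   # Frame Count holds frames minus one
    toc_octets = (num_frames + 1) // 2  # 4 bits per entry, plus padding
    if len(payload) < 2 + toc_octets:
        raise ValueError("payload truncated inside the TOC")
    toc = []
    for i in range(num_frames):
        octet = payload[2 + i // 2]
        # Even-numbered entries sit in the high nibble, odd in the low.
        toc.append(octet >> 4 if i % 2 == 0 else octet & 0x0F)
    return lll, nnn, toc, 2 + toc_octets
```

    For example, the four octets 0x11 0x02 0x43 0x10 decode to LLL=2,
    NNN=1, three TOC entries (4, 3, 1) followed by 4 bits of padding,
    and codec data starting at octet 4.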
    
    Codec Frame(s):
        Length depends on codec and rate.  See descriptions in section
        7.2.  Each codec frame uses zero or more bits, depending on the
        rate specified by the TOC entry and the codec type specified by
        the MIME type.  (For example, half rate EVRC and SMV codec
        frames are 80 bits long, while half rate PureVoice codec frames
        are 124 bits long.)  The sender sets the TOC value and the
        associated codec frame.  The tables below correlate TOC values
        with valid codec frame lengths for the initial three codecs;
        future codecs specify the mapping in their MIME registration,
        as per section 7.11.
    
7.2 TOC Entries
    
    TOC entries apply only to multiple frame (Type 1) format as
    described in section 7.1.  Each TOC entry is correlated with the
    respective codec data frame.  The TOC value indicates the rate set
    and number of bits in the data frame.  For PureVoice, EVRC and SMV
    the following tables are used:
    
       TOC                           PureVoice
      Value   Rate       Codec data frame size (in octets)
      -----  -------    ----------------------------------------------
        0     Blank      0    (0 bits)
        1     1/8        3    (20 bits; 4 zero bits of padding at end)
        2     1/4        7    (54 bits; 2 zero bits of padding at end)
        3     1/2       16    (124 bits; 4 zero bits of padding at end)
        4     1         34    (266 bits; 6 zero bits of padding at end)
        5     Erasure    0    SHOULD NOT be transmitted by sender
        6-15  n/a       n/a   Reserved. SHOULD NOT be transmitted
    
    Note that the common frame format for PureVoice has TOC entries
    instead of lead bytes.  As a result, the PureVoice codec frame size
    in the table indicates the size of the data itself, just as it does
    for EVRC and SMV.
        
       TOC                          EVRC 
      Value   Rate      Codec data frame size (in octets)
      -----   -------   --------------------------------------------
        0     Blank      0    (0 bits)
        1     1/8        2    (16 bits)
        2     1/4        5    (40 bits)
        3     1/2       10    (80 bits)
        4     1         22    (171 bits; 5 padded at end with zeros)
        5     Erasure    0    SHOULD NOT be transmitted by sender
        6-15  n/a       n/a   Reserved. SHOULD NOT be transmitted
        
       TOC                          SMV 
      Value   Rate      Codec data frame size (in octets)
      -----   -------   ---------------------------------------------
        0     Blank      0    (0 bits)
        1     1/8        2    (16 bits)
        2     1/4        5    (40 bits)
        3     1/2       10    (80 bits)
        4     1         22    (171 bits; 5 padded at end with zeros)
        5     Erasure    0    SHOULD NOT be transmitted by sender
        6-15  n/a       n/a   Reserved. SHOULD NOT be transmitted
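    
    The tables above can be expressed as a lookup from TOC value to
    frame size, which a receiver can use to split the codec data into
    frames.  The sketch below derives each octet count from the bit
    count in the tables, rounded up to a whole octet (for example,
    54 + 2 padding bits = 56 bits = 7 octets for PureVoice rate 1/4).
    The names FRAME_BITS, frame_octets and split_frames are
    illustrative, not part of this specification.

```python
# Codec data frame sizes in bits, indexed by TOC value 0-5
# (Blank, 1/8, 1/4, 1/2, 1, Erasure), taken from the tables.
FRAME_BITS = {
    "PureVoice": [0, 20, 54, 124, 266, 0],
    "EVRC": [0, 16, 40, 80, 171, 0],
    "SMV": [0, 16, 40, 80, 171, 0],
}


def frame_octets(codec, toc_value):
    """Size in octets of the frame announced by one TOC entry."""
    bits = FRAME_BITS[codec]
    if not 0 <= toc_value < len(bits):
        raise ValueError("reserved TOC value %d" % toc_value)
    return (bits[toc_value] + 7) // 8   # round up to a whole octet


def split_frames(codec, toc, data):
    """Cut `data` (the octets after the TOC) into one frame per entry."""
    frames, pos = [], 0
    for value in toc:
        size = frame_octets(codec, value)
        if pos + size > len(data):
            raise ValueError("payload too short for its TOC entries")
        frames.append(data[pos:pos + size])
        pos += size
    return frames
```

    A blank or erasure entry yields an empty frame, so the number of
    frames returned always equals the number of TOC entries.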
    
7.3 Bundling Codec Data Frames
    
    Bundling codec data frames only applies to multiple frame format as
    described in section 7.1.  As indicated there, more than one codec
    data frame MAY be included in a single RTP packet.  Bundling codec
    data frames means multiple data frames are included consecutively
    in a packet (without interleaving).  The bundling of
    codec data frames is signaled by setting the frame count to a value
    greater than 0 (which also requires that the LLL and the NNN values
    MUST both be zero).
    
    Senders MAY support bundling.  All receivers MUST support bundling.
    Receivers MAY signal the maximum number of codec data frames they
    can handle in a single RTP packet.  This can be done using out of
    band signaling (for example in [SDP] parameters).  See also maxptime
    in section 13.2.
    
7.3.1 Additional Bundling Restrictions on the Sender
    
    Furthermore, senders have the following additional restrictions:
    
    o MUST NOT include more codec data frames in a single RTP packet
    than signaled by maxptime (see section 13.2).
    
    o To the extent that it is possible to determine the MTU of the
    underlying transport, MUST NOT include more codec data frames in a
    single RTP packet than will fit in the MTU.  For the purpose of
    computing the maximum bundling value, all codec data frames SHOULD
    be assumed to have the Rate 1 size.
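    
    The MTU rule above can be sketched as a worst-case size
    calculation.  The example assumes 40 octets of lower-layer overhead
    (IPv4 without options, UDP, and a fixed 12-octet RTP header); this
    figure, like the name max_bundled_frames, is an assumption of the
    sketch and not something this document specifies.

```python
def max_bundled_frames(mtu, full_rate_octets, lower_layer_overhead=40):
    """Largest frame count whose worst-case packet fits in `mtu` octets.

    Every frame is assumed to have the Rate 1 size.  The default
    lower_layer_overhead of 40 octets is an assumption: IPv4 header
    (20) + UDP header (8) + fixed RTP header (12).
    """
    best = 0
    for n in range(1, 65):                 # Frame Count allows 1..64 frames
        toc_octets = (n + 1) // 2          # 4-bit entries plus padding
        size = lower_layer_overhead + 2 + toc_octets + n * full_rate_octets
        if size > mtu:
            break
        best = n
    return best
```

    With these assumptions, a 1500-octet MTU allows the full 64 frames
    for EVRC or SMV (22 octets per Rate 1 frame) and 42 frames for
    PureVoice (34 octets per Rate 1 frame).  The maxptime limit still
    applies separately.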
    
    It is essential that a single codec full rate frame be sent in an
    unfragmented single RTP packet.  Note that optimized single frames
    are sent 20 ms (milliseconds) at a time, one in each RTP packet.
    Therefore for optimized single frame format, maxptime MUST be 20 ms,
    for the currently supported vocoders; see section 14.
    
7.4 Interleaving Codec Data Frames
    
    Interleaving is meaningful only when more than one codec data frame
    is bundled into a single RTP packet.
    
    All receivers MUST support interleaving.  Senders MAY support
    interleaving.
    
    Interleaving of codec data frames is signaled by setting the LLL
    bits to a value from 1 to 7 inclusive.
    
    Receivers MAY signal the maximum number of bundles (maxinterleave)
    they can handle in a single interleaving group.  This can be done
    using out of band signaling (for example in [SDP] parameters).
    Section 13.2 describes the maxinterleave parameter.
    
    Given a time-ordered sequence of output codec frames numbered 0..n,
    a bundling value B, and an interleave value L where n = B * (L+1) -
    1, the output frames are placed into RTP packets as follows (the
    values of the fields LLL and NNN are indicated for each RTP packet):
    
    First RTP Packet in Interleave group:
        LLL=L, NNN=0
        Frame 0, Frame L+1, Frame 2(L+1), Frame 3(L+1), ... for a total
        of B frames
    
    Second RTP Packet in Interleave group:
        LLL=L, NNN=1
        Frame 1, Frame 1+L+1, Frame 1+2(L+1), Frame 1+3(L+1), ... for a
        total of B frames
    
    This continues to the last RTP packet in the interleave group:
    
    (L+1)th RTP Packet in Interleave group:
        LLL=L, NNN=L
        Frame L, Frame L+L+1, Frame L+2(L+1), Frame L+3(L+1), ... for a
        total of B frames
    
    Senders MUST transmit in timestamp-increasing order.  Furthermore,
    within each interleave group, the RTP packets making up the
    interleave group MUST be transmitted in value-increasing order of
    the NNN field.  While this does not guarantee reduced end-to-end
    delay on the receiving end, when packets are delivered in order by
    the underlying transport, delay is reduced to the minimum possible.
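    
    The placement rule above amounts to taking every (L+1)th frame,
    starting at the packet's interleave index.  The following sketch
    shows this for one interleave group; the function name interleave
    and the tuple layout are illustrative only.

```python
def interleave(frames, bundling, interleave_value):
    """Distribute one interleave group's frames over L+1 packets.

    `frames` is the time-ordered list of B * (L+1) codec frames.
    Returns a list of (LLL, NNN, frames_in_packet) tuples in the order
    in which the packets are to be transmitted.
    """
    step = interleave_value + 1
    if len(frames) != bundling * step:
        raise ValueError("an interleave group needs exactly B * (L+1) frames")
    packets = []
    for index in range(step):
        # Packet `index` carries frames index, index+(L+1), index+2(L+1), ...
        packets.append((interleave_value, index, frames[index::step]))
    return packets
```

    For B=2 and L=2, frames 0..5 are sent as three packets carrying
    frames (0, 3), (1, 4) and (2, 5), with NNN set to 0, 1 and 2.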
    
7.4.1 Additional Interleaving Restrictions on the Sender
    
    Additionally, senders have the following restrictions:
        
        o Once beginning a session with a given maximum interleaving
        value, the sender MUST NOT increase the interleaving to a value
        that exceeds the maximum interleaving that was signaled.  The
        maximum interleaving value is signaled by maxinterleave in
        section 13.2.
        
        o MAY change the interleaving value only between interleave
        groups.
    
7.5 Finding Interleave Group Boundaries
    
    Given an RTP packet with sequence number S, interleave value (field
    LLL) L, and interleave index value (field NNN) N, the interleave
    group consists of RTP packets with sequence numbers from S-N to
    S-N+L inclusive.  In other words, the interleave group always
    consists of L+1 RTP packets with sequential sequence numbers.  The
    bundling value for all RTP packets in an interleave group MUST be
    the same.
    
    The receiver determines the expected bundling value for all RTP
    packets in an interleave group by the number of codec data frames
    bundled in the first RTP packet of the interleave group received.
    Note that this might not be the first RTP packet of the interleave
    group sent if packets are delivered out of order (or lost) by the
    underlying transport.
    
    On receipt of an RTP packet in an interleave group with other than
    the expected bundling value, the receiver MAY discard codec data
    frames off the end of the RTP packet or add erasure codec data
    frames to the end of the packet in order to manufacture a substitute
    packet with the expected bundling value.  The receiver MAY instead
    choose to discard the whole interleave group and play silence.
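    
    The boundary rule and the repair of a packet with an unexpected
    bundling value can be sketched as follows.  The sketch relies on
    RTP sequence numbers being 16 bits wide, so the arithmetic wraps
    modulo 65536.  Using None to stand for an erasure frame, and the
    function names themselves, are assumptions of this example.

```python
def interleave_group_range(seq, lll, nnn):
    """First and last RTP sequence numbers of the packet's group."""
    first = (seq - nnn) % 65536      # S-N, with 16-bit wrap-around
    last = (first + lll) % 65536     # S-N+L
    return first, last


def fit_to_bundling(frames, expected):
    """Truncate, or pad with erasures (None), to `expected` frames."""
    kept = list(frames[:expected])
    return kept + [None] * (expected - len(kept))
```

    For example, a packet with sequence number 1, LLL=3 and NNN=2
    belongs to the group spanning sequence numbers 65535 through 2.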
    
7.6 Reconstructing Interleaved Speech
    
    Given an RTP sequence number ordered set of RTP packets in an
    interleave group numbered 0..L, where L is the interleave value and
    B is the bundling value, and codec data frames within each RTP
    packet that are numbered in order from first to last with the
    numbers 1..B, the original, time-ordered sequence of output frames
    from the codec is reconstructed as follows:
    
    First L+1 frames:
        Frame 1 from packet 0 of interleave group
        Frame 1 from packet 1 of interleave group
      And so on up to...
        Frame 1 from packet L of interleave group
        
    Second L+1 frames:
        Frame 2 from packet 0 of interleave group
        Frame 2 from packet 1 of interleave group
      And so on up to...
        Frame 2 from packet L of interleave group
        
      And so on up to...
      
      Bth L+1 frames:
        Frame B from packet 0 of interleave group
        Frame B from packet 1 of interleave group
      And so on up to...
        Frame B from packet L of interleave group
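    
    The reconstruction above reads the packets "column by column": the
    first frame of every packet, then the second frame of every packet,
    and so on.  A minimal sketch follows.  It uses Python's zero-based
    list positions internally, and represents a lost packet as None so
    that its frames come out as None (standing in for erasures); both
    choices, and the name deinterleave, are assumptions of the sketch.

```python
def deinterleave(packets, bundling):
    """Rebuild the time-ordered frame sequence of one interleave group.

    `packets` holds L+1 entries ordered by RTP sequence number; each is
    a list of `bundling` frames, or None if the packet was lost.
    """
    ordered = []
    for position in range(bundling):       # 1st frames, 2nd frames, ...
        for frames in packets:             # packet 0 .. packet L
            ordered.append(None if frames is None else frames[position])
    return ordered
```

    With L=2 and B=2, packets carrying frames (0, 3), (1, 4) and (2, 5)
    are restored to the sequence 0, 1, 2, 3, 4, 5.  If the middle
    packet is lost, the gaps fall on non-adjacent frames 1 and 4.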
    
7.7 Receiving Invalid Values
    
    On receipt of an RTP packet with an invalid value of the NNN field,
    the RTP packet MUST be treated as lost by the receiver for the
    purpose of generating erasure frames as described in section 9.
    
    A codec data frame with a reserved value in the TOC field SHOULD
    also be considered invalid.  All codec frames in a packet after an
    invalid TOC field SHOULD be considered invalid.
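    
    These two checks can be sketched as follows.  One plausible reason
    for invalidating every frame after a reserved TOC value is that the
    length of the offending frame is unknown, so later frames cannot be
    located; that rationale is an inference, not a statement of this
    document.  The function names and the highest_defined parameter are
    illustrative.

```python
def packet_is_lost(lll, nnn):
    """True if the packet is to be treated as lost (invalid NNN)."""
    return nnn > lll


def toc_validity(toc, highest_defined=5):
    """One boolean per TOC entry: True if the frame may be used.

    An entry with a reserved value, and every entry after it, is
    reported as invalid.
    """
    valid = []
    seen_reserved = False
    for value in toc:
        if value > highest_defined:
            seen_reserved = True
        valid.append(not seen_reserved)
    return valid
```

    For the TOC sequence 4, 3, 9, 1 only the first two frames are
    usable.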
    
7.8 Optimized Single Frame Format
    
    Optimized single frame format is designed for maximum efficiency in
    transmission of codec data with certain forms of header compression.
    Only one codec data frame is sent in each RTP packet, and there are
    no frame count or TOC field entries, or other payload header fields.
    The codec rate can be determined from the length of the codec frame,
    since there is only one codec data frame in each RTP packet of this
    type.
    
    If two frame types have different rates, but are expressed in the
    same number of codec frame bytes, there MUST be other signaling to
    distinguish them.  For example, the codec sender could encode the
    rate in the frame data.  This is a vocoder design issue and further
    discussion is out of the scope of this document.
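    
    The length-based rate detection, and the ambiguity described above,
    might be sketched as follows.  The function names are hypothetical,
    and the sizes in EXAMPLE_SIZES are illustrative only (22 octets for
    Rate 1 is taken from section 7.10.2; the authoritative mapping is the
    one given in section 7.2).

```python
def build_length_table(frame_sizes):
    """Invert a rate -> size-in-octets mapping, rejecting ambiguous sizes."""
    by_length = {}
    for rate, size in frame_sizes.items():
        if size in by_length:
            # Two rates with the same octet count cannot be told apart
            # by length alone; other signaling would be required.
            raise ValueError(
                f"rates {by_length[size]!r} and {rate!r} share size {size}")
        by_length[size] = rate
    return by_length


def rate_of_single_frame(payload, by_length):
    """Look up the rate of the only codec data frame in the payload."""
    try:
        return by_length[len(payload)]
    except KeyError:
        raise ValueError(f"no rate has a {len(payload)}-octet frame") from None


# Illustrative values only; see the lead-in above.
EXAMPLE_SIZES = {"1": 22, "1/2": 10, "1/8": 2}
```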
    
    The optimized single frame RTP payload data is formatted as follows:
    
    0                   1                   2                   3   
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      RTP Header [RTP]                         |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                    Only one codec data frame                  |
   |                              ....                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    
7.9 Detecting Which Format
    
    All receivers MUST be able to process both types of packets.  The
    sender MAY choose to use one or both types of packets.
    The packets of the two types can be distinguished by checking the
    payload type field in the RTP header.  The association of payload
    type number with the packet type is done out-of-band, for example by
    [SDP] during the setup of a session.
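    
    A minimal sketch of this check is shown below.  It assumes the
    receiver has already built a table from payload type number to ptype
    value during session setup; the function name and the table are
    hypothetical.  The payload type occupies the low seven bits of the
    second octet of the fixed RTP header [RTP].

```python
def packet_format(rtp_packet, ptype_by_payload_type):
    """Return the ptype (1 = normal, 2 = optimized) of an RTP packet.

    `ptype_by_payload_type` is a hypothetical table filled in from the
    out-of-band session description, for example {97: 1, 98: 2}.
    """
    if len(rtp_packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    # Second octet: marker bit (high bit) followed by 7-bit payload type.
    payload_type = rtp_packet[1] & 0x7F
    return ptype_by_payload_type[payload_type]
```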
    
7.10 Codec Data Frame Format
    
    The formats described in this section are applicable to both normal
    and optimized single frame RTP payload formats as described in
    sections 7.1 and 7.8.
    
    Bits are laid out as they come out of the vocoder.  This will be
    referred to as native format.  The native format for [PureVoice] is
    LSB (least significant bit) first (see example in section 7.10.1).
    The native format for [EVRC] and [SMV] is MSB (most significant bit)
    first (see example in section 7.10.2).
    
7.10.1 PureVoice Codec Data Frame Format
    
    The output of the PureVoice codec is converted into data frames for
    inclusion in the RTP payload as follows:
    
    The bits as numbered in the standard [PureVoice] from the highest to
    the lowest are packed into octets.  The highest numbered bit (bit
    265 for Rate 1, bit 123 for Rate 1/2, bit 53 for Rate 1/4 and bit 19
    for Rate 1/8) is placed in the most significant bit (Internet bit 0)
    of the first octet (octet 0) of the codec data frame; the second
    highest numbered bit (bit 264 for Rate 1... bit 18 for Rate 1/8) is
    placed in the second most significant bit (Internet bit 1) of the
    first octet (octet 0) of the codec data frame.  This continues until
    all of the bits have been placed in the codec data frame.  Any
    remaining unused bits of the last octet of the codec data frame MUST
    be set to zero.
    
    For example, the frame below shows in detail how a PureVoice Rate
    1/8 codec frame is packed into a data frame:
    
    The codec data frame for a Rate 1/8 frame is 3 octets long.  Bits 0
    through 19 (20 bits) from the standard Rate 1/8 frame are placed as
    indicated, with bits marked with "Z" being set to zero.  The Rate
    1/4, 1/2 and full rate frames are converted similarly (with padding)
    to align on octet boundaries.
    
              PureVoice Rate 1/8 codec data frame (octet 0 - 2)
    
              0                   1                   2       
              0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 
             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
             |1|1|1|1|1|1|1|1|1|1| | | | | | | | | | | | | | |
             |9|8|7|6|5|4|3|2|1|0|9|8|7|6|5|4|3|2|1|0|Z|Z|Z|Z|
             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    
    Internet bit 0 refers to the left-most bit of the left-most octet.
    Internet bit 1 refers to the next bit (to the right) of the left-most
    octet. [RFC 2658] discusses network byte and internet byte order in
    more detail.
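    
    The packing rule of this section can be sketched as follows.  This
    is an illustration under stated assumptions, not a reference
    implementation: the function name is hypothetical, and the codec
    bits are assumed to be available as a list indexed by the bit
    numbers used in [PureVoice].

```python
def pack_highest_first(bits):
    """Pack codec bits into octets, highest-numbered bit first.

    `bits[i]` is the value (0 or 1) of bit i as numbered in the codec
    standard (an assumed input representation).  The highest-numbered
    bit lands in Internet bit 0 of octet 0; unused trailing bits of the
    last octet are left at zero.
    """
    packed = bytearray((len(bits) + 7) // 8)
    for position, number in enumerate(range(len(bits) - 1, -1, -1)):
        if bits[number]:
            packed[position // 8] |= 0x80 >> (position % 8)
    return bytes(packed)
```

    For a 20-bit Rate 1/8 frame this yields three octets, with the four
    least significant bits of the last octet set to zero.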
    
7.10.2 EVRC or SMV Codec Data Frame Format
    
    The output of the EVRC or SMV codec is converted into data frames
    for inclusion in the RTP payload as follows:
    
    The bits as numbered in the standard ([EVRC] or [SMV]) from the
    lowest to the highest are packed into octets.  The lowest numbered
    bit (bit 1) is placed in the most significant bit (Internet bit 0)
    of the first octet of the codec data frame; the second lowest bit is
    placed in the second most significant bit of the first octet, the
    third lowest in the third most significant bit of the first octet,
    and so on.  This continues until all of the bits have been placed in
    the codec data frame.  Any remaining unused bits of the last octet
    of the codec data frame MUST be set to zero (note that this is only
    applicable to Rate 1 frames as the others fit completely into a
    whole number of octets).
    
    For example, the frame below shows in detail how an EVRC or SMV
    Rate 1 codec frame is packed into a data frame:
    
             EVRC or SMV Rate 1 codec data frame (octet 0 - 3)
                                    
      0                   1                   2                   3   
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
     |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3|
     |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    
    
                  Rate 1 codec data frame (octet 18 - 21)
    
      1           1                   1                   1           
      4           5                   6                   7           
      4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1| | | | | |
     |4|4|4|4|4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|Z|Z|Z|Z|Z|
     |5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1| | | | | |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    
    The codec data frame for a Rate 1 frame is 22 octets long.  Bits 1
    through 171 from the standard Rate 1 frame are placed as indicated
    with bits marked with "Z" being set to zero.  The Rate 1/8, 1/4, and
    1/2 frames are converted similarly but do not require zero padding
    because they align on octet boundaries.
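    
    On the receiving side, recovering the codec bits from a codec data
    frame might look like the sketch below.  The function name is
    hypothetical, and rejecting non-zero padding is a choice made for
    this example; this document only requires that senders set the
    padding bits to zero.

```python
def unpack_lowest_first(frame, bit_count):
    """Recover codec bits 1..bit_count from a codec data frame.

    Returns a list whose element k holds the bit numbered k+1 in the
    codec standard.  For a Rate 1 frame, bit_count is 171 and the frame
    is 22 octets long.
    """
    if len(frame) != (bit_count + 7) // 8:
        raise ValueError("frame length does not match bit count")
    # Internet bit 0 is the most significant bit of octet 0.
    bits = [(frame[i // 8] >> (7 - i % 8)) & 1 for i in range(len(frame) * 8)]
    if any(bits[bit_count:]):
        # Assumption of this sketch: non-zero padding is rejected.
        raise ValueError("padding bits are not zero")
    return bits[:bit_count]
```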
    
    
7.11 Adding New Codecs
    
    Codecs that share the characteristics in section 6 can be added to
    this common format by following the steps below:
    
        1.  Register a new MIME type.
        2.  In the MIME type registration, specify that this common
        format is used when the codec is transported in RTP.
        3.  Provide a mapping of TOC value to rate and frame size of the
        codec payload (as shown in section 7.2).
    
8 Tardy Packets
    
    Assume that the receiver has begun playing frames from an interleave
    group.  The time has come to play frame x from packet n of the
    interleave group.  Further assume that packet n of the interleave
    group has not been received.
    
    Now, assume that packet n of the interleave group arrives before
    frame x+(L+1) of that packet is needed.  Receivers SHOULD use frame
    x+(L+1) of the newly received packet n rather than substituting an
    erasure frame.  In other words, just because packet n wasn't
    available the first time it was needed to reconstruct the
    interleaved speech, the receiver SHOULD NOT assume that the packet
    is not available when the same packet is subsequently needed for
    interleaved speech reconstruction.
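    
    One way to follow this recommendation is to consult the set of
    received packets afresh at every playout slot, rather than
    remembering that a packet was once missing.  The sketch below
    assumes a hypothetical dictionary of received packets keyed by their
    index within the interleave group; ERASURE is a stand-in for the
    codec's erasure frame.

```python
ERASURE = object()  # hypothetical stand-in for the codec's erasure frame


def frame_to_play(received, packet_index, frame_index):
    """Pick the frame for one playout slot from what has arrived so far.

    `received` maps a packet's index within the interleave group to its
    list of frames.  The lookup is repeated for every slot, so a packet
    that arrives late is used for its remaining frames.
    """
    packet = received.get(packet_index)
    if packet is None:
        return ERASURE
    return packet[frame_index]
```

    If packet n is absent when its first frame is due, that slot is
    played as an erasure; once packet n has been stored in the
    dictionary, later slots drawing on packet n return its real frames.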
    
9 Lost Packets
    
    Codecs transported using this format support the notion of erasure
    frames.  These are frames that for whatever reason are not
    available.  When reconstructing interleaved speech or playing back
    non-interleaved speech, erasure frames MUST be fed to the codec for
    all missing packets.
    
    Receivers MAY use the timestamp clock to determine how many codec
    data frames are missing.  For vocoders with 20 ms frames and 8 kHz
    sampling rate (such as the vocoders defined in section 7.10), each
    codec data frame advances the timestamp clock exactly 160 (20 ms x 8
    kHz) counts.
    
    Since the bundling/interleaving value can vary, the timestamp clock
    is the only reliable way to calculate exactly how many codec data
    frames are missing when a packet is dropped.
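    
    As a worked illustration of the timestamp arithmetic, the sketch
    below counts the frames missing between two consecutively received
    packets.  It assumes non-interleaved packets received in order, that
    each RTP timestamp refers to the first frame in its packet, and the
    160 counts per frame stated above; the function name is
    hypothetical.

```python
def missing_frames(prev_timestamp, prev_frame_count, next_timestamp,
                   ticks_per_frame=160):
    """Count codec data frames lost between two received packets.

    Assumes in-order, non-interleaved packets (an assumption of this
    sketch).  RTP timestamps are 32 bits wide, so the difference is
    taken modulo 2**32 to survive wrap-around.
    """
    gap = (next_timestamp - prev_timestamp) % (1 << 32)
    frames_elapsed, remainder = divmod(gap, ticks_per_frame)
    if remainder:
        raise ValueError("timestamp gap is not a whole number of frames")
    return frames_elapsed - prev_frame_count
```

    For example, if a packet carrying two frames is followed by a packet
    whose timestamp is 800 counts later, five frame times have elapsed
    and three frames are missing.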
    
    Specifically when reconstructing interleaved speech, a missing RTP
    packet in the interleave group SHOULD be treated as containing B
    erasure codec data frames where B is the bundling value for that
    interleave group.
    
10 Implementation Issues
    
10.1 Interleaving Length
    
    All wireless codecs interpolate the missing speech content when
    given an erasure frame.  However, consecutive erasure frames reduce
    the listener's perception of voice quality.  This makes interleaving
    desirable over bundling as it increases speech quality in the
    presence of lost packets.
    
    On the other hand, interleaving can greatly increase the end-to-end
    delay.  Where an interactive session is desired, an interleave value
    (field LLL) of 0 to 2 is RECOMMENDED.
    
    When end-to-end delay is not a concern, an interleaving value (field
    LLL) of 4 or 5 is RECOMMENDED, subject to the maxinterleave
    parameter.  See the description of this parameter in section 13.2.
    
    The parameters maxptime and maxinterleave, signaled at the initial
    setup of the session, guarantee that the receiver can allocate a
    well-known amount of buffer space at the beginning of the session
    that will be sufficient for all future reception in that session.
    Less buffer space could be needed at some point in the future if the
    sender decreases the bundling value or interleaving value, but never
    more buffer space.  This prevents the receiver from needing to
    allocate more buffer space (with the possible result that none is
    available).
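    
    A back-of-the-envelope bound on that buffer space might be computed
    as in the sketch below.  It assumes 20 ms frames, an interleave
    group of maxinterleave + 1 packets, and the default parameter values
    from section 13.2; the function name is hypothetical, and any extra
    space a receiver needs for jitter buffering is not counted.

```python
def max_buffered_frames(maxptime_ms=200, maxinterleave=5, frame_ms=20):
    """Upper bound on codec data frames in one interleave group.

    Defaults follow section 13.2 (maxptime 200 ms, maxinterleave 5);
    frame_ms=20 assumes a 20 ms vocoder.
    """
    frames_per_packet = maxptime_ms // frame_ms
    packets_per_group = maxinterleave + 1
    return frames_per_packet * packets_per_group
```

    With the default values this gives 10 frames per packet and 6
    packets per group, or 60 frames.  Multiplying by the largest codec
    data frame size (22 octets for an EVRC or SMV Rate 1 frame) gives a
    bound in octets.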
    
11 Security Considerations
    
    RTP packets using the payload format defined in this specification
    are subject to the security considerations discussed in the RTP
    specification [RTP], and any appropriate profile (for example,
    [PROFILE]).
    
    This implies that confidentiality of the media streams can be
    achieved by encryption.  Because the data compression used with this
    payload format is applied end-to-end, encryption can be performed
    after compression so there is no conflict between the two
    operations.
    
    A potential denial-of-service threat exists for data encodings using
    compression techniques that have non-uniform receiver-end
    computational load.  The attacker can inject pathological datagrams
    into the stream which are complex to decode and cause the receiver
    to be overloaded.  However, this encoding does not exhibit any
    significant non-uniformity.
    
    As with any IP-based protocol, in some circumstances, a receiver can
    be overloaded simply by the receipt of too many packets, either
    desired or undesired.  Network-layer authentication can be used to
    discard packets from undesired sources, but the processing cost of
    the authentication itself might be too high.  In a multicast
    environment, pruning of specific sources might be implemented in
    future versions of IGMP [IGMP] and in multicast routing protocols to
    allow a receiver to select which sources are allowed to reach it.
    
12 Real Time and Storage Mode
    
12.1 RTP Mode
    
    RTP mode is used to transmit codec frames in a real-time,
    interactive fashion (as opposed to playing a static stored file, as
    described in section 12.2).  RTP mode uses RTP headers with SDP
    negotiation (section 14) to describe the MIME media type and the RTP
    ptype format.
    
    Speech frames lost in transmission and non-received frames MUST be
    played out as erasure frames (see definition in Section 9) to keep
    synchronization with the original media.
    
12.2 Storage Mode
    
    Storage mode is used for storing speech frames, for example, as a
    file, email attachment, or web link.
    
    When stored as a file, the first few octets of the file are a "magic
    number" that identifies the file.  See sections 13.1.1, 13.1.2 and
    13.1.3 for EVRC, SMV and PVC respectively for more details.
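    
    Recognizing a stored file by its magic number could be sketched as
    follows.  The magic strings are those registered in sections 13.1.1
    through 13.1.3; the function name and the returned tuple are
    assumptions of this example.

```python
# Magic numbers from sections 13.1.1, 13.1.2 and 13.1.3.
MAGIC_NUMBERS = {
    b"#!EVRC\n": "audio/EVRC",
    b"#!SMV\n": "audio/SMV",
    b"#!PVC\n": "audio/qcelp-common",
}


def identify_stored_file(data):
    """Return (media type, remaining octets) for a stored speech file.

    The remaining octets are the stored groups that follow the magic
    number.  The function name is hypothetical.
    """
    for magic, media_type in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return media_type, data[len(magic):]
    raise ValueError("unrecognized magic number")
```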
    
    All files are stored in normal mode groups (section 7.1).  It is
    optional for the application to translate between normal mode format
    and optimized mode format.  The codec data frames are stored in
    groups, preceded by group header information identical to payload
    header information as specified in section 7.  That is, the R, LLL,
    NNN, TOC entries, etc. are present.  Since there is no RTP header,
    and hence no timestamp, packets must be in order.
    
    Following the magic number octets, the file is formatted as follows:
    
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |R|R|0|0|0|0|0|0|R|R|Frame Count|  TOC  |  ...  |  TOC  |padding|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       one or more codec data frames, one per TOC entry        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    
    The meaning of the fields is specified in section 7.1.  The LLL and
    NNN fields MUST both be zero.  The format of the frames, including
    any padding, is identical to the normal mode specified in 7.1.
    
    This format, while more complex than other designs, makes it easy
    for an implementation to receive speech frames using RTP and store
    them, more or less as-is, in a file.  Conversely, it is simple for
    an implementation to read frames out of a file and transmit them
    using RTP.
    
    Speech frames lost in transmission and non-received frames MUST be
    stored as erasure frames (see definition in Section 9) to keep
    synchronization with the original media.
    
13 IANA Considerations
    
    This document registers three new MIME media types.  The
    registration forms appear below.
    
    The MIME media type names for the supported codecs are allocated
    from the IETF tree since the PureVoice and EVRC codecs are already
    widely deployed, and SMV is expected to be a widely used codec for
    voice-over-IP applications.
    
    The RTP format is described previously (see sections 7.1 and 7.8).
    
13.1 Registration of MIME Media Type
    
13.1.1 audio/EVRC Media Type Registration
    
    Media Type Name: audio
    
    Media Subtype Name:  EVRC
    
    Required Parameters: none
    
    Optional Parameters:
    
        ptype:  See Section 13.2.
        
        maxptime:  See Section 13.2.
        
        maxinterleave:  See Section 13.2.
        
    Optional parameters for storage mode: none
    
    Encoding considerations for RTP mode: see Section 13.2.
    
    Encoding considerations for storage mode: see Section 13.2.
    
    Security considerations: see Section 11.
    
    Public specification:  This document.
    
    Additional information for storage mode (see also section 12.2):
        Magic number (network byte order):
            ASCII character string "#!EVRC\n", that is, 0x2321455652430a
            in hexadecimal.
        File extensions:  EVC, evc
        Macintosh file type code: not specified
        Object identifier or OID: none
    
    Intended usage:  COMMON.
        It is expected that many VoIP applications (as well as mobile
        applications) will use this type.
    
    Person & email address to contact for further information:
        The authors of this document.
    
    Author/Change controller:
        The IESG.
    
13.1.2 audio/SMV Media Type Registration
    
    Media Type Name: audio
    
    Media Subtype Name:  SMV
    
    Required Parameters: none
    
    Optional Parameters:
        
        ptype:  See Section 13.2.
        
        maxptime:  See Section 13.2.
        
        maxinterleave:  See Section 13.2.
        
    Optional parameters for storage mode: none
    
    Encoding considerations for RTP mode: see Section 13.2.
    
    Encoding considerations for storage mode: see Section 13.2.
    
    Security considerations: see Section 11.
    
    Public specification:  This document.
    
    Additional information for storage mode (see also section 12.2):
        Magic number (network byte order):
            ASCII character string "#!SMV\n", that is, 0x2321534d560a in
            hexadecimal.
        File extensions: smv, SMV
        Macintosh file type code: not specified
        Object identifier or OID: none
    
    Intended usage:  COMMON.  It is expected that many VoIP applications
    (as well as mobile applications) will use this type.
    
    Person & email address to contact for further information:
        The authors of this document.
    
    Author/Change controller:
        The IESG.
    
13.1.3 audio/qcelp-common Media Type Registration
    
    Media Type Name: audio
    
    Media Subtype Name: qcelp-common
    
    Required Parameters: none
    
    Optional Parameters:
    
        ptype:  See Section 13.2.
        
        maxptime:  See Section 13.2.
        
        maxinterleave:  See Section 13.2.
        
    Optional parameters for storage mode: none
    
    Encoding considerations for RTP mode: see Section 13.2.
    
    Encoding considerations for storage mode: see Section 13.2.
    
    Security considerations: see Section 11.
    
    Public specification:  This document.
    
    Additional information for storage mode (see also section 12.2):
        Magic number (network byte order):
            ASCII character string "#!PVC\n", that is, 0x23215056430a in
            hexadecimal.
        File extensions: pvc, PVC
        Macintosh file type code: not specified
        Object identifier or OID: none
    
    Intended usage:  COMMON.  It is expected that many VoIP applications
    (as well as mobile applications) will use this type.
    
    Person & email address to contact for further information:
        The authors of this document.
    
    Author/Change controller:
        The IESG.
    
13.2 Optional Media Type Parameters
    
    These parameters are applicable to all three media subtypes
    described above.
    
    Optional parameters for RTP mode:
    
        ptype:
            Ptype indicates the type of RTP/media subtype packet.  The
            default value is 1.  Valid values are 1 or 2.  Ptype value 1
            indicates normal format (see section 7.1), while ptype value
            2 indicates optimized header compressed codec format (see
            section 7.8).
            
        maxptime:
            The maximum amount of media which can be encapsulated in
            each packet, expressed as time in milliseconds.  The time
            SHALL be calculated as the sum of the time the media present
            in the packet represents.  The time SHOULD be a multiple of
            the frame size.  If not signaled, the default maxptime value
            is ten times the native codec frame length (in
            milliseconds); for vocoders with 20 ms frames, this is
            200 ms.
            
        maxinterleave:
            Maximum number for interleaving value.  The interleaving
            values used in the entire session MUST NOT exceed this
            maximum value.  If not signaled, the default maxinterleave
            value is 5.
            
    Optional parameters for storage mode: none
    
    Encoding considerations for RTP mode: see Section 7, in particular
    Sections 7.3 and 7.4, of this document.
    
    Encoding considerations for storage mode:
        Storage mode is identical to RTP mode.  A stored file is
        essentially a sequence of RTP packets without the RTP, UDP,
        etc. headers.
        
        Normal (type 1) encoded speech frames MUST be stored in RTP
        sequence number order.  Furthermore, missing frames and
        non-received frames during non-speech period MUST be
        encapsulated into a compound codec payload as blank frames or
        erasures.  Each receiving entity that accepts this MIME type
        MUST be able to decode all codec coding modes.
        
        For normal codec frames, bundling and interleaving information
        is included in each grouping.
    
    Security considerations: see Section 11.
    
    Public specification:  This document.
    
    Intended usage:  COMMON.  It is expected that many VoIP applications
    (as well as mobile applications) will use this type.
    
    Person & email address to contact for further information:
        The authors of this document.
    
    Author/Change controller:
        The IESG.
    
14 Mapping to SDP Parameters
    
    Please note that this section applies to packets transmitted using
    RTP.
    
    Parameters are mapped to [SDP] as usual.
    
    Example usage in SDP, for PureVoice vocoder run in normal format:
        m=audio 49120 RTP/AVP 97
        a=rtpmap:97 qcelp-common/8000
        a=fmtp:97 ptype=1; maxptime=80 ms
    
    Example usage in SDP, for SMV vocoder run in optimized single frame
    format:
        m=audio 49120 RTP/AVP 98
        a=rtpmap:98 SMV/8000
        a=fmtp:98 ptype=2; maxptime=20 ms
    
    Since all optimized single frames (ptype = 2) for the currently
    supported vocoders are 20 ms long, maxptime MUST be 20 ms.  If a new
    vocoder is added with a different frame duration, maxptime for that
    vocoder MUST equal the vocoder's frame time.
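    
    A receiver-side reading of the fmtp attribute might be sketched as
    below.  This is not a complete SDP parser: the function name is
    hypothetical, the default values are those of section 13.2, and the
    trailing "ms" unit is tolerated only because it appears in the
    examples above.  The maxptime check for ptype 2 assumes a vocoder
    with 20 ms frames.

```python
# Defaults from section 13.2.
DEFAULTS = {"ptype": 1, "maxptime": 200, "maxinterleave": 5}


def parse_fmtp(line):
    """Parse 'a=fmtp:<pt> name=value; ...' into (payload_type, params)."""
    prefix, _, rest = line.strip().partition(":")
    if prefix.replace(" ", "") != "a=fmtp":
        raise ValueError("not an fmtp attribute")
    payload_type, _, param_text = rest.partition(" ")
    signaled = {}
    for item in param_text.split(";"):
        if not item.strip():
            continue
        name, _, value = item.partition("=")
        # Keep only the number; a trailing unit such as "ms" is ignored.
        signaled[name.strip()] = int(value.strip().split()[0])
    params = {**DEFAULTS, **signaled}
    if params["ptype"] not in (1, 2):
        raise ValueError("ptype must be 1 or 2")
    # Assumes a 20 ms vocoder: optimized single frame packets carry
    # exactly one frame, so a signaled maxptime must be 20.
    if params["ptype"] == 2 and signaled.get("maxptime", 20) != 20:
        raise ValueError("ptype 2 requires maxptime of 20 ms")
    return int(payload_type), params
```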
    
15 Acknowledgements
    
    This document heavily borrows from "RTP Payload Format for
    PureVoice(tm) Audio" by Kyle McKay (RFC 2658, August 1999).
    Material has also been used from "An RTP Payload Format for EVRC
    Speech", Adam Li (editor), a work in progress.  The authors and
    others who contributed to these two documents made this document
    possible.
    
    The authors thank the following colleagues for contributing to this
    document:  Rusty Sanders, Trevor Bourget, Eric Rosen, Harleen Gill,
    Kirti Gupta.
    
16 References
    
    [PureVoice] TIA/EIA/IS-733, "High Rate Speech Service Option for
    Wideband Spread Spectrum Communication Systems", January 1997.  May
    be ordered online at http://www.eia.tia.org/eng.
    
    [EVRC] TIA/EIA/IS-127, "Enhanced Variable Rate Codec, Speech Service
    Option 3 for Wideband Spread Spectrum Digital Systems", January
    1997.
    
    [SMV] TIA/EIA/IS-893, "Selectable Mode Vocoder", August 2001
    published as PNSP-4575.
    
    [RTP] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
    "RTP:  A Transport Protocol for Real-Time Applications", RFC 1889,
    January 1996.
    
    [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
    Requirement Levels", BCP 14, RFC 2119, March 1997.
    
    [PROFILE] Schulzrinne, H., "RTP Profile for Audio and Video
    Conferences with Minimal Control", RFC 1890, January 1996.
    
    [RFC 2658] McKay, K., "RTP Payload Format for PureVoice(tm) Audio",
    RFC 2658, August 1999.
    
    [SDP] M. Handley and V. Jacobson, "SDP:  Session Description
    Protocol", RFC 2327, April 1998.
    
    [IGMP] Deering, S., "Host Extensions for IP Multicasting", STD 5,
    RFC 1112, August 1989.
    
17 Authors' Addresses
    
    Magdalena L. Espelien
    QUALCOMM Incorporated
    5775 Morehouse Drive
    San Diego, CA 92121-1714
    USA
    
    Phone: +1 858 651-6733
    Email: magda@qualcomm.com
    
    Randall Gellens
    QUALCOMM Incorporated
    5775 Morehouse Drive
    San Diego, CA 92121-1714
    USA
    
    Phone: +1 858 651-5115
    Email: rg+ietf@qualcomm.com