Internet Engineering Task Force                      Mitsuyuki Hatanaka
Internet Draft                                         Sony Corporation
Document: draft-hatanaka-avt-rtp-atracx-03.txt            October  2003
                                                 Expires: March 27 2004

                      RTP payload format for ATRAC-X 

Status of this Memo

This document is an Internet-Draft and is in full conformance with all 
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.  The list of Internet-Draft
Shadow Directories can be accessed at http://www.ietf.org/shadow.html.


Abstract 
This document describes an RTP payload format for efficient and
flexible transport of ATRAC-X encoded audio data.  ATRAC-X is a high
quality audio coding technology that supports multiple channels.  The
RTP payload format as presented in this document includes support for
metadata, data fragmentation, and continuous decoding even during
packet losses.


1. Introduction
ATRAC-X is a state-of-the-art perceptual audio coding technology, and
is the successor of ATRAC and ATRAC3.  ATRAC technology has been used
in MD, NetMD, and Memory Stick Audio products.  Improvements over
previous versions of ATRAC include:
 - Higher sound quality at lower bit-rates
 - Wide range of bit-rates, from 8kbps to 1.4Mbps
 - Support for multichannel coding
 - A flexible format for future extensions
 - Suitability for streaming, including scalability and fixed frame
   lengths



Hatanaka                                                        [Page 1]


INTERNET-DRAFT    draft-hatanaka-avt-rtp-atracx-03.txt  October  2003




The modularity and portability of ATRAC-X mean it can be widely used
in many applications and platforms.

1.1 Overview of ATRAC-X
ATRAC-X can deliver multiple channels of audio, from monaural to 7.1
channels, and from bit rates of 8kbps to 1.4Mbps.  Sampling rates of
32kHz, 44.1kHz and 48kHz are currently supported, with higher rates of
up to 96kHz on the horizon.  Since ATRAC-X has adopted a flexible
format, future extensions can include better-than-CD quality and
increases in bandwidth.  Similar to other perceptual audio coding
algorithms, ATRAC-X is based on time/frequency mappings.  However, new
techniques have been incorporated which enable more precise signal
scaling for QoS.

1.2 Overview of ATRAC-X streaming on RTP
The basic building block for ATRAC-X streaming on RTP is the ATRAC-X
"segment".  Each such segment contains the current ATRAC-X encoded
audio data and metadata, as well as any necessary redundant data.
ATRAC-X segments also incorporate a fragmentation mechanism so that
packets do not exceed the MTU.

Multiple ATRAC-X streams can be transmitted over a single RTP session
by sending multiple segments within each ATRAC-X "slot" -- our
nomenclature for an arbitrary frame of time in which the received audio
data resides.  Figure 1 is a visualization of this concept.

+------0--------1--------2--------3----> ATRAC-X Segment
|   +-----+  +-----+  +-----+  +-----+
0   |  N  |  |  N  |  |  N  |  |  N  | ..
|   +-----+  +-----+  +-----+  +-----+
|   +-----+  +-----+  +-----+  +-----+
1   | N+1 |  | N+1 |  | N+1 |  | N+1 | .. 
|   +-----+  +-----+  +-----+  +-----+
|   +-----+  +-----+  +-----+  +-----+     +-----+
2   | N+2 |  | N+2 |  | N+2 |  | N+2 | ..  |  n  | = ATRAC-X Segment 
|   +-----+  +-----+  +-----+  +-----+     +-----+   with sequence n
|      :        :        :        :
V 
time ("slot")
   Figure 1: ATRAC-X RTP Multiplexed Packetization Streaming Concept

More specific examples of this generalized image can be seen in Figures
4 and 5.  This scheme allows for various content distribution methods,
including a substantial number of audio channels.



2. Payload Format

2.1 ATRAC-X Full Payload Visualization

The complete structure of an ATRAC-X RTP Payload Format is shown below.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |V=2|P|X|  CC   |M|     PT      |       sequence number         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                           timestamp                           |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |           synchronization source (SSRC) identifier            |
 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
 |            contributing source (CSRC) identifiers             |
 |                             ....                              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |Version| FRSEQNO     |ElementID|C|FragNo |NmSeg| TCC |   BCP   |
 |Priority |NF(=2) |RNF(=2)|RNMD(=1) |  Time Stamp Offset        |
 |NMD(=1)  | RSV |  MDID                         |     MDLEN     |
 |   |RSV        |                                               |
 |                        META-DATA(1)                           | 
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                     ATRAC-X Main Frame Data(1)                |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                     ATRAC-X Main Frame Data(2)                |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |Rd MDID                        |Rd MDID_LEN        |RSV        |
 |                       Redundant META-DATA                     | 
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                     ATRAC-X Redundant Frame Data(1)           |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                     ATRAC-X Redundant Frame Data(2)           |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                Figure 2: ATRAC-X RTP Payload Format



2.2 ATRAC-X Specific Data

The section specific to ATRAC-X is shown below.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |Version| FRSEQNO     |ElementID|C|FragNo |NmSeg|TCC  |  BCP    |
 |Priority |NF(=N) |RNF(=0)|RNMD(=0) |  Time Stamp Offset        |
 |NMD(=0)  | RSV |LENGTH                     |RSV|               |   
 |                                                               |
 |                 ATRAC-X Main Frame Data(1)                    |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                 ATRAC-X Main Frame Data(2)                    |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                 ............                                  |
 |                                                               |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                 ATRAC-X Main Frame Data(N)                    |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
             Figure 3: ATRAC-X Main Data format
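
As an informal illustration (not part of the payload format
definition), the first 64 bits of the ATRAC-X specific data could be
unpacked as sketched below.  The field widths are those given in the
field descriptions of this document.  The names AtracxHeader and
parse_header are hypothetical, and the sketch assumes that fields are
packed MSB first in network byte order, exactly as drawn in the
figures.

```python
import struct
from typing import NamedTuple


class AtracxHeader(NamedTuple):
    version: int     # 4 bits
    frseqno: int     # 7 bits
    element_id: int  # 5 bits
    c: int           # 1 bit
    frag_no: int     # 4 bits
    nm_seg: int      # 3 bits
    tcc: int         # 3 bits
    bcp: int         # 5 bits
    priority: int    # 5 bits
    nf: int          # 4 bits
    rnf: int         # 4 bits
    rnmd: int        # 5 bits
    ts_offset: int   # 14 bits


def parse_header(payload: bytes) -> AtracxHeader:
    """Unpack the first two 32-bit words that follow the RTP header."""
    if len(payload) < 8:
        raise ValueError("need at least 8 bytes of ATRAC-X specific data")
    # Assumption: both words are big-endian (network byte order).
    w0, w1 = struct.unpack("!II", payload[:8])
    return AtracxHeader(
        version=(w0 >> 28) & 0xF,
        frseqno=(w0 >> 21) & 0x7F,
        element_id=(w0 >> 16) & 0x1F,
        c=(w0 >> 15) & 0x1,
        frag_no=(w0 >> 11) & 0xF,
        nm_seg=(w0 >> 8) & 0x7,
        tcc=(w0 >> 5) & 0x7,
        bcp=w0 & 0x1F,
        priority=(w1 >> 27) & 0x1F,
        nf=(w1 >> 23) & 0xF,
        rnf=(w1 >> 19) & 0xF,
        rnmd=(w1 >> 14) & 0x1F,
        ts_offset=w1 & 0x3FFF,
    )
```

The two words sum to 32 bits each (4+7+5+1+4+3+3+5 and 5+4+4+5+14), so
no field straddles a word boundary in this layout.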

2.3. Payload Header Description

- Version: Version number (4bit) 
If the receiver supports the version in the payload header, the
transmitted packets will be parsed and reconstructed; otherwise the
packets may be discarded by the receiver.  Receivers may support more
than one version of this protocol if desired.

- FRSEQNO: Frame Sequence Number (7bit)
FRSEQNO denotes the frame sequence number from 0 to 127, and wraps
around accordingly.
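
Because FRSEQNO wraps from 127 back to 0, a receiver cannot compare
two values with ordinary integer comparison.  The sketch below shows
one possible approach using modulo-128 arithmetic.  The half-window
rule used to decide which value is newer is a common convention for
wrapping counters and is an assumption of this example; it is not
specified by this document.  The function names are hypothetical.

```python
FRSEQNO_MOD = 128  # FRSEQNO is a 7-bit field with values 0..127


def next_frseqno(n: int) -> int:
    """Return the frame sequence number that follows n."""
    return (n + 1) % FRSEQNO_MOD


def frseqno_delta(newer: int, older: int) -> int:
    """Signed distance from older to newer, in the range -64..63.

    Assumption: two numbers are never more than half the sequence
    space apart, so the shorter way around the circle is the true one.
    """
    d = (newer - older) % FRSEQNO_MOD
    return d - FRSEQNO_MOD if d >= FRSEQNO_MOD // 2 else d
```

With this rule, frseqno_delta(1, 127) is 2, so slot 1 is treated as
two slots after slot 127 rather than 126 slots before it.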



- ElementID: ATRAC-X bit stream Element ID (5bit)
ElementID identifies each individual ATRAC-X bit stream. One ATRAC-X RTP
session can handle up to 32 ATRAC-X streams simultaneously.  ElementIDs
allow greater content distribution control, such as flexibility in the
number of channels and QoS management.

- NmSeg: Number of Segments in one ATRAC-X slot (3bit)
This identifier indicates the number of ATRAC-X segments in one ATRAC-X
slot. NmSeg must be identical for all segments within one ATRAC-X slot.

The maximum value of NmSeg is determined at the application level or
negotiated between receiver and sender prior to content transmission
using the Session Description Protocol (SDP).

- Priority : Priority identifier (5bit) 
This identifier denotes the priority between individual segments
(within the same slot) of the same ElementID.  Lower values denote
higher priority.  Priority values are not absolute but relative to each
other within one ATRAC-X slot.  The value of each priority does not
have to be unique, and it is thus up to the receiver to decide how to
process the segment priorities.
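
Since the handling of priorities is left to the receiver, the
following is only one possible policy, given as a sketch.  It orders
the segments of one slot by ascending Priority value (lower value,
higher priority) and keeps the first few.  The (priority, label) tuple
shape, the function name and the "budget" parameter are hypothetical
and exist only for this example.

```python
from typing import List, Tuple

Segment = Tuple[int, str]  # (Priority value, label); hypothetical shape


def keep_most_important(segments: List[Segment], budget: int) -> List[Segment]:
    """Keep at most `budget` segments, lowest Priority value first.

    sorted() is stable, so segments with equal priority keep their
    arrival order.
    """
    ordered = sorted(segments, key=lambda seg: seg[0])
    return ordered[:budget]
```

For example, given segments with priorities 1, 0, 1, 0 and a budget of
two, the two segments with priority 0 are kept.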

   ____________    ____________      ____________    ____________
  | ATRAC-X    |  | ATRAC-X    |    |  ATRAC-X   |  |  ATRAC-X   |
  |7.1 (12.2)ch|  |5.1 (12.2)ch|    |7.1 (12.2)ch|  |5.1 (12.2)ch|
  |   384kbps  |  |  256kbps   |    |   384kbps  |  |   256kbps  |
  |FRSEQNO:N   |  |FRSEQNO:N   |    |FRSEQNO:N+1 |  |FRSEQNO:N+1 |
  |ElementID:0 |  |ElementID:1 |    |ElementID:0 |  |ElementID:1 |
  |<---------->|  |<---------->|    |<---------->|  |<---------->|
     ATRAC-X         ATRAC-X            ATRAC-X        ATRAC-X
    Segment(1)      Segment(2)         Segment(1)    Segment(2)
  |<---------------------------->|<----------------------------->|
         ATRAC-X Slot  -Nth-           ATRAC-X Slot -N+1th-

     Figure 4: Transmission of more than 7.1ch (12.2ch) ATRAC-X 
        bit streams using two individual streams in one ATRAC-X 
        RTP Payload

Figure 4 shows an example packetization for a 12.2 channel ATRAC-X bit
stream using two individual streams.  We define the "n-th ATRAC-X slot"
as the set of ATRAC-X segments that have the identical frame sequence
number n.  In this case, each ATRAC-X slot is composed of two ATRAC-X
segments.  One of the ATRAC-X segments contains a 384kbps bit stream
for the first 7.1 channels, and the other contains a 256kbps bit stream
for the remaining 5.1 channels.
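
The slot definition above can be sketched in code as follows.
Received segments are grouped by FRSEQNO, and a slot is considered
complete once it holds NmSeg segments that all agree on NmSeg.  The
tuple shape and function names are hypothetical, and the sketch
ignores fragmentation and FRSEQNO wraparound for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (FRSEQNO, ElementID, NmSeg); hypothetical shape for this example
Segment = Tuple[int, int, int]


def group_into_slots(segments: Iterable[Segment]) -> Dict[int, List[Segment]]:
    """Collect segments that share a frame sequence number into one slot."""
    slots: Dict[int, List[Segment]] = defaultdict(list)
    for seg in segments:
        slots[seg[0]].append(seg)
    return dict(slots)


def slot_is_complete(slot: List[Segment]) -> bool:
    """True if all segments agree on NmSeg and that many have arrived."""
    if not slot:
        return False
    nm_seg = slot[0][2]
    return all(seg[2] == nm_seg for seg in slot) and len(slot) == nm_seg
```

Applied to the situation in Figure 4, the two segments with FRSEQNO N
(ElementID 0 and 1) form one complete slot when NmSeg is 2.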



- NF : Number of ATRAC-X audio Frames (4bit) 
NF denotes the number of ATRAC-X audio frames in one ATRAC-X segment,
with a maximum of 15.  When transmitting metadata only, NF must be set
to 0.

- TCC: Total Channel Configuration (3bit)
TCC denotes the ATRAC-X Channel Configuration information as defined in
Table 1.

A single ATRAC-X stream supports multichannel coding of up to 8
channels through a combination of stereo and monaural channel blocks.
By splitting up the channel information into segments, receivers can
select necessary packets for partial decoding.  Another benefit is the
ability to conceal dropped channel data by using another channel
block's data for decoding.



 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |TCC Index| Number of    |  Audio ChannelBlock  | Default block for   |
 |         |   Speakers   |      Groupings       | speaker mapping     |
 |         |              |                      |                     |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   1     |      1       | mono_channel_block   |     front: center   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   2     |      2       | stereo_channel_block | front: left, right  |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   3     |      3       | stereo_channel_block | front: left, right  |
 |         |              | mono_channel_block   | front: center       |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   4     |      4       | stereo_channel_block | front: left, right  |
 |         |              | mono_channel_block   | front: center       |
 |         |              | mono_channel_block   | rear: surround      |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   5     |     5+1      | stereo_channel_block | front: left, right  |
 |         |              | mono_channel_block   | front: center       |
 |         |              | stereo_channel_block | rear: left, right   |
 |         |              | mono_channel_block   |low frequency effects|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   6     |     6+1      | stereo_channel_block | front: left, right  |
 |         |              | mono_channel_block   | front: center       |
 |         |              | stereo_channel_block | rear: left, right   |
 |         |              | mono_channel_block   | rear: center        |
 |         |              | mono_channel_block   |low frequency effects|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   7     |     7+1      | stereo_channel_block | front: left, right  |
 |         |              | mono_channel_block   | front: center       |
 |         |              | stereo_channel_block | rear: left, right   |
 |         |              | stereo_channel_block | side: left, right   |
 |         |              | mono_channel_block   |low frequency effects|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          Table 1: Total Channel Configuration Index values
 
- BCP: Block Coupling Pattern (5bit)
BCP indicates which ATRAC-X channel blocks within a group are contained
in an ATRAC-X segment.  Given the TCC Index value, the Nth bit from the
left indicates the Nth channel block, counting down the list of channel
blocks as defined in column 3 of Table 1.  If the TCC value is 1 or 2,
BCP must be set to "00000".  The combination of ATRAC-X channel blocks
must be chosen from the ones listed in the third column of Table 1.
The examples in Figures 5 through 7 should help clarify this
terminology.
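
The BCP rule can be illustrated with a small lookup, sketched below.
The table contents are copied from Table 1 and the bit numbering
follows the text above (leftmost bit is the first channel block).  The
names CHANNEL_BLOCKS and blocks_in_segment are hypothetical.  Treating
a segment with TCC 1 or 2 as carrying its single channel block is an
interpretation made for this example.

```python
from typing import Dict, List

FRONT_LR = "front: left, right"
FRONT_C = "front: center"
REAR_LR = "rear: left, right"
LFE = "low frequency effects"

# Channel blocks per TCC index, in the order listed in Table 1.
CHANNEL_BLOCKS: Dict[int, List[str]] = {
    1: [FRONT_C],
    2: [FRONT_LR],
    3: [FRONT_LR, FRONT_C],
    4: [FRONT_LR, FRONT_C, "rear: surround"],
    5: [FRONT_LR, FRONT_C, REAR_LR, LFE],
    6: [FRONT_LR, FRONT_C, REAR_LR, "rear: center", LFE],
    7: [FRONT_LR, FRONT_C, REAR_LR, "side: left, right", LFE],
}


def blocks_in_segment(tcc: int, bcp: int) -> List[str]:
    """Return the channel blocks selected by a 5-bit BCP value."""
    blocks = CHANNEL_BLOCKS[tcc]
    if tcc in (1, 2):
        if bcp != 0:
            raise ValueError("BCP must be 00000 when TCC is 1 or 2")
        # Assumption: the segment carries the only channel block.
        return list(blocks)
    # Bit 0x10 is the leftmost of the five bits, i.e. the first block.
    return [name for i, name in enumerate(blocks) if bcp & (0x10 >> i)]
```

For TCC 5, a BCP of 00110 selects the third and fourth entries, that
is, the rear left/right block and the low frequency effects block.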




   ___________    ____________    ___________    ___________
  | ATRAC-X   |  | ATRAC-X    |  | ATRAC-X   |  |ATRAC-X    |
  | Front L,R |  |Front Center|  | Rear L,R  |  |LFE        |
  |FRSEQNO:N  |  |FRSEQNO:N   |  |FRSEQNO:N  |  |FRSEQNO:N  |
  |ElementID:0|  |ElementID:0 |  |ElementID:0|  |ElementID:0|
  |Priority:0 |  |Priority:1  |  |Priority:0 |  |Priority:1 |
  |NF:1       |  |NF:1        |  |NF:1       |  |NF:1       |
  |TCC:5      |  |TCC:5       |  |TCC:5      |  |TCC:5      |
  |BCP:10000  |  |BCP:01000   |  |BCP:00100  |  |BCP:00010  |
  |<--------->|  |<---------->|  |<--------> |  |<--------> |
    ATRAC-X        ATRAC-X        ATRAC-X        ATRAC-X
    Segment(1)     Segment(2)     Segment(3)    Segment(4)
  |<------------------------------------------------------->|
                       ATRAC-X Slot  -Nth-
Figure 5: Dividing 5.1 ATRAC-X data into four ATRAC-X Segments

Figure 5 illustrates an example sequence of ATRAC-X segments utilizing
the BCP field.  All segments belong to the same ATRAC-X stream and
therefore have the same ElementID value of 0. As listed in Table 1, a
TCC value of 5 means that the audio data being sent is from a 5.1
multichannel source.  However, in this example, the data is broken up
into four ATRAC-X segments, corresponding to the ATRAC-X channel blocks
Front LR, Front Center, Rear LR, and LFE, with BCP values of 10000,
01000, 00100 and 00010, respectively.  In this case a higher priority
is assigned to Front L,R and Rear L,R.
 
In some cases the data size of the LFE channel block is small, so the
LFE channel data can be combined with another channel block for greater 
transmission efficiency.  Figure 6 illustrates an example of combining
channel blocks into a segment.  In this case, the BCP value of each
ATRAC-X segment is set as follows:
Front LR block: BCP = 10000 
Front Center block: BCP = 01000 
Rear LR + LFE block: BCP = 00110




   ___________    ____________    _____________  
  | ATRAC-X   |  | ATRAC-X    |  | ATRAC-X     | 
  | Front L,R |  |Front Center|  |Rear L,R+LFE | 
  |FRSEQNO:N  |  |FRSEQNO:N   |  |FRSEQNO:N    | 
  |ElementID:0|  |ElementID:0 |  |ElementID:0  | 
  |Priority:0 |  |Priority:1  |  |Priority:0   | 
  |NF:1       |  |NF:1        |  |NF:1         | 
  |TCC:5      |  |TCC:5       |  |TCC:5        | 
  |BCP:10000  |  |BCP:01000   |  |BCP:00110    | 
  |<--------->|  |<---------->|  |<----------->| 
    ATRAC-X        ATRAC-X          ATRAC-X     
  |<------------------------------------------>|
                 ATRAC-X Slot  -Nth-
                     Figure 6: 
 Combining rear LR and LFE channel blocks into an ATRAC-X Segment
 
The ATRAC-X RTP payload format is capable of sending a mixture of
divided and non-divided ATRAC-X streams. Figure 7 illustrates an
example of sending divided and non-divided streams.
   ___________    ____________    ____________    ___________
  | ATRAC-X(1)|  | ATRAC-X(1) |  | ATRAC-X(1) |  |ATRAC-X(2) |
  | Front L,R |  |Front Center|  |Rear L,R+LFE|  | Front L,R |
  |FRSEQNO:N  |  |FRSEQNO:N   |  |FRSEQNO:N   |  |FRSEQNO:N  |
  |ElementID:0|  |ElementID:0 |  |ElementID:0 |  |ElementID:1|
  |Priority:0 |  |Priority:1  |  |Priority:0  |  |Priority:0 |
  |NF:1       |  |NF:1        |  |NF:1        |  |NF:1       |
  |TCC:5      |  |TCC:5       |  |TCC:5       |  |TCC:2      |
  |BCP:10000  |  |BCP:01000   |  |BCP:00110   |  |BCP:00000  |
  |<--------->|  |<---------->|  |<---------->|  |<--------> |
    ATRAC-X        ATRAC-X        ATRAC-X          ATRAC-X
    Segment(1)     Segment(2)     Segment(3)      Segment(4)
  |<-------------------------------------------------------->|
                       ATRAC-X Slot  -Nth-
 Figure 7: Sending a mixture of divided and non-divided ATRAC-X streams

ATRAC-X Segments (1) through (3) are a divided 5.1 channel stream, and 
segment (4) is a non-divided stereo stream. (Note the BCP for 
segment (4) must be "00000".)

- LENGTH: Length of ATRAC-X data (17bit)
LENGTH holds the size, in bits, of each ATRAC-X frame in an ATRAC-X
segment.  The frame data itself is padded with zero bits up to the next
byte boundary; these padding bits are ignored when the aligned frame
data is used.
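
As a worked illustration of the byte alignment rule, the number of
bytes occupied by a frame and the number of padding bits follow
directly from LENGTH.  The function names below are hypothetical.

```python
def padded_frame_bytes(length_bits: int) -> int:
    """Bytes occupied by a frame whose LENGTH field is length_bits."""
    return (length_bits + 7) // 8


def padding_bits(length_bits: int) -> int:
    """Zero bits appended to reach the next byte boundary (0..7)."""
    return (-length_bits) % 8
```

A frame with LENGTH = 17 therefore occupies 3 bytes, of which the last
7 bits are padding.  The largest value of the 17-bit field, 131071,
corresponds to 16384 bytes.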


3. QoS Consideration
Realtime bit-rate control is a natural step in the implementation of
Quality of Service (QoS).  The ATRAC-X payload format allows for this
control through the NmSeg parameter.  

Figure 8 below illustrates the NmSeg value changing while transmitting
a 12.2 channel ATRAC-X bit stream.  At first, two ATRAC-X segments
containing the first 7.1ch and the remaining 5.1ch, respectively, are
transmitted in the Nth ATRAC-X slot.  The transmission is then reduced
to only one segment in the (N+1)th and following ATRAC-X slots by
omitting the latter 5.1ch stream.

   ____________    ____________    ____________    ____________
  | ATRAC-X    |  | ATRAC-X    |  |  ATRAC-X   |  |  ATRAC-X   |
  |7.1 (12.2)ch|  |5.1 (12.2)ch|  |7.1 (12.2)ch|  |7.1 (12.2)ch|
  |   384kbps  |  |   256kbps  |  |   384kbps  |  |   384kbps  |
  |FRSEQNO:N   |  |FRSEQNO:N   |  |FRSEQNO:N+1 |  |FRSEQNO:N+2 |
  |ElementID:0 |  |ElementID:1 |  |ElementID:0 |  |ElementID:0 |
  |NmSeg:2     |  |NmSeg:2     |  |NmSeg:1     |  |NmSeg:1     |
  |<---------->|  |<---------->|  |<---------->|  |<---------->|
     ATRAC-X        ATRAC-X         ATRAC-X          ATRAC-X
     Segment(1)     Segment(2)     Segment(1)      Segment(1)
  |<-------------------------->|<-------------->|<------------>|
          ATRAC-X Slot            ATRAC-X Slot    ATRAC-X Slot
              -Nth-                 -N+1th-         -N+2th-
 Figure 8:  Bit-rate control by omitting secondary 5.1ch data

As another example, Figure 9 below illustrates the NmSeg field changing
while transmitting a 5.1ch ATRAC-X bit stream.  Initially, two ATRAC-X
segments comprising all channel blocks of a 5.1ch stream are
transmitted in the Nth ATRAC-X slot.  The transmission is then reduced
to only the Front L,R and Center channel blocks in the (N+1)th and
following ATRAC-X slots by omitting the Rear L,R and LFE channel
blocks.




   ___________    ____________    ____________    ____________
  | ATRAC-X   |  | ATRAC-X    |  | ATRAC-X    |  |ATRAC-X     |
  |FrontL,R   |  |RearL,R+LFE |  |Front L,R   |  | Front L,R  |
  | +Center   |  |            |  | + Center   |  |  +Center   |
  |FRSEQNO:N  |  |FRSEQNO:N   |  |FRSEQNO:N+1 |  |FRSEQNO:N+2 |
  |ElementID:0|  |ElementID:0 |  |ElementID:0 |  |ElementID:0 |
  |NmSeg:2    |  |NmSeg:2     |  |NmSeg:1     |  |NmSeg:1     |
  |TCC:5      |  |TCC:5       |  |TCC:5       |  |TCC:5       |
  |BCP:11000  |  |BCP:00110   |  |BCP:11000   |  |BCP:11000   |
  |<--------->|  |<---------->|  |<---------->|  |<---------->|
    ATRAC-X        ATRAC-X         ATRAC-X          ATRAC-X
    Segment(1)     Segment(2)     Segment(1)      Segment(1)
  |<-------------------------->|<-------------->|<------------>|
          ATRAC-X Slot            ATRAC-X Slot    ATRAC-X Slot
              -Nth-                 -N+1th-         -N+2th-
 Figure 9:  Bit-rate control by omitting some channel blocks
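
The sender-side behaviour shown in the two figures above could be
sketched as follows.  Given candidate segments ordered from most to
least important, the sender keeps as many as fit the available
bit-rate and writes the resulting count into NmSeg.  The selection
rule, the function name and the (label, kbps) tuple shape are
assumptions of this example; this document does not prescribe how a
sender decides which segments to omit.

```python
from typing import List, Tuple


def select_segments(
    candidates: List[Tuple[str, int]], budget_kbps: int
) -> Tuple[List[str], int]:
    """Pick leading segments that fit the budget; return (labels, NmSeg).

    `candidates` holds (label, bit-rate in kbps), most important first.
    """
    chosen: List[str] = []
    total = 0
    for label, kbps in candidates:
        if total + kbps > budget_kbps:
            break  # this and all less important segments are omitted
        chosen.append(label)
        total += kbps
    return chosen, len(chosen)
```

With the 384kbps and 256kbps segments of the 12.2ch example, a budget
of 640kbps keeps both segments (NmSeg = 2), while a budget of 400kbps
keeps only the first (NmSeg = 1).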


4. Metadata
The ATRAC-X RTP payload provides support for the inclusion of metadata.
Metadata can be used for controlling the playback of ATRAC-X data as it 
is streamed in real-time, or simply as supplemental information.  
Example uses include downmix parameters, speaker configuration settings,
and effects such as panning, fading, etc.  The receiver may handle all 
or part of the metadata segments, which are each classified by a unique 
ID.  The following information must be defined in the ATRAC-X RTP 
payload header when referring to metadata.

- NMD: Number of Metadata Frames (5bit) 
The number of metadata frames included in the RTP packet.

- MDID: MetaData ID (16bit) 
A unique ID which indicates the metadata type associated with this
frame.  Although unique, there are two ID types.  The first type of
identifier is globally pre-defined for specific metadata types, while
the other identifier type is for session specific use, as generated and
negotiated between transmitter and receiver dynamically prior to the
streaming session.  The two types are distinguished by the MSB of the
MDID: one MSB value marks a globally pre-defined ID; otherwise the ID
is a session specific one.  Thus, 32767 kinds of metadata will be
available for each type of identifier.  Currently all globally
pre-defined identifiers are reserved and prohibited.  Definition of the
negotiation method between transmitter and receiver is outside the
scope of this document.




- MDLEN: MetaData LENgth (10bit)
The byte size of the metadata corresponding to the above metadata ID.


  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |MDID(=N)                       |MDLEN              |RSV        |
 |                                                               |
 |                                                               |
 |                      META-DATA (N)                            |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |MDID(=N+1)                     |MDLEN              |RSV        |
 |                                                               |
 |                      META-DATA (N+1)                          |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    Figure 9: Metadata segment
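
A receiver could walk the metadata entries shown above as sketched
below.  Each entry is read as a 32-bit word holding MDID (16 bits),
MDLEN (10 bits) and RSV (6 bits), followed by MDLEN bytes of metadata.
The sketch assumes that the buffer starts at the first MDID, that
entries follow each other without extra padding, and that the word is
in network byte order; none of these points is stated explicitly in
this document.  The function name is hypothetical.

```python
import struct
from typing import List, Tuple


def parse_metadata(buf: bytes, count: int) -> List[Tuple[int, bytes]]:
    """Return `count` (MDID, metadata bytes) pairs read from buf."""
    entries: List[Tuple[int, bytes]] = []
    pos = 0
    for _ in range(count):
        if pos + 4 > len(buf):
            raise ValueError("truncated metadata header")
        (word,) = struct.unpack_from("!I", buf, pos)
        mdid = word >> 16              # upper 16 bits
        mdlen = (word >> 6) & 0x3FF    # next 10 bits, size in bytes
        pos += 4                       # low 6 bits are RSV, ignored
        if pos + mdlen > len(buf):
            raise ValueError("truncated metadata body")
        entries.append((mdid, buf[pos:pos + mdlen]))
        pos += mdlen
    return entries
```

The `count` argument would be taken from the NMD field of the payload
header.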

5. Redundant data for robustness
Redundant data can be included in the ATRAC-X RTP payload in order to 
recover from errors due to packet loss. ATRAC-X audio frames from 
previous ATRAC-X slots are re-sent as redundant audio data.  Metadata 
can also be re-sent as redundant data. Existence of redundant data in 
the payload is not mandatory.

When transmitting redundant data, the following information must be
defined in the ATRAC-X RTP payload header:

- TimeStampOffset : Time Stamp Offset for redundant data (14bit)
An unsigned timestamp offset for this ATRAC-X segment relative to the
timestamp given in the RTP header.  The use of an unsigned offset
implies that redundant data is sent after the original data.  Thus,
TimeStampOffset is subtracted from the current timestamp to determine
the timestamp of the redundant data. 

- RNF : The number of redundant ATRAC-X audio frames (4bit)

- RNMD : The number of redundant ATRAC-X metadata frames (5bit)




     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2  
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    TimeStampOffset        |RNF    |RNMD     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       Figure 10: Control bit field for redundant data
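
The timestamp of the redundant data follows from the subtraction
described for TimeStampOffset.  The sketch below performs it modulo
2^32, since the RTP timestamp is a 32-bit value that may wrap; the
function name is hypothetical.

```python
RTP_TS_MOD = 1 << 32             # RTP timestamps are 32 bits wide
MAX_TS_OFFSET = (1 << 14) - 1    # TimeStampOffset is a 14-bit field


def redundant_timestamp(rtp_timestamp: int, ts_offset: int) -> int:
    """Timestamp of the redundant data carried in this segment."""
    if not 0 <= ts_offset <= MAX_TS_OFFSET:
        raise ValueError("TimeStampOffset does not fit in 14 bits")
    return (rtp_timestamp - ts_offset) % RTP_TS_MOD
```

For example, an RTP timestamp of 10000 with a TimeStampOffset of 3072
yields 6928 for the redundant data.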

The following two figures show hypothetical ATRAC-X packets at previous 
and current time frames when sending redundant data.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |Version| FRSEQNO     |ElementID|C|FragNo |NmSeg|TCC  | BCP     |
 |Priority |NF(=3) |RNF(=3)|RNMD(=0) |Time Stamp Offset          |
 |NMD(=0)  | RSV |Length                           |RSV          |         
 |                 ATRAC-X Main Frame Data (N th Frame)          |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                 ATRAC-X Main Frame Data (N+1 th Frame)        |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                 ATRAC-X Main Frame Data (N+2 th Frame)        |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |          ATRAC-X Redundant Frame Data (M th Frame)            |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |          ATRAC-X Redundant Frame Data (M+1 th Frame)          |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |          ATRAC-X Redundant Frame Data (M+2 th Frame)          |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  M is the ATRAC-X frame number corresponding to the time
  which is calculated by (RTP TimeStamp - TimeStampOffset).

         Figure 11: An example with redundant data 




  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |Version| FRSEQNO     |ElementID|C|FragNo |NmSeg|TCC  |  BCP    |
 |Priority |NF(=3) |RNF(=3)|RNMD(=1) |Time Stamp Offset          |
 |NMD(=1)  | RSV |  MDID                         |     MDLEN     |
 |   |RSV        |                                               |
 |                         META-DATA                             | 
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                 ATRAC-X Main Frame Data (N th Frame)          |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                 ATRAC-X Main Frame Data (N+1 th Frame)        |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |                 ATRAC-X Main Frame Data (N+2 th Frame)        |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |MDID                           |MDLEN              |RSV        |
 |                                                               |
 |                   Redundant META DATA                         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |          ATRAC-X Redundant Frame Data (M th Frame)            |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |          ATRAC-X Redundant Frame Data (M+1 th Frame)          |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |LENGTH                           |RSV        |                 |
 |          ATRAC-X Redundant Frame Data (M+2 th Frame)          |
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  M is the ATRAC-X frame number corresponding to the time
  calculated as (RTP TimeStamp - TimeStampOffset).

         Figure 12: An example with redundant data and
                    additional metadata 
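
The note under the figure can be read as a small calculation.  The
following Python sketch is an illustration only and is not part of
the payload format.  It assumes that the RTP timestamp and the Time
Stamp Offset are both counted in PCM samples, that the timestamp
wraps at 32 bits as in RTP, and that one audio frame spans 2048
samples as stated in the Glossary.  The function names are
hypothetical.

```python
SAMPLES_PER_FRAME = 2048  # one ATRAC-X audio frame, per the Glossary


def redundant_timestamp(rtp_timestamp: int, timestamp_offset: int) -> int:
    """Presentation time of the first redundant frame (frame M).

    Assumption: both values are PCM sample counts; the result wraps
    modulo 2**32 like the RTP timestamp itself.
    """
    return (rtp_timestamp - timestamp_offset) & 0xFFFFFFFF


def frames_behind(timestamp_offset: int) -> int:
    """Number of frames by which the redundant data lags the main data.

    Assumption: the offset is a whole number of audio frames.
    """
    if timestamp_offset % SAMPLES_PER_FRAME:
        raise ValueError("offset is not a whole number of frames")
    return timestamp_offset // SAMPLES_PER_FRAME
```

Under these assumptions, an offset of 4096 samples means that frame M
is two frames earlier than frame N.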



6. Fragmentation
In the event that ATRAC-X frame data, metadata and/or redundant data
are too large to be packetized into one RTP packet, one ATRAC-X
segment can be fragmented into sub-segments that are transmitted in
separate RTP packets.

                          0 1 2 3 4 5 6 7
                         +-+-+-+-+-+-+-+-+
                         |C|FragNo |RSV  |
                         +-+-+-+-+-+-+-+-+
             Figure 13: Control bit field for fragmentation

- C: Continuous flag (1 bit)
A value of 1 indicates that the data in the current packet continues
in the following packets; a value of 0 denotes that the data ends in
the current packet.

- FragNo: Fragmentation Number (4 bits)
The sequence number of each packet within the fragmentation, starting
from 0.  Up to 15 fragmentations are supported.  Metadata can exist
only in the first fragmented packet (FragNo = 0) to avoid conflicts
in fragmentation.
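
The bit layout of Figure 13 can be sketched in Python as follows.
This is a minimal illustration, not a normative encoding.  It assumes
that bit 0 in the figure is the most significant bit of the octet
(the usual convention in RTP diagrams) and treats the field as a
standalone octet, although in the full payload header these bits sit
among other fields.  The names FragControl, pack_frag_control and
unpack_frag_control are hypothetical.

```python
from typing import NamedTuple


class FragControl(NamedTuple):
    c: int        # Continuous flag, 1 bit
    frag_no: int  # Fragmentation Number, 4 bits
    rsv: int      # Reserved, 3 bits


def pack_frag_control(c: int, frag_no: int, rsv: int = 0) -> int:
    """Build the octet of Figure 13 (assumes bit 0 is the MSB)."""
    if c not in (0, 1):
        raise ValueError("C is a 1-bit flag")
    if not 0 <= frag_no <= 0xF:
        raise ValueError("FragNo must fit in 4 bits")
    if not 0 <= rsv <= 0x7:
        raise ValueError("RSV must fit in 3 bits")
    return (c << 7) | (frag_no << 3) | rsv


def unpack_frag_control(octet: int) -> FragControl:
    """Split the octet back into its C, FragNo and RSV fields."""
    return FragControl((octet >> 7) & 0x1, (octet >> 3) & 0xF, octet & 0x7)
```

With this layout, the first of several fragments (C = 1, FragNo = 0)
packs to 0x80 and a final second fragment (C = 0, FragNo = 1) packs
to 0x08.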

   ___________    ____________    ____________    ____________
  | Front L,R |  |Front Center|  |Rear L,R+LFE|  |Rear L,R+LFE|
  |           |  |            |  |            |  |Fragmented  |
  |FRSEQNO:N  |  |FRSEQNO:N   |  |FRSEQNO:N   |  |FRSEQNO:N   |
  |ElementID:0|  |ElementID:0 |  |ElementID:0 |  |ElementID:0 |
  |Priority:0 |  |Priority:1  |  |Priority:0  |  |Priority:0  |
  |NF:1       |  |NF:1        |  |NF:1        |  |NF:1        |
  |TCC:5      |  |TCC:5       |  |TCC:5       |  |TCC:5       |
  |BCP:10000  |  |BCP:01000   |  |BCP:00110   |  |BCP:00110   |
  |C:0        |  |C:0         |  |C:1         |  |C:0         |
  |FragNo:0   |  |FragNo:0    |  |FragNo:0    |  |FragNo:1    |
  |<--------->|  |<---------->|  |<-------------------------->|
    ATRAC-X        ATRAC-X                  ATRAC-X
    Segment(1)     Segment(2)              Segment(3)
  |<--------------------------------------------------------->|
                       ATRAC-X Slot  -Nth-
 Figure 14: An example of fragmentation in 5.1ch ATRAC-X Segment(3)
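
A receiver could rebuild a fragmented segment such as Segment(3) in
Figure 14 along the following lines.  This Python sketch is an
assumption about receiver behaviour, not something the format
mandates.  It assumes that the caller has already grouped the
fragments belonging to one segment (for example by matching header
fields) and has stripped the headers.  The function name reassemble
is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def reassemble(fragments: Iterable[Tuple[int, int, bytes]]) -> Optional[bytes]:
    """Join the fragments of one segment in FragNo order.

    Each fragment is a (C, FragNo, data) tuple.  Returns the joined
    data once the fragment with C = 0 is reached, or None if a
    fragment is missing before that point.
    """
    by_no = {frag_no: (c, data) for c, frag_no, data in fragments}
    joined = bytearray()
    frag_no = 0
    while frag_no in by_no:
        c, data = by_no[frag_no]
        joined += data
        if c == 0:          # C = 0 marks the last fragment
            return bytes(joined)
        frag_no += 1
    return None             # a fragment was lost or has not arrived yet
```

Fragments may arrive in any order; a result of None leaves it to the
application to wait, to conceal the loss, or to use redundant data.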




7. RTP Standard Header
The timestamp in the RTP standard header is the presentation time of
the first ATRAC-X frame data in a segment, expressed as the PCM
sample number counted from the beginning of the content.

The initial value for timestamp is arbitrary, but a random number is
preferable.  The "Marker bit" is set to 1 for the last packet
in each ATRAC-X Slot, 0 otherwise.
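
The timestamp and marker bit rules can be illustrated with a short
Python sketch.  It is an example under stated assumptions, not a
definition: every segment in a slot is assumed to start at the same
frame and therefore to carry the same timestamp, each slot is assumed
to cover the same number of frames, and one frame is taken as 2048
PCM samples as stated in the Glossary.  The function name
slot_packet_fields is hypothetical.

```python
import random

SAMPLES_PER_FRAME = 2048  # one ATRAC-X audio frame, per the Glossary


def slot_packet_fields(initial_ts, slot_index, frames_per_slot, packets_in_slot):
    """Return (timestamp, marker) pairs for the packets of one slot.

    Assumption: all packets of a slot share one timestamp, which
    advances by frames_per_slot * 2048 samples per slot and wraps
    at 32 bits.
    """
    samples = slot_index * frames_per_slot * SAMPLES_PER_FRAME
    ts = (initial_ts + samples) & 0xFFFFFFFF
    last = packets_in_slot - 1
    # The marker bit is 1 only on the last packet of the slot.
    return [(ts, 1 if i == last else 0) for i in range(packets_in_slot)]


# A random initial value, as the text recommends.
initial_ts = random.getrandbits(32)
fields = slot_packet_fields(initial_ts, 0, 1, 4)
```

For a slot sent as four packets, as in Figure 14, only the fourth
packet carries a marker bit of 1.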

Remarks:
The sampling frequency of all ATRAC-X bit streams carried in one
ATRAC-X RTP payload must be identical to avoid timestamp conflicts.


8. Multicasting Consideration
This payload format can be used in both unicast and multicast
sessions.  For multicast, however, the QoS compensating functions
described in section 3, "QoS Consideration", should currently be
disabled in order to avoid conflicts in packet handling during
multicast transmission.


9. MIME parameters and SDP mapping for ATRAC-X
MIME parameters, which are needed to clarify MIME type registration
and SDP usage, are still under consideration.
The parameters will be similar to those of other audio codecs, but
some additional MIME parameters may be incorporated to accommodate
the flexibility of the ATRAC-X RTP payload format.


10. Consideration on error robustness
This payload format has a redundant data area for error robustness,
as described in section 5.  However, it is sometimes difficult to
determine the optimum offset for the redundant data.
Another way to compensate for errors (packet loss) is to retransmit
data at the request of the receiver (e.g., an ARQ technique).

(We are planning to give a short demonstration of ATRAC-X streaming
 using a newly developed "Real Time Automatic Repeat reQuest"
 technique at the 58th IETF meeting in Minneapolis.)


11. Glossary

(1) ATRAC-X Audio Frame: The smallest unit of ATRAC-X data.  This is
equivalent to 2048 PCM samples (as defined in the ATRAC-X
specification).

(2) ATRAC-X Channel Block: A unit representing how audio signals are
contained.  Two types of channel blocks exist: the
"mono_channel_block", which represents one monaural channel, and the
"stereo_channel_block", which represents one pair of stereo channels.
A complete bit stream that contains more than two channels is
constructed by combining the two types of channel blocks.  Possible
combinations are defined in Table 1.

(3) ATRAC-X Segment: A unit of ATRAC-X data that is sent inside an RTP
packet.  A segment consists of any combination of audio frames,
metadata frames, redundant metadata frames, and redundant audio frames.

(4) ATRAC-X Slot: A unit of time to which all audio frames of an
ATRAC-X segment belong.  For example, in Figure 4, two segments make up
the Nth ATRAC-X slot.  Because the decoded audio samples from these two
segments would play over the same period of time, they are in the same
slot.  As another example, in Figure 5, four segments make up the Nth
ATRAC-X slot.  Because the decoded audio samples from each segment
would all play at the same time, they are in the same slot.


12. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [1].  This implies that confidentiality of the media
streams is achieved by encryption.  Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data, so there is no conflict between the
two operations.


13. References

[1] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A
Transport Protocol for Real-Time Applications", RFC 1889, January 1996.


14. Authors' Addresses

   Mitsuyuki Hatanaka
   Sony Corporation
   6-7-35 Kitashinagawa Shinagawa-ku
   Tokyo,Japan

   EMail: hatanaka@av.crl.sony.co.jp


   Jun Matsumoto
   Sony Corporation
   6-7-35 Kitashinagawa Shinagawa-ku
   Tokyo,Japan

   EMail: jun@av.crl.sony.co.jp


   Matthew Romaine
   Sony Corporation
   6-7-35 Kitashinagawa Shinagawa-ku
   Tokyo,Japan

   EMail: Matthew.Romaine@jp.sony.com

