Internet Engineering Task Force
INTERNET-DRAFT                       P. Christ, Ch. Guillemot, S. Wesner
draft-christ-rtsp-mpeg4-00.txt              Univ. Stuttgart - RUS/ INRIA
                                                       November 16, 1998
                                                   Expires: May 15, 1999





                  RTSP-based Stream Control in MPEG-4




Status of this Memo

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups.  Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months.
Internet-Drafts may be updated, replaced, or obsoleted by other
documents at any time.  It is not appropriate to use Internet-Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress."

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ftp.ietf.org, nic.nordu.net, ftp.nisc.sri.com, or
munnari.oz.au.


                                ABSTRACT


      In order to support advanced interactivity as envisaged for
      MPEG-4 applications, this document proposes a simple RTSP-based
      [1] stream control framework, including the necessary extensions
      to RTSP method syntax and semantics.  Reflecting the syntax and
      semantics of the MPEG-4 BIFS scene description [2], VRML nodes
      [4] and the MPEG-4 media delivery framework (DMIF) [3], and in
      the spirit of HTML/SMIL [5], Random Access Point information,
      Range and Time parameters are introduced into the relevant
      URL(s) and related signaling methods accordingly.  Two
      additional optional methods, R-MUTE (for Remote-MUTE) and
      RESUME, are also proposed.



Christ/Guillemot/Wesner    November 16, 1998                    [Page 1]

INTERNET-DRAFT    draft-ietf-christ-rtsp-mpeg4-00.txt  November 16, 1998


1.  Motivations and Rationale

The motivation for this document is to derive a simple stream control
framework for MPEG-4, reflecting the syntax and semantics of the MPEG-4
scene nodes, that could be introduced as extended method syntax and
semantics in line with the RTSP extension guidelines [1, section 1.5].

The attribute 'simple' reflects the aim of providing the minimum
support needed for an MPEG-4 client-to-server application equivalent to
HTML/CGI or SMIL.

As MPEG-4 evolves towards Java-enhanced multi-user environments, other
and possibly more elaborate application signaling frameworks are
expected to emerge in parallel [7].

With respect to RTSP, there are two main technical issues. The first is
possible support for dissociating connection management from stream
control via the optional provision of a channel identifier, as an
alternative to the URL in the methods. The second is the provision of
syntactical means for extended timing, random access point and range
parameters.  In order to support interactivity based on user
navigation, two additional optional methods are also proposed: R-MUTE
(for Remote-MUTE), inspired by the local MUTE specified in PREMO [6],
and RESUME.


With respect to MPEG-4 Systems and DMIF, the whole proposal will be fed
into the ongoing Version 2 procedure. It should also be mentioned that
this proposal is orthogonal to [8], which seeks to position RTSP-based
signaling within the DMIF environment.


2.  Preliminary Remarks


The MPEG-4 'BIFS' scene description framework is inspired by VRML. In
VRML, application-specific procedural logic and state management can be
implemented via Script nodes, which - together with Prototypes - will
be provided only in MPEG-4 Version 2.

An MPEG-4 client-server application scenario of the type this draft is
aiming at can be characterized as follows: on the terminal side, driven
by events from user interaction with the scene, a generic MPEG-4
Browser uses Application Signaling to request from the Server MPEG-4
compliant streams of scene descriptions, which constitute the
application itself, and their companion streams, e.g. Audio-Video
streams.

MPEG-4 Version 2 will introduce an advanced Interactivity Model
(MPEG-J). This should lead to application-specific procedural code on
the terminal side, allowing e.g. for the local
construction/encoding(/decoding) of BIFS updates. As the script code
would probably be read as part of the scene description, the Browser
could probably remain generic, i.e. independent of any specific
application.


The signaling syntax and semantics discussed below are independent of
the mechanisms supporting the signaling, i.e. of the procedural logic
mechanisms. These mechanisms are outside the scope of this document.

3.  Relating Application Signaling to MPEG-4 Scene Description

In VRML/MPEG-4, the syntax and semantics of the nodes of a scene
determine and confine the kinds of interactivity that are possible.
This is true for the parameters available both in 'media-playing nodes'
such as MovieTexture and in 'structure related' nodes such as Inline.

Even with future Proto and Script nodes in MPEG-4 Version 2, the
expressivity of signaling, e.g. with respect to media streams, will be
confined by that of the corresponding nodes in the scene.

Hence, in MPEG-4, as briefly indicated in the introduction of this
document, all interactivity and in turn all application signaling has
to be constructed in accordance with the syntax and semantics of the
relevant nodes.

3.1.  Media Content Playing Nodes

An object may be completely described within the scene description
(BIFS) information, or may also require elementary stream data from one
or more audio-visual objects, via the 'media content playing' nodes.
Therefore, interactivity and the corresponding signaling with respect
to media objects have to be derived from the 'media content playing'
nodes such as VideoObject2D, MovieTexture, etc. An application
signaling message in that context would typically carry a PLAY, PAUSE
or TEARDOWN method.

3.2.  Structure related nodes

Interactivity and application signaling concerning the structure of the
scene, e.g. changes of a scene, will be derived from 'structure related'
nodes such as Inline2D and Inline.

3.3.  Usage of URLs

A URL is a parameter of type MFString that either indicates the
location of the media stream or includes a reference to an
ObjectDescriptor (OD). An OD is a level of indirection that can point
either to another object descriptor or to ES_descriptors, which in turn
provide the references to the locations of the raw elementary streams
associated with the node, via URL fields defined as a string of 8-bit
characters (type bit(8)). When interpreted by the Browser, these URLs
lead to the issuing of the signaling commands.

3.4.  MPEG-4 Timing Model

A point in time at which an event occurs (change of a parameter value,
defining the start or stop of a media stream, etc.) is identified by the
SFTime fields of the media content playing nodes.  The SFTime fields
indicate in general a time relative to the BIFS time base that applies
to the BIFS Elementary Stream that has conveyed the scene description.

An SFTime field is a 64-bit double-precision floating point number (in
ISO C floating point format) indicating a duration in seconds with
respect to a reference point in time. This corresponds to an NPT
(Normal Play Time) in RTSP terms, except that here the reference point
in time (the beginning of the presentation) is not expressed in GMT but
is provided by the startCompositionTimeStamp of the scene description
stream.

SFTime fields of some nodes may require 'absolute' time values, given
by a "wall clock" time. The relation of the BIFS time base ticks, i.e.
the CTS (composition time stamp) of the BIFS Access Unit that has
conveyed the respective scene description (BIFS) node, to the wall
clock can be resolved if the wall clock time is known to the receiver.
This is achieved by an optional wallClockTimeStamp.

4.  Application Signaling and the MPEG-4 media delivery framework DMIF

An MPEG-4 application identifies a particular elementary stream through
its Elementary Stream Id (ESid), scoped by the service session it
belongs to.

When using DMIF (the MPEG-4 Delivery Multimedia Integration Framework),
the DMIF layer establishes a 1-to-1 correspondence between each ESid
and a channelHandle (chId). The stream identified by the ESid is
thereafter referred to through its channelHandle.
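
As an illustration only, the 1-to-1 correspondence could be kept in a
small per-session table such as the one sketched below. The class and
method names (ChannelMap, bind, channel_for, stream_for) are
hypothetical and are not taken from DMIF; one such table per service
session is assumed.

```python
class ChannelMap:
    """Hypothetical per-session table pairing ESids with channelHandles."""

    def __init__(self):
        self._by_es = {}  # ESid -> channelHandle
        self._by_ch = {}  # channelHandle -> ESid

    def bind(self, es_id, ch_id):
        """Record one pairing; refuse anything that would break 1-to-1."""
        if es_id in self._by_es or ch_id in self._by_ch:
            raise ValueError("ESid and channelHandle may each be bound once")
        self._by_es[es_id] = ch_id
        self._by_ch[ch_id] = es_id

    def channel_for(self, es_id):
        """Return the channelHandle bound to the given ESid."""
        return self._by_es[es_id]

    def stream_for(self, ch_id):
        """Return the ESid bound to the given channelHandle."""
        return self._by_ch[ch_id]
```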

Dissociating connection management from stream control implies
syntactic extensions to the methods, namely the possibility of
identifying a stream by a syntactical means other than the URL (see
section 5.3). The URL will then be used only for connection management.


5.  Extended RTSP methods syntax and semantics


5.1.  NPT extension

The RTSP NPT format, consisting of a decimal fraction expressed in
either seconds or hours, minutes, and seconds, can then be used:




     npt-time     =   "now" | npt-sec | npt-hhmmss
     npt-sec      =   1*DIGIT [ "." *DIGIT ]
     npt-hhmmss   =   npt-hh ":" npt-mm ":" npt-ss [ "." *DIGIT ]
     npt-hh       =   1*DIGIT     ; any positive number
     npt-mm       =   1*2DIGIT    ; 0-59
     npt-ss       =   1*2DIGIT    ; 0-59

However, it must be possible to set the reference point in time to the
value of the startCompositionTimeStamp of the corresponding BIFS scene
description stream instead of 0.0 seconds. The beginning of the
presentation would then be at time startCompositionTimeStamp of that
stream. Hence, the NPT syntax can be complemented by an optional field
carrying the wall clock time base. If this field is not present, the
default value for the reference point in time is 0.0 seconds.



     npt-ref      =   npt-hh ":" npt-mm ":" npt-ss [ "." *DIGIT ]
     npt-hh       =   1*DIGIT     ; any positive number
     npt-mm       =   1*2DIGIT    ; 0-59
     npt-ss       =   1*2DIGIT    ; 0-59
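
A minimal sketch of how a receiver might parse these productions is
given below. The function names are hypothetical, and simply adding
npt-ref to the parsed npt-time is an assumption about how the two
values combine; the grammar itself does not state this.

```python
import re

# Patterns mirroring the npt-sec and npt-hhmmss productions.
NPT_SEC = re.compile(r"\d+(?:\.\d*)?")
NPT_HHMMSS = re.compile(r"(\d+):(\d{1,2}):(\d{1,2}(?:\.\d*)?)")


def hhmmss_to_seconds(text):
    """Return seconds for an hh:mm:ss[.fraction] string, else None."""
    m = NPT_HHMMSS.fullmatch(text)
    if m is None:
        return None
    hours, minutes, seconds = int(m.group(1)), int(m.group(2)), float(m.group(3))
    if minutes > 59 or seconds >= 60:
        raise ValueError("npt-mm and npt-ss must be in the range 0-59")
    return hours * 3600 + minutes * 60 + seconds


def parse_npt(text):
    """Return an npt-time in seconds, or None for the literal "now"."""
    if text == "now":
        return None
    seconds = hhmmss_to_seconds(text)
    if seconds is not None:
        return seconds
    if NPT_SEC.fullmatch(text):
        return float(text)
    raise ValueError("not a valid npt-time: %r" % text)


def resolve_npt(npt_time, npt_ref=None):
    """Offset npt-time by the optional npt-ref (assumed to be additive)."""
    offset = parse_npt(npt_time)
    if offset is None:
        raise ValueError('"now" has no fixed offset')
    base = 0.0  # default reference point when npt-ref is absent
    if npt_ref is not None:
        base = hhmmss_to_seconds(npt_ref)
        if base is None:
            raise ValueError("not a valid npt-ref: %r" % npt_ref)
    return base + offset
```

For example, resolve_npt("10", "0:00:30") treats 30 seconds as the
reference point and 10 seconds as the offset from it.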



5.2.  Random Access Point (RAP) and Range extensions


Method and RAP or range information could be stored as parameters in
the MFUrl class, defined below in a preliminary syntax:

 class MFUrl
 {
  if (isMethod)
   SFString GlobalMethod=method;
  else
  {
   if (isOD)
   {
     bit(10) ODId;
     MFString ESId=esid ESMethod=method ESR1=ES_RAP_info1 ESR2=ES_RAP_info2;
   }
   else
   {
     SFString urlValue ODId=odid;
     MFString ESId=esid ESMethod=method ESR1=ES_RAP_info1 ESR2=ES_RAP_info2;
   }
  }
 }

The GlobalMethod is introduced to allow for dealing with the streams of
all BIFS nodes belonging to a group or to the whole scene.

Note that MPEG-4 Systems does not, so far, provide semantic means for
random access point and range information. The above class is a
proposal that the authors are submitting in parallel to MPEG-4.
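
For illustration, the preliminary MFUrl class above might be modelled
in memory as sketched below. The Python names (ESEntry, streams, is_od
and so on) are hypothetical; only the field groupings and the 10-bit
width of ODId come from the class definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ESEntry:
    """One ESId / ESMethod / ESR1 / ESR2 group."""
    es_id: int
    method: str
    rap_info1: Optional[str] = None
    rap_info2: Optional[str] = None


@dataclass
class MFUrl:
    global_method: Optional[str] = None  # set when isMethod is true
    od_id: Optional[int] = None          # bit(10) ODId
    url_value: Optional[str] = None      # present only in the non-OD branch
    streams: List[ESEntry] = field(default_factory=list)

    def __post_init__(self):
        if self.global_method is not None:
            # The isMethod branch carries nothing but the GlobalMethod.
            if self.od_id is not None or self.url_value or self.streams:
                raise ValueError("GlobalMethod excludes per-stream fields")
        elif self.od_id is not None and not 0 <= self.od_id < 1024:
            raise ValueError("ODId must fit in 10 bits")

    @property
    def is_od(self):
        """True for the isOD branch: no GlobalMethod and no urlValue."""
        return self.global_method is None and self.url_value is None
```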

It is proposed here to complement the relative time and range syntax in
the RTSP methods with possible other range specifiers, including the
case of degenerate ranges specifying just a single Random Access Point.
Except for the degenerate NodeId case, the 'other' ranges are still
under consideration. In any case, the syntax would be:


      other_range     = other_rap_info#1 - other_rap_info#2
      other_rap_info  = NodeId | ...
      range-specifier = npt-range | other_range


5.3.  Methods Extended Syntax

In addition to Random Access Point, Range and Time parameters, and in
order to allow for dissociation of connection management and stream
control, an additional syntactical means of identifying a stream, other
than the URL mechanism, must be supported.

This syntactical mechanism could be:


 Loop(ch_identifier)
 e.g. PLAY Loop(23,56,32)
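
A small sketch of building and reading this form follows. The draft
gives only the single example above, so the helper names and the exact
pattern accepted here (an upper-case method name, one space, then a
comma-separated list of decimal channel identifiers) are assumptions.

```python
import re

# Assumed shape: METHOD, one space, Loop(id[,id...]) with decimal ids.
LOOP_RE = re.compile(r"([A-Z][A-Z-]*) Loop\((\d+(?:,\d+)*)\)")


def format_request(method, channel_ids):
    """Build e.g. 'PLAY Loop(23,56,32)' from a method and channel ids."""
    if not channel_ids:
        raise ValueError("at least one channel identifier is required")
    return "%s Loop(%s)" % (method, ",".join(str(c) for c in channel_ids))


def parse_request(line):
    """Split 'METHOD Loop(a,b,c)' into the method and a list of ints."""
    m = LOOP_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError("not a Loop-style request: %r" % line)
    return m.group(1), [int(c) for c in m.group(2).split(",")]
```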


5.4.  Additional Optional Methods

The two additional optional methods proposed here are of particular
interest in environments where interactivity is triggered by user
navigation in the presented scenes (e.g. Virtual Reality in the VRML /
MPEG-4 spirit).  In a scene with several synchronized audio-visual
streams, moving away from one audio-visual stream could suspend the
delivery of that stream, and coming back closer to it could resume the
delivery, at a point synchronized with all the other streams that have
been maintained in the scene.

5.4.1.  R-MUTE  (Remote-Mute)

The R-MUTE method is inspired by the MUTE method specified by PREMO
[6].  In PREMO, however, the MUTE command suspends the presentation of
the streams on the terminal but does not suspend their delivery.

The R-MUTE method would cause stream delivery to be suspended
temporarily, while a 'local' progression through the streams continues
on the server side, with synchronization maintained but without
delivery. The server will hence maintain the current reading points of
the ongoing streams, and will be able to resume delivery at the
corresponding random access point when triggered by the RESUME method.

5.4.2.  RESUME

The RESUME method restarts the delivery of a stream that has previously
been suspended by the R-MUTE method. Delivery is resumed at the random
access point given by the server state machine, which also depends on
the stream time base and on the time interval between the R-MUTE and
the RESUME commands.
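
The R-MUTE and RESUME behaviour described in 5.4.1 and 5.4.2 can be
sketched as a small server-side model. Everything in it is an
assumption made for illustration: the class name, the list of random
access point times in seconds, and in particular the choice to restart
at the first random access point at or after the current reading point.

```python
import bisect


class MutableStream:
    """Hypothetical server-side view of one stream under R-MUTE/RESUME."""

    def __init__(self, rap_times):
        self.rap_times = sorted(rap_times)  # random access points, seconds
        self.position = 0.0                 # current reading point, seconds
        self.delivering = True

    def advance(self, seconds):
        """Progress the reading point; this continues while muted."""
        self.position += seconds

    def r_mute(self):
        """Suspend delivery; the reading point keeps progressing."""
        self.delivering = False

    def resume(self):
        """Restart delivery and return the media time it restarts at."""
        if self.delivering:
            raise RuntimeError("RESUME without a preceding R-MUTE")
        # Assumed policy: first random access point at or after position.
        i = bisect.bisect_left(self.rap_times, self.position)
        if i == len(self.rap_times):
            raise ValueError("no random access point left in the stream")
        self.position = self.rap_times[i]
        self.delivering = True
        return self.position
```

With random access points every 2 seconds, a stream muted at 1.0 s and
resumed 2.5 s later would, under this policy, restart at 4 s.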

Remark: The above functionality presupposes that the scene description
syntax and semantics provide mechanisms for routing the full semantics
of the action triggered by user navigation to the media content playing
nodes.


6.  Authors' Addresses

 Paul Christ
 Computer Center - RUS
 University of Stuttgart
 Allmandring 30
 D70550 Stuttgart, Germany.
 email: Paul.Christ@rus.uni-stuttgart.de

 Christine Guillemot
 INRIA
 Campus Universitaire de Beaulieu
 35042 RENNES Cedex, FRANCE
 email: Christine.Guillemot@irisa.fr

 Stefan Wesner
 Computer Center - RUS
 University of Stuttgart
 Allmandring 3a
 D70550 Stuttgart, Germany.
 email: Stefan.Wesner@rus.uni-stuttgart.de


7.  References



        [1]   H. Schulzrinne, A. Rao, R. Lanphier, 'RTSP: Real Time
              Streaming Protocol', RFC 2326, April 1998.

        [2]   Information Technology - Coding of Audiovisual Objects -
              Part 1: Systems, ISO/IEC FCD 14496-1 [DRAFT], May 15,
              1998.

        [3]   Information technology - Generic coding of moving pictures
              and associated audio information - Part 6: Delivery
              Multimedia Integration Framework, ISO/IEC 14496-6, May 15,
              1998.

        [4]   VRML 97: The Virtual Reality Modeling Language,
              International Standard ISO/IEC 14772-1:1997.

        [5]   Synchronized Multimedia Integration Language (SMIL) 1.0
              Specification, W3C proposed recommendation, April 9, 1998.

        [6]   Information Processing Systems - Computer Graphics and
              Image Processing - Presentation Environments for
              Multimedia Objects (PREMO), Part 3: Multimedia Systems
              Services, ISO/IEC 14478-3.

        [7]   ISO/IEC/JTC1/SC29/WG11: w/N2359 subpart 2, Verification
              Model of Advanced BIFS (Systems VM subpart 2), July 98.

        [8]   ISO/IEC/JTC1/SC29/WG11: MPEG98/M4102, October 1998;
              containing: draft-balabanian-rtsp-mpeg4-dmif-00.txt,
              Sept. 22, 1998: Balabanian: The Role of DMIF with RTSP
              and MPEG-4.