Internet DRAFT - draft-christ-rtsp-mpeg4
Internet Engineering Task Force
INTERNET-DRAFT P. Christ, Ch. Guillemot, S. Wesner
draft-christ-rtsp-mpeg4-00.txt Univ. Stuttgart - RUS/ INRIA
November 16, 1998
Expires: May 15, 1999
RTSP-based Stream Control in MPEG-4
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months.
Internet-Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet-Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress."
To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ftp.ietf.org, nic.nordu.net, ftp.nisc.sri.com, or
munnari.oz.au.
ABSTRACT
In order to support the advanced interactivity envisaged for
MPEG-4 applications, this document proposes a simple RTSP-based
[1] stream control framework, including the necessary extensions
to RTSP method syntax and semantics. Reflecting the syntax and
semantics of the MPEG-4 BIFS scene description [2], of VRML nodes
[4] and of the MPEG-4 media delivery framework (DMIF) [3], and in
the spirit of HTML/SMIL [5], Random Access Point information,
Range and Time parameters are introduced into the relevant URL(s)
and the related signaling methods. Two additional optional
methods, R-MUTE (Remote-MUTE) and RESUME, are also proposed.
Christ/Guillemot/Wesner November 16, 1998 [Page 1]
INTERNET-DRAFT draft-ietf-christ-rtsp-mpeg4-00.txt November 16, 1998
1. Motivations and Rationale
The motivation for this document is to derive a simple stream control
framework for MPEG-4, reflecting the syntax and semantics of the MPEG-4
scene nodes, that could be introduced as extended method syntax and
semantics in line with the RTSP extension guidelines [1, section 1.5].
The attribute 'simple' alludes to the aim of providing the minimum
needed to support an MPEG-4 client-to-server application equivalent to
HTML/CGI or SMIL.
It is expected that, as MPEG-4 evolves towards Java-enhanced
multi-user environments, other and possibly more elaborate application
signaling frameworks [7] will emerge in parallel.
With respect to RTSP, there are two main technical issues. The first is
possible support for dissociating connection management from stream
control via the optional provision of a channel identifier as an
alternative to the URL in the methods. The second is the provision of
syntactical means for extended timing, random access point and range
parameters. In order to support interactivity based on user navigation,
two additional optional methods are also proposed: R-MUTE (Remote-MUTE),
inspired by the local MUTE specified in PREMO [6], and RESUME.
With respect to MPEG-4 Systems and DMIF, the whole proposal will be fed
into the ongoing Version 2 procedure. It should also be mentioned that
this proposal is orthogonal to [8], which tries to position RTSP-based
signaling within the DMIF environment.
2. Preliminary Remarks
The MPEG-4 'BIFS' scene description framework is inspired by VRML. In
VRML, application-specific procedural logic and state management can be
implemented via Script nodes, which, together with Prototypes, will be
provided only in MPEG-4 Version 2.
An MPEG-4 client-server application scenario of the type this draft is
aiming at can be characterized as follows: on the terminal side, driven
by events from user interaction with the scene, a generic MPEG-4 Browser
requests from the Server, via Application Signaling, MPEG-4 compliant
streams of scene descriptions, which constitute the application itself,
together with their companion streams, e.g. audio-video streams.
MPEG-4 Version 2 will introduce an advanced Interactivity Model
(MPEG-J). This should lead to application-specific procedural code on
the terminal side, allowing e.g. for the local construction and
encoding (or decoding) of BIFS updates. As the script code would
probably be read as part of the scene description, the Browser could
probably remain generic, i.e. independent of any specific application.
The signaling syntax and semantics discussed below are independent of
the mechanisms that support the signaling, i.e. of the procedural logic
mechanisms. These mechanisms are outside the scope of this document.
3. Relating Application Signaling to MPEG-4 Scene Description
In VRML/MPEG-4, the syntax and semantics of the nodes of a scene
determine and confine the kinds of interactivity that are possible.
This is true for the parameters available both in 'media-playing' nodes
such as MovieTexture and in 'structure related' nodes such as Inline.
Even with the future Proto and Script nodes in MPEG-4 Version 2, the
expressivity of signaling, e.g. with respect to media streams, will be
confined by that of the corresponding nodes in the scene.
Hence, in MPEG-4, as briefly indicated in the introduction of this
document, all interactivity and in turn all application signaling have
to be constructed in accordance with the syntax and semantics of the
relevant nodes.
3.1. Media Content Playing Nodes
An object may be completely described within the scene description
(BIFS) information, or may also require elementary stream data from one
or more audio-visual objects, via the 'media content playing' nodes.
Therefore, interactivity and the corresponding signaling with respect
to media objects have to be derived from the 'media content playing'
nodes such as VideoObject2D, MovieTexture, etc. An application signaling
message in that context would typically carry a PLAY, a PAUSE or a
TEARDOWN method.
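As a rough illustration of what such a signaling message might look
like on the wire, the following Python sketch assembles a request in
the general RTSP/1.0 message format of [1] (request line, headers,
empty line). The helper name build_request, the URL, and the CSeq,
Session and Range values are hypothetical and chosen only for the
example.

```python
def build_request(method, url, cseq, headers=None):
    """Assemble an RTSP/1.0 request: request line, headers, blank line."""
    if method not in ("PLAY", "PAUSE", "TEARDOWN"):
        raise ValueError("unsupported method: " + method)
    lines = ["%s %s RTSP/1.0" % (method, url), "CSeq: %d" % cseq]
    for name, value in (headers or {}).items():
        lines.append("%s: %s" % (name, value))
    # RTSP, like HTTP, terminates lines with CRLF and ends the header
    # section with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical example values, not taken from the draft.
request = build_request(
    "PLAY",
    "rtsp://server.example.com/scene/video1",
    cseq=2,
    headers={"Session": "12345678", "Range": "npt=10-25"},
)
```

A Browser would build one such message per user-triggered event and
send it over the RTSP control connection.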
3.2. Structure Related Nodes
Interactivity and application signaling concerning the structure of the
scene, e.g. changes of a scene, will be derived from 'structure related'
nodes such as Inline2D and Inline.
3.3. Usage of URLs
These nodes carry a URL as a parameter of type MFString, either
indicating the location of the media stream or including a reference to
an ObjectDescriptor (OD). An OD is a level of indirection that can point
either to another object descriptor or to ES_descriptors, which in turn
provide the references to the locations of the raw elementary streams
associated with the node, via URL fields defined as strings of 8-bit
characters (type bit(8)). When interpreted by the Browser, these URL(s)
lead to the issuing of the signaling commands.
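The indirection described above can be sketched as a small lookup. This
is only an illustrative model, not the MPEG-4 descriptor syntax: the
class names, the field names (od_id, ref_od_id, es_descriptors), the
lookup table and the max_depth guard are all assumptions made for the
example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ESDescriptor:
    es_id: int
    url: str  # location of the raw elementary stream


@dataclass
class ObjectDescriptor:
    od_id: int
    es_descriptors: List[ESDescriptor] = field(default_factory=list)
    ref_od_id: Optional[int] = None  # set when this OD points to another OD


def resolve_stream_urls(od_id, table, max_depth=8):
    """Follow OD indirections until ES_descriptors are reached."""
    for _ in range(max_depth):
        od = table[od_id]
        if od.ref_od_id is None:
            return [es.url for es in od.es_descriptors]
        od_id = od.ref_od_id
    raise ValueError("too many levels of OD indirection")


# Hypothetical table: OD 1 points to OD 2, which carries two streams.
table = {
    1: ObjectDescriptor(1, ref_od_id=2),
    2: ObjectDescriptor(2, [
        ESDescriptor(101, "rtsp://server.example.com/video"),
        ESDescriptor(102, "rtsp://server.example.com/audio"),
    ]),
}
urls = resolve_stream_urls(1, table)
```

The depth limit is simply a guard against circular references in this
sketch.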
3.4. MPEG-4 Timing Model
A point in time at which an event occurs (change of a parameter value,
start or stop of a media stream, etc.) is identified by the SFTime
fields of the media content playing nodes. In general, the SFTime
fields indicate a time relative to the BIFS time base that applies to
the BIFS Elementary Stream that has conveyed the scene description.
The SFTime field is a 64-bit double-precision floating point number
(in ISO C floating point format) indicating a duration in seconds with
respect to a reference point in time. This corresponds to an NPT
(Normal Play Time) in RTSP terms, except that here the reference point
in time (beginning of the presentation) is not expressed in GMT time
but is provided by the startCompositionTimeStamp of the scene
description stream.
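Since an SFTime value is a duration in seconds, it can be rendered
directly in the hours:minutes:seconds form of NPT. The following is a
minimal sketch; the function name sftime_to_npt and the rounding to
milliseconds are assumptions of this example, not part of either
specification.

```python
def sftime_to_npt(sftime):
    """Format an SFTime value (seconds, as a float) as hh:mm:ss[.fff]."""
    if sftime < 0:
        raise ValueError("NPT cannot be negative")
    # Work in integer milliseconds to avoid floating point drift.
    millis = round(sftime * 1000)
    hours, rem = divmod(millis, 3600 * 1000)
    minutes, rem = divmod(rem, 60 * 1000)
    seconds, ms = divmod(rem, 1000)
    text = "%d:%02d:%02d" % (hours, minutes, seconds)
    return text + (".%03d" % ms if ms else "")
```

For example, an SFTime of 3725.5 seconds is rendered as "1:02:05.500".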
SFTime fields of some nodes may require 'absolute' time values, given by
a "wall clock" time. The relation between the BIFS time base ticks, i.e.
the CTS (composition time stamp) of the BIFS Access Unit that has
conveyed the respective scene description (BIFS) node, and the wall
clock can be resolved if the wall clock time is known to the receiver.
This is achieved by an optional wallClockTimeStamp.
4. Application Signaling and the MPEG-4 media delivery framework DMIF
An MPEG-4 application identifies a particular elementary stream through
its Elementary Stream Id (ESid), scoped by the service session it
belongs to.
When using DMIF (the MPEG-4 Delivery Multimedia Integration Framework),
a one-to-one correspondence between each ESid and a channelHandle (chId)
is established by the DMIF layer. The stream identified by the ESid is
subsequently referred to through its channelHandle.
Dissociating connection management from stream control implies
syntactic extensions to the methods, namely the possibility of
identifying a stream by a syntactical means other than the URL (see
section 5.3). The URL is then used only for connection management.
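The one-to-one correspondence between ESid and channelHandle can be
pictured as a pair of lookup tables kept per service session. The
sketch below is only a model of that bookkeeping; the class name
ChannelTable and its methods are hypothetical and do not correspond to
any DMIF interface.

```python
class ChannelTable:
    """One-to-one mapping between ESids and channel handles (one session)."""

    def __init__(self):
        self._by_es = {}
        self._by_ch = {}

    def bind(self, es_id, ch_id):
        # Refuse any binding that would break the one-to-one property.
        if es_id in self._by_es or ch_id in self._by_ch:
            raise ValueError("ESid or channel handle already bound")
        self._by_es[es_id] = ch_id
        self._by_ch[ch_id] = es_id

    def channel_for(self, es_id):
        return self._by_es[es_id]

    def stream_for(self, ch_id):
        return self._by_ch[ch_id]


channels = ChannelTable()
channels.bind(101, 23)  # hypothetical ESid 101 carried on channel 23
```

Once a stream is bound, stream control messages can name the channel
handle instead of repeating the URL.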
5. Extended RTSP methods syntax and semantics
5.1. NPT extension
The RTSP NPT format, a decimal fraction expressed either in seconds or
in hours, minutes and seconds, can then be used:
    npt-time   = "now" | npt-sec | npt-hhmmss
    npt-sec    = 1*DIGIT [ "." *DIGIT ]
    npt-hhmmss = npt-hh ":" npt-mm ":" npt-ss [ "." *DIGIT ]
    npt-hh     = 1*DIGIT   ; any positive number
    npt-mm     = 1*2DIGIT  ; 0-59
    npt-ss     = 1*2DIGIT  ; 0-59
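The grammar above is small enough to be checked with two regular
expressions. The following sketch parses an npt-time into seconds; the
function name parse_npt_time and the choice of returning None for "now"
are assumptions of this example.

```python
import re

# npt-sec and npt-hhmmss, transcribed from the grammar above.
_NPT_SEC = re.compile(r"[0-9]+(\.[0-9]*)?")
_NPT_HHMMSS = re.compile(r"([0-9]+):([0-9]{1,2}):([0-9]{1,2})(\.[0-9]*)?")


def parse_npt_time(text):
    """Return seconds as a float, or None for the special value "now"."""
    if text == "now":
        return None
    if _NPT_SEC.fullmatch(text):
        return float(text)
    m = _NPT_HHMMSS.fullmatch(text)
    if m:
        hh, mm, ss = int(m.group(1)), int(m.group(2)), int(m.group(3))
        if mm > 59 or ss > 59:
            raise ValueError("minutes and seconds must be in 0-59")
        # group(4) is the optional fraction, including its leading dot.
        frac = float("0" + m.group(4)) if m.group(4) else 0.0
        return hh * 3600 + mm * 60 + ss + frac
    raise ValueError("not a valid npt-time: %r" % text)
```

The range check on minutes and seconds enforces the "0-59" comments,
which the grammar itself does not express.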
However, it must be possible to set the reference point in time to the
value of the startCompositionTimeStamp of the corresponding BIFS scene
description stream instead of 0.0 seconds. The beginning of the
presentation is then at time startCompositionTimeStamp of that stream.
Hence, the NPT syntax can be complemented by an optional field, npt-ref,
carrying the wall clock time base. If this field is not present, the
default value for the reference point in time is 0.0 seconds.
    npt-ref = npt-hh ":" npt-mm ":" npt-ss [ "." *DIGIT ]
    npt-hh  = 1*DIGIT   ; any positive number
    npt-mm  = 1*2DIGIT  ; 0-59
    npt-ss  = 1*2DIGIT  ; 0-59
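One possible reading of the npt-ref field is sketched below: a time
value and the reference point are taken to lie on the same time base,
so the offset from the beginning of the presentation is their
difference. This interpretation, as well as the function names
hhmmss_to_seconds and offset_from_start, is an assumption of the
example; the draft does not fix how npt-ref is combined with a time
value.

```python
def hhmmss_to_seconds(text):
    """Convert an hh:mm:ss[.fraction] string to seconds."""
    hh, mm, ss = text.split(":")
    if not 0 <= int(mm) <= 59 or not 0 <= float(ss) < 60:
        raise ValueError("minutes and seconds must be in 0-59")
    return int(hh) * 3600 + int(mm) * 60 + float(ss)


def offset_from_start(npt, npt_ref=None):
    """Seconds elapsed since the beginning of the presentation.

    ASSUMPTION: npt and npt_ref share one time base, so the offset is
    their difference. Without npt_ref the reference point is 0.0.
    """
    reference = hhmmss_to_seconds(npt_ref) if npt_ref is not None else 0.0
    offset = hhmmss_to_seconds(npt) - reference
    if offset < 0:
        raise ValueError("time lies before the reference point")
    return offset
```

With the default reference, "0:10:30" is 630 seconds into the
presentation; with an npt-ref of "0:10:00" the same value is 30 seconds
into it.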
5.2. Random Access Point (RAP) and Range extensions
Method and RAP or range information could be stored as parameters in the
MFUrl class defined below in a preliminary syntax:
class MFUrl
{
    if (isMethod)
        SFString GlobalMethod=method;
    else
    {
        if (isOD)
        {
            bit(10) ODId;
            MFString ESId=esid ESMethod=method ESR1=ES_RAP_info1 ESR2=ES_RAP_info2;
        }
        else
        {
            SFString urlValue ODId=odid;
            MFString ESId=esid ESMethod=method ESR1=ES_RAP_info1 ESR2=ES_RAP_info2;
        }
    }
}
The GlobalMethod is introduced to allow for dealing with the streams of
all BIFS nodes belonging to a group or to the whole scene.
Note that the MPEG-4 system does not, so far, provide semantic means
for random access point and range information. The above class is a
proposal that the authors are submitting in parallel to MPEG-4.
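To make the three shapes of the proposed class easier to see (a global
method, an OD-based entry, or a URL-based entry), the following sketch
models them as Python data classes. It is an illustration only: the
Python names, the use of strings for the RAP information, and the
validation rules are assumptions, since the class syntax above is
itself preliminary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ESEntry:
    es_id: int                        # ESId
    method: str                       # ESMethod, e.g. "PLAY"
    rap_info1: Optional[str] = None   # ESR1 (type assumed)
    rap_info2: Optional[str] = None   # ESR2 (type assumed)


@dataclass
class MFUrl:
    global_method: Optional[str] = None  # set when isMethod is true
    od_id: Optional[int] = None          # ODId
    url_value: Optional[str] = None      # only in the non-OD branch
    es_entries: List[ESEntry] = field(default_factory=list)

    def validate(self):
        if self.global_method is not None:
            # A GlobalMethod applies to the group or scene as a whole.
            if self.od_id is not None or self.url_value or self.es_entries:
                raise ValueError("GlobalMethod excludes the other fields")
            return
        if self.od_id is None and self.url_value is None:
            raise ValueError("need an ODId or a urlValue")
        if self.od_id is not None and not 0 <= self.od_id < 1024:
            raise ValueError("ODId must fit in 10 bits")
```

A scene-wide pause would then be MFUrl(global_method="PAUSE"), while a
per-stream request carries an ODId or urlValue plus one ESEntry per
stream.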
It is proposed here to complement the relative time and range syntax in
the RTSP methods with possible other range specifiers, including the
case of degenerate ranges that specify just a single Random Access
Point. Except for the degenerate NodeId case, the 'other' ranges are
still under consideration. In any case, the syntax would be:
    other_range     = other_RAP_info#1 - other_RAP_info#2
    other_RAP_info  = NodeId | ...
    range-specifier = npt-range | other_range
5.3. Methods Extended Syntax
In addition to Random Access Point, Range and Time parameters, and in
order to allow connection management to be dissociated from stream
control, an additional syntactical means of identifying a stream, other
than the URL mechanism, must be supported.
This syntactical mechanism could be:
Loop(ch_identifier)
e.g. PLAY Loop(23,56,32)
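A receiver could take such a request apart as sketched below. The
sketch assumes, based only on the example above, that Loop(...) carries
a comma-separated list of decimal channel identifiers and that the
method name consists of capital letters and hyphens; the function name
parse_channel_request is hypothetical.

```python
import re

# METHOD, one space, then Loop(<id>[,<id>...]) with decimal ids (assumed).
_REQUEST = re.compile(r"([A-Z][A-Z-]*) Loop\(([0-9]+(?:,[0-9]+)*)\)")


def parse_channel_request(line):
    """Split 'METHOD Loop(id,id,...)' into (method, [channel ids])."""
    m = _REQUEST.fullmatch(line.strip())
    if m is None:
        raise ValueError("not a Loop(...) request: %r" % line)
    return m.group(1), [int(ch) for ch in m.group(2).split(",")]


method, channel_ids = parse_channel_request("PLAY Loop(23,56,32)")
```

Here method is "PLAY" and channel_ids is [23, 56, 32], so one request
addresses three streams at once.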
5.4. Additional Optional Methods
The two additional optional methods proposed here are of particular
interest in environments where interactivity is triggered by user
navigation in the presented scenes (e.g. Virtual Reality in the VRML /
MPEG-4 spirit). In a scene with several synchronized audio-visual
streams, moving away from one audio-visual stream could suspend the
delivery of that stream, and coming back closer to it could resume the
delivery at a point that is synchronized with all the other streams
that have been maintained in the scene.
5.4.1. R-MUTE (Remote-Mute)
The R-MUTE method is inspired by the MUTE method specified by PREMO
[6]. However, in PREMO, the MUTE command suspends the presentation of
the streams on the terminal but does not suspend the delivery of the
streams.
The R-MUTE method would cause the stream delivery to be suspended
temporarily, while a 'local' progression through the streams continues
on the server side, with synchronization maintained but without
delivery of the streams. The server thus maintains the current reading
points of the ongoing streams and is able to resume the delivery at
the corresponding random access point when triggered by the RESUME
method.
5.4.2. RESUME
The RESUME method restarts the delivery of a stream that has
previously been suspended by the R-MUTE method. The delivery is resumed
at the random access point given by the server state machine, which
also depends on the stream time base and on the time interval between
the R-MUTE and the RESUME commands.
Remark: The above functionality presupposes that the scene description
syntax and semantics provide mechanisms for routing the full semantics
of the action triggered by user navigation to the media content playing
nodes.
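The intended server behaviour for R-MUTE and RESUME can be sketched as
a small state holder per stream. This is a toy model under stated
assumptions: time is advanced explicitly in seconds, random access
points are given as a list of media times, and RESUME is assumed to
restart delivery at the first random access point at or after the
current reading point. The class and method names are hypothetical.

```python
import bisect


class StreamState:
    """Server-side reading point of one stream, in seconds of media time."""

    def __init__(self, rap_times):
        self.rap_times = sorted(rap_times)  # random access points
        self.position = 0.0
        self.muted = False

    def advance(self, dt):
        # The reading point progresses whether or not data is delivered,
        # so the stream stays in step with the others in the scene.
        self.position += dt
        return not self.muted  # True if data would be delivered

    def r_mute(self):
        self.muted = True

    def resume(self):
        # ASSUMPTION: restart at the first random access point at or
        # after the current reading point.
        i = bisect.bisect_left(self.rap_times, self.position)
        if i == len(self.rap_times):
            raise ValueError("no random access point left in the stream")
        self.muted = False
        return self.rap_times[i]
```

With random access points at 0, 2, 4 and 6 seconds, a stream that is
muted at 1.0 s and resumed 2.5 s later has a reading point of 3.5 s and
restarts delivery at the 4 s access point.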
6. Authors' Addresses
Paul Christ
Computer Center - RUS
University of Stuttgart
Allmandring 30
D70550 Stuttgart, Germany.
email: Paul.Christ@rus.uni-stuttgart.de
Christine Guillemot
INRIA
Campus Universitaire de Beaulieu
35042 RENNES Cedex, FRANCE
email: Christine.Guillemot@irisa.fr
Stefan Wesner
Computer Center - RUS
University of Stuttgart
Allmandring 3a
D70550 Stuttgart, Germany.
email: Stefan.Wesner@rus.uni-stuttgart.de
7. References
[1] H. Schulzrinne, A. Rao, R. Lanphier, 'RTSP: Real Time
    Streaming Protocol', RFC 2326, April 1998.
[2] 'Information Technology - Coding of Audiovisual Objects -
    Part 1: Systems', ISO/IEC FCD 14496-1 [DRAFT], May 15, 1998.
[3] 'Information technology - Generic coding of moving pictures
    and associated audio information - Part 6: Delivery
    Multimedia Integration Framework', ISO/IEC 14496-6,
    May 15, 1998.
[4] 'VRML 97: The Virtual Reality Modeling Language',
    International Standard ISO/IEC 14772-1:1997.
[5] 'Synchronized Multimedia Integration Language (SMIL) 1.0
    Specification', W3C Proposed Recommendation, April 9, 1998.
[6] 'Information Processing Systems - Computer Graphics and
    Image Processing - Presentation Environments for
    Multimedia Objects (PREMO), Part 3: Multimedia Systems
    Services', ISO/IEC 14478-3.
[7] ISO/IEC/JTC1/SC29/WG11: w/N2359 subpart 2, 'Verification
    Model of Advanced BIFS (Systems VM subpart 2)', July 1998.
[8] ISO/IEC/JTC1/SC29/WG11: MPEG98/M4102, October 1998;
    containing: draft-balabanian-rtsp-mpeg4-dmif-00.txt,
    Sept. 22, 1998: Balabanian, 'The Role of DMIF with RTSP
    and MPEG-4'.