Internet DRAFT - draft-gentric-mmusic-stream-switching
draft-gentric-mmusic-stream-switching
Internet Engineering Task Force MMUSIC WG
Internet Draft
Philippe Gentric,
Philips Electronics
January 2004
expires July 2004
draft-gentric-mmusic-stream-switching-01.txt
RTSP Stream Switching
STATUS OF THIS MEMO
This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as "work
in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html.
Abstract
Stream switching is a technique used to change the data rate of a
media being streamed, typically for the purpose of adaptation to
the effectively available bandwidth of the network. A backward
compatible and independent RTSP "SWITCH" command is proposed in
order to enable RTSP-based stream switching.
Gentric [page 1]
Internet Draft RTSP Stream Switching January 2004
1. Introduction
Stream switching is a technique used to change the data rate of a
media being streamed, typically for the purpose of adaptation to
the effectively available bandwidth of the network.
The aim is that a real time streaming system can switch from
stream to stream in order to vary the data rate. This requires
that the same content is encoded as multiple streams at various
bit rates.
This memo specifies an independent and backward compatible RTSP
extension enabling RTSP servers and clients to support stream
switching.
Sections 2, 3 and 4 provide a detailed analysis of the problem,
section 5 is devoted to the proposed solution, section 6 lists
open issues, and section 7 covers security considerations.
1.1 Typical usage context
The typical scenario is video distributed on demand, also known
as "Video On Demand" (VOD). The situation is depicted in figure 1.
This is the domain of RTSP [RTSP] servers. HTTP is typically used
for the service/application i.e. provides the entry point,
usually an RTSP URL. The media can be pre-recorded on file or can
be a "live" source in which case the RTSP/RTP server acts as a
relay.
         *****************                      *****************
         *               *        HTTP          *               *
         *  HTTP Server  * <------------------> *  HTTP Client  *
         *               *                      *               *
         *****************                      *****************

         *****************                      *****************
         *               *        RTSP          *               *
         *  RTSP Server  * <------------------> *  RTSP Client  *
         *               *                      *               *
         *****************                      *****************

         *****************                      *****************
         *               *    RTP on UDP UC     *               *
         *  RTP Sender   * -------------------> * RTP Receiver  *
         *               *                      *               *
media    *               *      RTCP SR         *               *
on   --> *               * -------------------> *               *
file     *               *                      *               *
or       *               *    RTCP feedback     *               *
live     *               * <------------------- *               *
         *****************                      *****************
Figure 1: video on demand
1.2 Rate control issues
The typical usage of stream switching is for adaptation to the
effectively available end-to-end bandwidth i.e. rate (or
congestion) control.
Specifically the streaming system (i.e. sender and receiver)
changes the end-to-end data rate upon detecting variations in the
effective bandwidth.
This document does not address rate control algorithms i.e. the
way to compute a target bit rate based on some measurement of the
network. Actually rate control algorithms are an orthogonal
aspect of the problem addressed here, which is the signaling
required inside a streaming system in order to perform switching.
It is assumed that specifications such as TFRC [TFRC] or work in
that area (see [Widmer], [Vojnovic], [Bansal]) should be used to
deal with this issue; there is a need to adapt the algorithms due
to the limited granularity of available rates when using stream
switching; we have experimental evidence that these issues are
manageable.
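As an illustration of the limited granularity mentioned above, the
following minimal sketch maps a target bit rate (as computed by some
rate control algorithm, which is out of scope here) onto the discrete
rates of a switch-set. The function name and the fall-back to the
lowest rate are assumptions made for this example only; the example
rates are the video rates used in section 1.7.

```python
from bisect import bisect_right


def pick_stream_rate(available_kbps, target_kbps):
    """Return the highest encoded rate that does not exceed target_kbps.

    Assumption of this sketch: when even the lowest rate exceeds the
    target, the lowest rate is returned rather than stopping the stream.
    """
    rates = sorted(available_kbps)
    if not rates:
        raise ValueError("empty switch-set")
    index = bisect_right(rates, target_kbps)
    return rates[index - 1] if index > 0 else rates[0]


# Example video rates from section 1.7, in kb/s.
VIDEO_RATES_KBPS = [50, 150, 400]
chosen = pick_stream_rate(VIDEO_RATES_KBPS, 200)
```

With a target of 200 kb/s the sketch selects the 150 kb/s stream: the
target computed by the algorithm is rounded down to the nearest
available stream.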
1.3 Content negotiation issues
Stream switching requires specific content negotiation taking
into account the possibility to change the configuration during
the session.
Stream switching is actually bridging the gap between
"traditional" rate control (i.e. in TCP quasi-continuous changes
in the data rate) and "traditional" content negotiation where
sessions are negotiated for a constant data rate.
1.4 Seamless switching
Seamless stream switching is obtained when the switch is
performed in such a fashion that media playback is minimally
disturbed.
A counter-example is when traditional content negotiation is
used: then after a given data rate is negotiated the session is
started and when it is obvious that the chosen data rate is not
acceptable (usually because it is too high) a new session is re-
negotiated. The switch is not seamless because it takes time to
tear down the session and re-initialize another one.
Therefore seamless stream switching consists of preparing a set
of sessions (or a set of configurations within a session) and
providing a fast signaling mechanism so that the switch is
effectively instantaneous.
A side effect of seamless switching (which is a "user"
requirement) is to minimally disturb the network: it is a
well-known congestion control principle that when congestion
occurs, the faster the response, the smaller the perturbation.
1.5 Motivation for standardization
Media delivery technologies are based on the availability of
extremely optimized constant bit rate encoders. On the other hand
IP networks that are being deployed for consumer access do not
have stable end-to-end bandwidth. These two facts cause major
problems for operators wishing to deploy "best-effort" streaming
services in a robust fashion, the current status-quo being that:
. On one hand it is assumed that once a program is requested by a
client it will cause the server to send data toward the client at
the "nominal" constant rate, regardless of the state of the
network on the path from the server to the client.
. On the other hand there is a common assumption (see for example
[3GPP-BWS]) that stream switching is the way to improve this
situation; for that reason it is implemented and deployed in non
inter-operable ways by many vendors.
It is desirable to change this status-quo for several reasons.
Firstly the "constant rate method" causes problems whenever a hop
in the path from sender to receiver experiences congestion. These
problems are routing buffer overflows (and more specifically for
wireless networks: base station buffer overflows) and perceptible
artifacts during playback, creating 3 populations of unhappy
people: network managers, end users watching the video and end
users for other traffic traveling on the same path!
Secondly the control algorithms for proprietary stream switching
type of traffic are not publicly specified regarding congestion
control and therefore the deployment on large scales (i.e.
comparable to TV broadcast scales) of streaming services is not
proven to be safe in terms of network pathologies.
Thirdly the various parties involved in streaming media
commercial deployments i.e. content providers, network operators
and various technology providers -being aware of these issues-
are stalling, thereby compromising the immediate future of media
distribution and as a direct consequence of various new consumer
high data rate network deployments in both the wired and wireless
domains.
Note that wireless network technologies are especially sensitive
to these issues because of the inherent variability of the radio
bandwidth, which has triggered attention and efforts in 3GPP (see
for example [3GPP-alt-attr] and [3GPP-BWS]) on these issues.
In conclusion there is a pressing demand for inter-operable
solutions and also a demand for solutions that -being standard-
have a better defined and/or more transparent behavior.
In short the goals of a standard framework for stream switching
would be:
. To enable the emergence of more advanced inter-operable
streaming products.
. To enable the advent of technical and/or commercial
specifications for streaming products and services that have a
well characterized behavior regarding bandwidth management.
1.6 Requirements
The key requirement is that the user experience should be the
best possible which means that switching must be seamless. This
requirement implies very specific timing constraints on the way
the switch is performed, typically the sender should stop sending
one stream and start sending the other stream at exactly the same
media time and same real time, otherwise problems will occur in
terms of buffer at the receiver side, since on the client side
the buffers must maintain a stable amount of media time (a media
decoder is paced in terms of media time i.e. 1 second of media is
decoded in 1 second of elapsed time). Specifically in stream
switching the challenge is to avoid buffer underflows where the
decoder pauses playback and displays the infamous "re-buffering"
message.
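The buffer constraint can be made concrete with a small worked
calculation. The sketch below assumes a deliberately simplified model
(not part of this memo): the client holds a given amount of media time,
the stream is encoded at a constant rate, and the network delivers at a
lower constant rate. Each second of real time then delivers only
delivered/encoded seconds of media while the decoder consumes one.

```python
def seconds_until_underflow(buffered_media_s, stream_kbps, delivered_kbps):
    """Estimate the real time left before the decoder buffer runs empty.

    Simplified model: constant encoded rate, constant delivered rate,
    decoder consuming exactly 1 second of media per second.
    """
    if delivered_kbps >= stream_kbps:
        # The buffer does not drain in this model.
        return float("inf")
    # Net media time lost from the buffer per second of real time.
    drain_per_second = 1.0 - delivered_kbps / stream_kbps
    return buffered_media_s / drain_per_second


# 3 seconds buffered, 400 kb/s stream, only 200 kb/s getting through.
remaining = seconds_until_underflow(3.0, 400, 200)
```

In this example the buffer drains at half a second of media per second,
so the switch to a lower-rate stream has to take effect within about 6
seconds to avoid re-buffering.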
The simple consequence is that the streaming data source must be
informed as soon as possible that it needs to change its output
rate, otherwise it will keep on sending at the same excessive
rate, which will result in:
. Filling up more buffers in the network devices upstream from
the limiting hop, thereby amplifying the congestion.
. If one or more network devices reach the point of
saturation this will cause losses not only in the media stream in
question but also in other traffic (constant data rate UDP
traffic is known to be "aggressive" in this context since TCP
traffic will automatically fall back).
. It will delay the instant when the new stream would reach the
decoder, thereby increasing the chances that the decoder actually
runs its buffer down to underflow.
The ability to increase the rate -when the available
bandwidth increases and with due care to congestion control- is
also a requirement; it has much less stringent technical
implications. Actually having a really seamless switch is then
possible in all cases.
1.7 Vocabulary
We define a "program" as a set of "tracks", for example a movie
is composed of an audio and a video track. We define a "stream"
as an encoded instance of a track, for example the video track of
a movie may be encoded at 50kb/s, 150 kb/s and 400 kb/s using
respectively H263 baseline SQCIF 7.5 fps, MPEG-4 SP@L3 QCIF 15
fps and MPEG-4 ASP@L3 CIF 30 fps, the audio track may be encoded
at 5 kb/s, 20 kb/s, 48 kb/s and 80 kb/s using respectively AMR,
AMR WB, AAC mono and AAC stereo.
We define one "flavor" of a program as a given set of streams (a
pair for a movie, usually consisting of audio and video), for
example 400 kb/s video and 80 kb/s AAC is the high quality flavor
in the example above for which we have 12 different flavors (but
some flavors may not always make sense).
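The count of 12 flavors follows from combining the 3 video streams with
the 4 audio streams of the example. A minimal sketch enumerating them is
given below; the dictionary layout and the function name are choices
made for this illustration only.

```python
from itertools import product

# Streams of the example above: description -> bit rate in kb/s.
VIDEO_KBPS = {
    "H263 baseline SQCIF 7.5 fps": 50,
    "MPEG-4 SP@L3 QCIF 15 fps": 150,
    "MPEG-4 ASP@L3 CIF 30 fps": 400,
}
AUDIO_KBPS = {"AMR": 5, "AMR WB": 20, "AAC mono": 48, "AAC stereo": 80}


def flavors():
    """Return every (video, audio) pair with its total bit rate in kb/s."""
    return [
        ((video, audio), VIDEO_KBPS[video] + AUDIO_KBPS[audio])
        for video, audio in product(VIDEO_KBPS, AUDIO_KBPS)
    ]


all_flavors = flavors()
highest = max(all_flavors, key=lambda item: item[1])
```

The high quality flavor of the example (400 kb/s video with 80 kb/s AAC
stereo) totals 480 kb/s, and the lowest one totals 55 kb/s.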
We define a "switch-set" as the set of all the streams for a
track or a program. A switch-set can be organized either as
ordered first by track or first by flavor. Obviously switch-sets
are prepared during the content production or deployment phase.
2. Seamless stream switching technical issues
2.1 Configuration issues
When streams are switched there are 2 fundamental cases: either
the streaming configuration changes or it does not change.
2.1.1 No configuration change
The case when the streaming configuration does not change is the
simplest.
This case can also be described as "nothing changes but the
bit rate". Many codecs actually support this "natively"; for
example video codecs and recent speech codecs (AMR, EVRC) as well
as recent music codecs (AAC in some modes) have the property that
they can decode instantly-variable-bit-rate streams.
One important thing to note is that in RTP terms the Payload Type
remains the same. Specifically the RTP session remains the same.
For these reasons this mode is also called "client-transparent"
since (in theory) the source can switch without forewarning the
client.
Care must be taken however that some player implementations may
actually be sensitive to sudden bit rate changes, or may prefer
to be warned/notified about them.
2.1.2 Configuration change
The second case, when the streaming configuration changes or even
when the codec itself changes is more complex because:
. In the general case one has to assume that the client has to
instantiate (or invoke) 2 (or more) completely different
(hardware and/or software) codecs, rendering systems and network
reception stacks (or at least different payload processors).
Obviously this may involve substantial processing and/or
buffering resources. These are implementation details out of the
scope of this memo, however an important rule derives from this:
servers MUST NOT switch streams involving a codec configuration
change but upon reception of an explicit request from the
receiver or with an explicit prior agreement. Also authentication
should be used for these requests (see the security consideration
section).
. In the general case feeding a codec with a stream for a
different codec, or a different configuration can crash a
decoder. Therefore there must be an error-proof way to signal the
change at the packet or encoded frame granularity. Fortunately
RTP does have such a capability with the Payload Type field.
. In the general case changing codec (for example from AMR to
AAC) also involves changing the RTP payload format. Fortunately
this is also covered by the RTP Payload Type field.
The important thing to note is that in SDP and RTP terms the
Payload Type has to be different for these types of
configurations.
This is called "client non-transparent" stream switching.
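To illustrate how the Payload Type field protects a decoder from being
fed the wrong stream, the sketch below extracts the Payload Type from
the fixed RTP header (the low 7 bits of the second octet) and routes the
packet to the matching payload processor. The routing table, the
function names and the dynamic Payload Type values used in the usage
lines are hypothetical.

```python
def rtp_payload_type(packet: bytes) -> int:
    """Return the 7-bit Payload Type of an RTP packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed 12-octet RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    # Second octet: marker bit (high bit) followed by the Payload Type.
    return packet[1] & 0x7F


def route_packet(packet: bytes, processors: dict) -> bool:
    """Hand the packet to the processor registered for its Payload Type.

    Returns False, without touching any decoder, when no processor is
    registered for that Payload Type.
    """
    handler = processors.get(rtp_payload_type(packet))
    if handler is None:
        return False
    handler(packet)
    return True


# Hypothetical mapping: 96 for one codec, 97 for another.
first_codec_queue, second_codec_queue = [], []
processors = {96: first_codec_queue.append, 97: second_codec_queue.append}
packet = bytes([0x80, 0x80 | 97]) + bytes(10)
routed = route_packet(packet, processors)
```

Because the routing decision is made per packet, a switch is detected
at packet granularity without any additional signaling.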
2.1.3 Mixed transparency configurations
It is typical that a given switch-set mixes both "client
non-transparent" and "client-transparent" modes.
One could wish that the client-transparent mode would be
"enough"... however "client-transparent" switches usually do not
cover as wide a bandwidth range as "client non-transparent" ones
due to the bit rate range of each specific codec.
For example a service deployed for CD-quality music using stereo
AAC cannot go below 32 kb/s in the client-transparent mode
because AAC does not go below this bit rate. On the other hand a
client non-transparent switch involving a speech codec (say AMR)
makes it possible to define "fall back" streams with as little as 4 kb/s.
2.2 Codec access points
For all streams it is possible to "switch out" at any point,
however some streams (video is typical) cannot be "switched in"
at any point; typically these codecs have several types of frame
regarding random access. Some are full random access points
(typically I frames, or S frames for recent codecs such as H264)
others have other types of partial random access points frames
such as frames mixing I macro-blocks and P macro-blocks etc.
From the point of view of the decoder these can be seen as
implementation details i.e. if a server switches at a non random
access point the client should be able to detect it and act
according to its capability to handle it. Indeed from the
decoder point of view a stream switch on a non-random-access-
point is similar to receiving packets after a loss.
It could be useful however that a client could indicate to the
server that it prefers a switch at a random access point.
2.3 The control issue
The first key question is to understand if the decision to switch
is taken by the receiver or by the sender.
2.3.1 Server initiated switch
Server initiated switch has 4 major advantages:
. It can be made to work in a similar fashion for all scenarios.
. It more closely resembles the TCP situation, improving the chances that
some of the considerable know-how acquired with TCP in terms of
congestion/rate control can be reused.
. It makes more sense to have other sources of information about
the status of the network than the receiver(s). For example
routers on the path may be able to issue congestion notifications
much earlier than if one must wait for the perturbation to reach
the final destination and feedback signals to travel back (see
also [TRIGTRAN]).
. It allows one very simple "catastrophe prevention" mechanism:
Since the sender does not need to warn the receiver before
switching the sender can decide to switch down when feedback from
the receiver has not been received for a given amount of time
(TFRC uses in the order of 4 RTTs).
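The "catastrophe prevention" mechanism of the last item can be sketched
as follows. The threshold of 4 round-trip times is taken from the text
above; the function names, the time units and the choice of stepping
down by one rate at a time are assumptions of this example.

```python
def feedback_timed_out(now_s, last_feedback_s, rtt_s, rtt_multiple=4):
    """True when no receiver feedback arrived for rtt_multiple RTTs."""
    return (now_s - last_feedback_s) > rtt_multiple * rtt_s


def next_lower_rate(current_kbps, available_kbps):
    """Return the next rate below the current one, or the current rate
    when the stream is already at the bottom of the switch-set."""
    lower = [rate for rate in available_kbps if rate < current_kbps]
    return max(lower) if lower else current_kbps


# Last feedback 1 second ago with a 200 ms RTT: more than 4 RTTs.
rate = 400
if feedback_timed_out(now_s=10.0, last_feedback_s=9.0, rtt_s=0.2):
    rate = next_lower_rate(rate, [50, 150, 400])
```

Note that this only applies to switches the server is allowed to perform
without warning the receiver, as discussed next.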
As has been discussed above the server can decide to switch
without telling the receiver only in 2 cases:
. if the decoder configuration does not change. In the context of
SDP/RTP this means that the Payload Type must not change. For
this type of configuration the existing set of IETF
specifications is usable in terms of session description and
management, specifically "normal" RTCP can be used to send
feedback and it can be seen as a server implementation issue that
the server decides to switch based on client RTCP feedback. There
could be a need to document this, maybe not as a standard
specification but surely as "practice" in inter-operability
forums.
. if there was a prior agreement. In the context of SDP/RTP this
means that the client has "instantiated" several "stacks" (one
for each flavor of each stream) and is ready to receive data on
each of these channels (by channel one means that either or both
the destination UDP port and the Payload Type differ). This means
that possibly substantial resources must be pre-allocated on the
receiver side. This is wasteful in case the network behaves so
that the session runs entirely with the initial streams or uses
only a fraction of these resources. Obviously a little signaling
could help here... Note also that although these streams have
different Payload Types this signaling may not be early enough...
2.3.2 Player initiated switch
The player can also initiate the switch and using RTSP is the
obvious choice.
We will see however that the existing RTSP specification needs to
be extended in order to provide seamless stream switching.
3. Description of the switch-set
Clearly there is a need to convey a description of the switch-set
to the client. There are several ways to do so, which are
described below.
3.1 SDP Description of the switch-set
One way to describe the alternative flavors for each stream
composing a program is to list them using SDP, an example of one
such description is given in the Appendix.
For client initiated switches there is a need to convey the
bandwidth of a stream, but this is already available.
Otherwise the exact SDP syntax to use in order to describe that
streams are alternatives of a given track (media) is debatable;
SDP has several extensions that can be considered [grouping],
also new extensions are a possibility, 3GPP has specified one
such SDP syntax for its Release 6 (see [3GPP-alt-attr]).
3.2 SMIL Description of the switch-set
SMIL is a scene description language [SMIL]. In SMIL the "switch"
element allows an author to specify a set of alternative elements
from which only the first acceptable element is chosen. Actually
the SMIL specification specifies that the bit rate is one typical
thing that would change among streams in a switch element.
In short the SMIL element "switch" provides a standard way to
declare to the client all the possible "flavors" of each stream.
However SMIL 2.0 supports only parse-time evaluation i.e. it
basically assumes that the evaluation of which stream to use is
done once. Furthermore even when dynamic re-evaluation will be
specified in future versions, SMIL will typically not specify how
the switching should be performed.
In conclusion the SMIL switch syntax element is a building block
that could very nicely complement an IETF specification of how to
perform stream switching at the transport (and transport control)
level.
3.3 MPEG-4 system Description of the switch-set
MPEG-4 [MPEG-4] provides a way to describe alternative streams.
However since this type of manipulation would be performed from
the context of a terminal implementing the MPEG-4 system
specification it is a priori out of the scope of this memo.
4. Switching control
4.1 Switching by changing the RTSP session
One way to perform stream switching is to use RTSP TEARDOWN in
order to destroy the session and then restart another one.
Unfortunately this method involves several round trips which will
typically cause playback to stop, in short it is practically
impossible to make it seamless. For that reason this method -
although "it works"- will not be discussed further.
4.2 Switching within the same RTSP session
One way to perform switching at the session level is to enable
the definition of a "switchable session" i.e. an extended session
that is negotiated as containing all alternative streams from the
very start.
Using RTSP has the following advantages:
. The method is completely independent of the codec capabilities.
. It directly provides both content and capability negotiation as
well as control.
. It inherits all RTSP (and therefore HTTP) security features.
4.3 Switching using RTSP PLAY/PAUSE
The usage of PLAY/PAUSE command for stream switching would be as
follows:
At the time of session negotiation the client and server prepare
to stream all the variants in the switch-set but PAUSE all
streams except one per media type. Switching is performed by
issuing simultaneously a PAUSE command on the stream being
switched out and a PLAY command on the stream being switched in.
Unfortunately doing that involves a trick where the client must
specify the pause point (see the RTSP PAUSE specification for
detail [RTSP]). But then finding out the appropriate time to use
as "pause point" is not a trivial issue at all. For this reason
this method cannot be used either.
4.4 Switching using RTSP MUTE/UNMUTE
An extension to RTSP called MUTE/UNMUTE has been proposed
[RTSP-MUTE]. It defines MUTE and UNMUTE as 2 additional optional RTSP
commands. MUTE enables a client to request the server to stop
sending data for a given stream and in this respect is similar to
PAUSE. However UNMUTE requests the server to resume sending data,
not at the point in media where MUTE was issued, but at a point
of time synchronous with the media streams that were being still
streamed.
The usage of this command for stream switching would be as
follows: at the time of session negotiation the client and server
prepare to stream all the variants in the switch-set but MUTE all
streams except one per media type. Switching is performed by
issuing simultaneously a MUTE command on the stream being
switched out and an UNMUTE command on the stream being switched
in.
The drawback is that for each "atomic" switch two commands have
to be issued.
Also this does not cover the need for additional signaling as
detailed above.
4.5 Switching using RTSP SET_PARAMETER
SET_PARAMETER and even OPTIONS have been suggested as candidates for
client-initiated stream switching (see [3GPP-BWS]).
A possible syntax would be:
C->S: SET_PARAMETER rtsp://foo/twister/audio1 RTSP/1.0
CSeq: 421
Content-length: xx
Content-type: application/stream-switching
Replace-with: rtsp://foo/twister/audio2
S->C: RTSP/1.0 200 OK
CSeq: 421
The motivation is that SET_PARAMETER has been designed to provide
some type of extensibility to RTSP, the drawback however is that
it is not an explicit command.
Also this does not cover the need for additional signaling as
detailed above.
5. Proposed specification
The proposal is to introduce new RTSP Methods specifically for
stream switching.
As indicated in [RTSP section 1.5] the advantage of a new Method
by comparison with extending an existing method is that a
component that does not know the new method will reply with "501
not implemented" which makes backward compatibility issues easy
to solve. Furthermore there is a need for additional Header-
fields as described below that are best introduced for new
Methods.
Also it is desirable that this specification should be as
independent as possible of the RTSP specification and of its
evolutions (with a required side effect of having backward
compatibility with [RTSP]).
For that reason this memo defines stream switching primitives
that are orthogonal to the rest of RTSP in terms of state machine
and signaling. This specification does not modify the syntax or
semantics of any RTSP Method or Header and the stream switching
state machine is defined as being "inside" each state of the RTSP
state machines in both the client and server. For example streams
can be switched during a PAUSE as well as during a PLAY, etc.
(Note that there is an exception to that principle for
SWITCHCLOSE issued on a playing stream, see below)
All the stream switching methods are OPTIONAL but it is
RECOMMENDED to implement all of them. In particular, implementers
should note the usefulness of SWITCHCLOSE.
5.1 SWITCHSETUP
5.1.1 SWITCHSETUP rationale
Introducing SWITCHSETUP is better than re-using SETUP in the
respect that it is explicitly for stream switching purposes.
It is also highly desirable that a stream-switching enabled
player can connect to an "old" RTSP server (that does not
implement stream switching). Therefore it is desirable that the
behavior of existing servers is fully defined. For that reason
SWITCHSETUP is useful in the respect that an "old" server will
refuse it, clearly indicating to the client that it does not
support stream switching. In this case SDP files describing
switch-sets can also be used with "old" servers.
5.1.2 SWITCHSETUP specification
The SWITCHSETUP Method is similar to SETUP except that it
explicitly tells the server that the corresponding stream is
part of a switch-set.
For maximum backward compatibility a client MUST use SETUP for
the primary streams and SWITCHSETUP for the alternative streams.
This way a server that does not support stream switching will
reply "501" to SWITCHSETUP but will SETUP the primary streams (a
possible alternative -if SETUP was used for all streams- being a
server allocating a lot of resources for a function that it
cannot perform!).
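The backward compatibility rule above can be sketched from the client
side as follows. The "send" callable is a hypothetical transport hook
that issues one RTSP request and returns the numeric status code; it is
not an API defined by this memo or by any particular library.

```python
def setup_streams(send, primary_urls, alternative_urls):
    """SETUP the primary streams, then try SWITCHSETUP on alternatives.

    Returns the alternative streams accepted by the server. An empty
    list means stream switching is not available for this session.
    """
    for url in primary_urls:
        status = send("SETUP", url)
        if status != 200:
            raise RuntimeError(f"SETUP failed for {url}: {status}")

    switchable = []
    for url in alternative_urls:
        status = send("SWITCHSETUP", url)
        if status == 501:
            # "Old" server: the primary streams are still usable, but
            # no further SWITCHSETUP should be attempted.
            return []
        if status == 200:
            switchable.append(url)
    return switchable


# Stand-ins for an "old" server and a switching-capable server.
def old_server(method, url):
    return 200 if method == "SETUP" else 501


def new_server(method, url):
    return 200


primary = ["rtsp://foo/twister/audio1"]
alternatives = ["rtsp://foo/twister/audio2"]
```

With the "old" server stand-in the client ends up with a plain,
non-switchable session; with the other one it obtains the list of
alternative streams it may later name in SWITCH.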
SWITCHSETUP may be issued at any time during an RTSP session.
SWITCHSETUP issued on a playing stream is similar to SETUP.
5.1.3 SWITCHSETUP "Switch-control" header field
The SWITCHSETUP Method has an OPTIONAL header field: "Switch-
control"
The Switch-control request-header field can be used to specify to
the server how the client supports stream switching control.
The values below are mutually exclusive.
"Switch-control=client-initiated-only": Tells the server that it
MUST NOT switch on its own but only upon reception of a client-
to-server SWITCH command. This is relevant for any type of
switch, including client-transparent switches.
"Switch-control=non-transparent-client-initiated-only": Tells the
server that it MUST NOT switch on its own but only upon reception
of a client-to-server SWITCH command for non-client-transparent
switches. Specifically the server MAY switch on its own for
client-transparent switches. This is the default i.e. a server
MUST assume this value for absent or malformed Switch-control
header fields.
"Switch-control=server-initiated-ok": Tells the server that it
MAY switch on its own without warning the client first for all
types of switches (i.e. the client has allocated all the
necessary resources).
"Switch-control=forewarning: 2000": Tells the server that it MAY
switch on its own but that then it MUST warn the client by using
a SWITCHSIGNAL (see below) and that this forewarning MUST be sent
at least 2000 milliseconds before the server performs the switch.
This is relevant for any type of switch, including client-
transparent switches.
"Switch-control=non-transparent-forewarning: 2000": Tells the
server that it MAY switch on its own but that for non-transparent
switches it MUST warn the client by using a SWITCHSIGNAL (see
below) and that this forewarning MUST be sent at least 2000
milliseconds before the server performs the switch.
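The five values above can be summarized as a server-side decision
function. The sketch below is one possible reading of the rules, not a
normative algorithm: it takes the text following "Switch-control=" as
input (the exact on-the-wire header syntax is an assumption), applies
the default for absent or malformed values, and answers whether a
server-initiated switch is forbidden, allowed, or allowed only after a
SWITCHSIGNAL forewarning.

```python
DEFAULT_MODE = "non-transparent-client-initiated-only"
SIMPLE_MODES = {"client-initiated-only", DEFAULT_MODE, "server-initiated-ok"}
TIMED_MODES = {"forewarning", "non-transparent-forewarning"}


def parse_switch_control(value):
    """Return (mode, forewarning_ms), using the default when the value
    is absent or malformed."""
    if value is None:
        return DEFAULT_MODE, None
    name, _, arg = value.partition(":")
    name, arg = name.strip(), arg.strip()
    if name in SIMPLE_MODES and not arg:
        return name, None
    if name in TIMED_MODES and arg.isdigit():
        return name, int(arg)
    return DEFAULT_MODE, None


def server_initiated_switch(value, client_transparent):
    """Return (decision, forewarning_ms) for a switch the server wants
    to perform on its own. decision is 'forbidden', 'allowed' or
    'forewarn' (SWITCHSIGNAL needed that many milliseconds earlier)."""
    mode, delay_ms = parse_switch_control(value)
    if mode == "client-initiated-only":
        return "forbidden", None
    if mode == DEFAULT_MODE:
        return ("allowed", None) if client_transparent else ("forbidden", None)
    if mode == "server-initiated-ok":
        return "allowed", None
    if mode == "forewarning":
        return "forewarn", delay_ms
    # non-transparent-forewarning: only non-transparent switches
    # need the forewarning.
    return ("allowed", None) if client_transparent else ("forewarn", delay_ms)
```

For example, server_initiated_switch("forewarning: 2000", True) yields
("forewarn", 2000), while an absent header with a non-transparent switch
yields ("forbidden", None).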
5.1.4 SWITCHSETUP "RAP" header field
The SWITCHSETUP Method has an OPTIONAL header field: "RAP"
The RAP request-header field can be used to specify to the server
how the client supports stream switching regarding Random Access
Points.
The values below are mutually exclusive.
"RAP=RAP-only": Tells the server that it MUST switch only on
Random Access Point (in the "new" stream). For SWITCH requests
corresponding to drastic (more than 50%) rate reduction i.e. in
case rapid action against congestion is preferable to smoother
playback, servers MUST then interrupt the on-going stream
immediately and restart streaming at the next available RAP in
the new stream (which effectively creates a gap in the stream).
"RAP=indifferent": Tells the server that it MAY switch at any
point (in the new stream). This is the default i.e. a server
SHOULD assume this value for absent or malformed RAP header
fields.
"RAP=if-before:300": Tells the server that it SHOULD wait to
switch on a Random Access Point (in the new stream) unless such a
point is not available in less than 300 milliseconds of Normal
Play Time, in which case the server MAY switch at any point.
Servers MUST ignore this recommendation for SWITCH requests
corresponding to drastic (more than 50%) rate reduction i.e. in
case rapid action against congestion is preferable to smoother
playback.
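The interaction between the RAP values and the "drastic rate reduction"
rule can be sketched as a function planning when to stop the old stream
and when to start the new one. This is an interpretation made for
illustration: times are in milliseconds of Normal Play Time, the input
is the text following "RAP=", "drastic" is read as a new rate below half
of the old one, and for cases where the server MAY switch at any point
the sketch simply switches immediately.

```python
def plan_switch(rap, now_ms, next_rap_ms, old_kbps, new_kbps):
    """Return (stop_old_ms, start_new_ms) for a requested switch.

    next_rap_ms is the next Random Access Point in the new stream.
    Unknown or malformed values are treated as "indifferent".
    """
    drastic = new_kbps < 0.5 * old_kbps
    name, _, arg = rap.partition(":")

    if name == "RAP-only":
        # Drastic reduction: stop at once and accept a gap until the
        # next RAP of the new stream.
        stop_old = now_ms if drastic else next_rap_ms
        return stop_old, next_rap_ms

    if name == "if-before" and arg.isdigit() and not drastic:
        if next_rap_ms - now_ms < int(arg):
            return next_rap_ms, next_rap_ms

    # "indifferent", a RAP that is too far away, or a drastic reduction.
    return now_ms, now_ms


# Switching from 400 kb/s down to 150 kb/s with RAP-only.
stop_old, start_new = plan_switch("RAP-only", 1000, 1800, 400, 150)
```

In the usage line the reduction is drastic, so the old stream stops at
1000 ms and the new one starts at its next RAP at 1800 ms, which is the
gap described for "RAP=RAP-only".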
5.2 SWITCH
The "SWITCH" Method is an OPTIONAL atomic command from the client
to the server requesting the server to switch from one stream to
another.
The stream to switch off is indicated as a parameter of the
Method. The stream to switch on is indicated with the Header
Field "Replace-with" as shown in the example below:
C->S: SWITCH rtsp://foo/twister/audio1 RTSP/1.0
CSeq: 421
Replace-with: rtsp://foo/twister/audio2
S->C: RTSP/1.0 200 OK
CSeq: 421
Range: smpte=0:10:22-;time=19970123T153600Z
RTP-Info: url=rtsp://foo/twister/audio2;
seq=12312232;rtptime=78712811
See the Appendix for a fully detailed example.
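As a small illustration of the request format shown above, the sketch
below serializes a SWITCH request. The function name is hypothetical;
only the headers appearing in the example are emitted, and a real
client would normally add further headers (such as the session
identifier) as required by [RTSP].

```python
def build_switch_request(cseq, switch_off_url, replace_with_url=None):
    """Serialize a SWITCH request as text.

    When replace_with_url is omitted the Replace-with header is left
    out, i.e. the stream is stopped with no replacement.
    """
    lines = [
        f"SWITCH {switch_off_url} RTSP/1.0",
        f"CSeq: {cseq}",
    ]
    if replace_with_url:
        lines.append(f"Replace-with: {replace_with_url}")
    # RTSP messages use CRLF line endings and an empty line after the
    # last header.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_switch_request(
    421, "rtsp://foo/twister/audio1", "rtsp://foo/twister/audio2"
)
```

The resulting text corresponds to the client request of the example
above.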
The "Replace-with" Header Field may be absent or empty, signaling
that the target stream should be stopped with no replacement. A
symmetric SWITCH with an empty target can be used to restore the
corresponding track. This is useful in order to temporarily
suppress the video and reach a very low bit rate, for example
with a news service on a mobile device; in that case SWITCH is
equivalent to the MUTE command of [RTSP-MUTE].
SWITCH requests MAY be issued at any time during an RTSP session
(including before the acknowledgement of a previous request is
received). When receiving several SWITCH requests a server SHOULD
ignore/abandon the oldest ones. In all cases a server MUST
execute as fast as possible requests producing a smaller data
rate (the smallest if several requests are pending). A server MAY
delay or deny the execution of requests corresponding to higher
data rates, for example if it has reached its maximum capacity. A
server SHOULD NOT deny SWITCH requests for smaller rates.
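The handling of several pending SWITCH requests can be sketched as
below. This is one possible reading of the rules above, with requests
reduced to their target bit rates: any pending request for a rate lower
than the current one wins (the smallest if there are several);
otherwise only the most recent request is kept and older ones are
abandoned. Whether a request for a higher rate is then delayed or
denied is left to the server and is not modeled here.

```python
def select_switch(current_kbps, pending_kbps):
    """Pick the target rate to execute among pending SWITCH requests.

    pending_kbps lists the requested target rates in arrival order
    (oldest first). Returns None when nothing is pending.
    """
    if not pending_kbps:
        return None
    lower = [rate for rate in pending_kbps if rate < current_kbps]
    if lower:
        # Rate reductions take priority; execute the smallest one.
        return min(lower)
    # Only requests for equal or higher rates: keep the newest.
    return pending_kbps[-1]


target = select_switch(150, [400, 50, 20])
```

In the usage line the client first asked for 400 kb/s and then, as
conditions degraded, for 50 kb/s and 20 kb/s; the sketch selects
20 kb/s.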
The server response to a SWITCH from a player SHOULD contain the
same information as the answer to PLAY. Note for example that the
use of RTP-Info as in the above example allows instantaneous
lip-sync (the alternative being that the player must wait for the
RTCP Sender Report) and also may help the receiver to identify
the exact packet corresponding to the new stream (especially in
client-transparent cases), which in turn is useful for resetting
traffic monitoring computations, etc.
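A receiver could extract the sequence number and RTP timestamp
from RTP-Info roughly as follows. This is a minimal sketch that
assumes a single entry whose URL contains no ";" character;
parse_rtp_info is a hypothetical name.

```python
def parse_rtp_info(value):
    """Parse one RTP-Info entry into a dict (url, seq, rtptime)."""
    info = {}
    for part in value.split(";"):
        part = part.strip()  # also removes folding whitespace
        if not part:
            continue
        key, _, val = part.partition("=")
        info[key] = val
    for key in ("seq", "rtptime"):
        if key in info:
            info[key] = int(info[key])
    return info


info = parse_rtp_info(
    "url=rtsp://foo/twister/audio2;\n seq=12312232;rtptime=78712811"
)
```

The seq value identifies the first packet of the new stream, and
rtptime is the RTP timestamp corresponding to the start of the
Range, which is what allows synchronization before the first RTCP
Sender Report arrives.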
5.3 SWITCHSIGNAL
As its name hints, SWITCHSIGNAL is a "signal" rather than a
command. SWITCHSIGNAL is an OPTIONAL server signal to the client
that a switch will soon be (or is being) performed. The stream to
be switched off is indicated as a parameter of the Method. The
stream to be switched on is indicated in RTP-Info as shown in the
example below:
S->C: SWITCHSIGNAL rtsp://foo/twister/audio1 RTSP/1.0
CSeq: 4213
Range: smpte=0:10:22-;time=19970123T153600Z
RTP-Info: url=rtsp://foo/twister/audio2;
seq=12312232;rtptime=78712811
C->S: RTSP/1.0 200 OK
CSeq: 4213
It is RECOMMENDED that the server issue SWITCHSIGNAL as early as
possible before the actual switch and include all available
information in it (Range, RTP-Info, etc.) as in a response to
PLAY.
For the client-transparent case SWITCHSIGNAL is normally not
necessary for the correct behavior of the streaming system, but a
client may register the need to receive such a notification (see
SWITCHSETUP above).
For the non-client-transparent case the server MUST respect the
instructions provided by the client in the SWITCHSETUP commands
about the need to issue SWITCHSIGNAL. Unless
"Switch-control=server-initiated-ok" was explicitly signaled, a
server-initiated switch without forewarning would typically cause
the client to produce degraded playback, or could even crash it.
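On the client side, acknowledging a SWITCHSIGNAL amounts to
echoing its CSeq in a 200 OK response, as in the example of this
section. The sketch below assumes the whole request is available
as one string; ack_switchsignal is a hypothetical name.

```python
def ack_switchsignal(request_text):
    """Return (stream_url, response_text) for a SWITCHSIGNAL request."""
    lines = request_text.replace("\r\n", "\n").split("\n")
    method, url, version = lines[0].split()
    if method != "SWITCHSIGNAL":
        raise ValueError(f"unexpected method: {method}")
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # an empty line ends the header section
        if line[:1] in (" ", "\t"):
            continue  # folded continuation line, not needed here
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    response = f"{version} 200 OK\r\nCSeq: {headers['cseq']}\r\n\r\n"
    return url, response
```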
5.4 SWITCHCLOSE
5.4.1 SWITCHCLOSE rationale
It is highly desirable that a stream-switching enabled player can
free unused resources in order to allocate other resources.
A typical example is a session nominally at 10 Mb/s for which a
large number of alternative streams are available (say 50
different bit rates all the way from high quality HDTV with 5+1
music down to stamp-sized video with mono speech "backup"
configuration).
In such a case a typical usage would be that the client would
SWITCHSETUP only a few alternatives (say 8 Mb/s, 5 Mb/s, 1 Mb/s)
which could involve a substantial amount of memory in case these
configurations are supported using different codecs, etc.
If the network condition degrades catastrophically this player
may need to allocate other resources in order to switch to lower
bit rates. In this case it would be highly valuable that it can
free (some of) the resources corresponding to the highest bit
rates.
It is also highly desirable that a server can free resources
implicitly allocated after accepting a SWITCHSETUP (including
for DOS resistance); but then it is very useful to tell the
player that hypothetical corresponding SWITCH requests would be
denied.
5.4.2 SWITCHCLOSE specification
SWITCHCLOSE tears down the resources corresponding to a given
SWITCHSETUP identified by the (same) target URL (as used in
SWITCHSETUP).
SWITCHCLOSE is OPTIONAL.
SWITCHCLOSE can be issued by a server or by a client.
SWITCHCLOSE MAY be issued at any time during an RTSP session.
SWITCHCLOSE issued on a playing stream causes the corresponding
track to be stopped, i.e. only a PLAY can restore this track and a
SWITCHSETUP is required to restore the stream as a possible
future alternative. A player SHOULD NOT issue SWITCHCLOSE on a
playing stream; PAUSE or SWITCH SHOULD be issued first for that
stream. However, SWITCHCLOSE MAY be used by a server on a playing
stream in order to signal that this stream has been terminated and
will not be resumed unless the client takes explicit action.
Example:
C->S: SWITCHCLOSE rtsp://foo/twister/audio1 RTSP/1.0
CSeq: 42134
S->C: RTSP/1.0 200 OK
CSeq: 42134
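The effect of SWITCHCLOSE on session state can be pictured with
the following sketch. The class SwitchSetState and its methods
are hypothetical bookkeeping, not protocol elements; the sketch
only tracks which stream URLs are prepared and which are playing.

```python
class SwitchSetState:
    """Hypothetical per-session record of prepared and playing streams."""

    def __init__(self):
        self.prepared = set()  # URLs accepted via SETUP or SWITCHSETUP
        self.playing = set()

    def switchsetup(self, url):
        self.prepared.add(url)

    def play(self, url):
        if url not in self.prepared:
            raise ValueError(f"{url} has not been set up")
        self.playing.add(url)

    def can_switch_to(self, url):
        return url in self.prepared

    def switchclose(self, url):
        """Release url; return True if a playing track was stopped."""
        was_playing = url in self.playing
        self.playing.discard(url)
        self.prepared.discard(url)
        return was_playing
```

After switchclose() on a URL, can_switch_to() is False for that
URL until a new SWITCHSETUP is accepted, which mirrors the rule
that a later SWITCH to that stream would be denied.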
5.5 SDP rules
An SDP describing a switch-set MUST use different (dynamic)
Payload Types for streams that are not client-transparent
switchable.
An SDP describing a switch-set MAY use identical (dynamic)
Payload Types for streams that are client-transparent switchable.
An SDP describing a switch-set MAY use identical port numbers for
streams that are client-transparent switchable.
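A checker for the payload type rule might look like the sketch
below. It assumes each stream carries a flag saying whether it is
client-transparent switchable, and it treats a shared payload
type as acceptable only when every stream sharing it has that
flag set. Both the flag and the function name are assumptions
made for illustration.

```python
from collections import defaultdict


def conflicting_payload_types(streams):
    """streams: iterable of (url, payload_type, client_transparent).

    Return the sorted payload types that are shared by at least one
    stream that is not client-transparent switchable.
    """
    by_pt = defaultdict(list)
    for url, payload_type, transparent in streams:
        by_pt[payload_type].append(transparent)
    return sorted(
        pt for pt, flags in by_pt.items()
        if len(flags) > 1 and not all(flags)
    )
```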
6. Open issues
6.1 SDP issues
Is there a need for additional SDP syntax and/or rules to
describe the switch-set? (or is the example in Appendix OK?)
Should it be actually RECOMMENDED (or even a MUST?) to reuse the
same (dynamic) payload type for alternate streams of the "client-
transparent" type?
6.2 Other issues
Status codes: additional status codes may be necessary, for
example when switching has not been performed because a more
recent request arrived, or because maximum capacity is reached.
Should stream switching work for RTP interleaved inside RTSP?
Is there an alternative to doing one SETUP per alternate stream?
Would it be worth the trouble to define a specific syntax?
In the client-transparent mode, assuming neither the payload type
nor the port number changes, it should not be necessary to make
one SETUP per stream. Should this be documented or mandated?
Are there specific firewall/proxy considerations?
6.3 UDP transport of switching command
It is a good idea to also provide a UDP based command. The key
motivation for doing so is that UDP feedback may be faster and,
as mentioned earlier, speed is a key factor for optimal congestion
control as well as for seamless switching.
Should this be done using an RTCP extension? Or use "rtspu"? (but
isn't rtspu going to be dropped?)
For security (UDP being easier to spoof than TCP?) this could be
restricted to "down" switches, since for congestion control
purposes there is never any hurry to switch up. Could it also be
restricted to the client-transparent case?
6.4 Independence with RTSP parallel evolution
There is a possible exception to this independence for
SWITCHCLOSE issued on a playing stream, but it looks like a very
logical one.
7. Security considerations
The security issues associated with stream switching are those
inherent to the usage of RTP and RTSP plus:
7.1 Induced server misbehavior
The following threats can be identified:
. Causing the server to allocate a lot of resources (in making
ready for supporting switching for a large switch-set). Note
however that a server can deny SWITCHSETUP requests using for
example "503 Service Unavailable" (temporary) or "416 Requested
Range Not Satisfiable" (permanent) and can issue SWITCHCLOSE at
any time. Also the server is often the source of the SDP (via
DESCRIBE) and therefore has opportunities there to reduce the
diversity.
. Causing the server to switch up toward high bit rate streams
can create large amounts of network traffic. Note however that
the typical usage of stream switching is anyway to deploy the
service with the maximum bit rate as a primary target. With
stream switching, streaming servers would actually become
bandwidth control tools for operators.
. Causing the server to switch down toward low bit rates causes a
degraded service.
. Causing the server to switch frequently is a source of degraded
service but is also a denial-of-service attack in the sense that
it would typically cause the server to consume substantial
resources in switching, thereby reducing the service capacity,
for example by reducing the maximum number of concurrent streams
that the server can serve or the maximum total throughput of the
server. The defense of a server is probably to refuse switches
that are too frequent, especially upward switches.
These threats are fended off by applying authentication to the
stream switching control messages. RFC2326 section 16 provides
guidance on how to perform that with RTSP.
Also server implementations SHOULD include configurable
limitations such as a maximum number of switches per amount of
time per media track, a maximum number of alternate streams per
client, etc.
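Such a limitation could be implemented as a sliding-window
counter per media track, as in the following sketch. The class
name, the window semantics and the explicit "now" argument (used
here so the behavior does not depend on a real clock) are all
assumptions made for illustration.

```python
from collections import defaultdict, deque


class SwitchRateLimiter:
    """Allow at most max_switches per window_seconds per track."""

    def __init__(self, max_switches, window_seconds):
        self.max_switches = max_switches
        self.window = window_seconds
        self._history = defaultdict(deque)  # track -> accepted timestamps

    def allow(self, track, now):
        """Return True and record the switch if it is within the limit."""
        history = self._history[track]
        # Forget switches that have left the sliding window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.max_switches:
            return False
        history.append(now)
        return True
```

Since Section 5.2 states that a server SHOULD NOT deny SWITCH
requests for smaller rates, a deployment would probably apply
such a limiter to upward switches only.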
7.2 Induced client misbehavior
One threat is that a server could cause the receivers to
misbehave (or crash) for example if the data sent is encoded with
a different decoder configuration than the one the player was
initialized with.
For that reason this specification takes special care that
server-initiated switches are possible only for agreed upon
streams (using SWITCHSETUP) and either for client-transparent
switches (and a client can disable these anyway) or in conditions
specified by the client with "safe" defaults.
8. Acknowledgements
The author wishes to thank Alain Teil, Kamal Rada, Yves Ramanzin
and Nicolas Delahaye for all the fruitful discussions and
comments.
9. References
[Widmer] A survey on TCP-Friendly Congestion Control, J.
Widmer, R. Denda, M. Mauve, IEEE Network May-June 2001,
http://www.informatik.uni-mannheim.de/informatik/pi4/publications/library/Widmer2001a.pdf
[Vojnovic] On the long-run behavior of equation-based rate
control, M. Vojnovic, J.Y. Le Boudec, Proceedings of SIGCOMM'02,
August 19-23 2002, Pittsburgh, Pennsylvania, USA,
http://www.acm.org/sigcomm/sigcomm2002/papers/equation.pdf
[Bansal] Dynamic Behavior of Slowly-Responsive Congestion
Control Algorithms, D. Bansal, H. Balakrishnan, S. Floyd, S.
Shenker, Proceedings of SIGCOMM'01, August 27-31 2001, San Diego,
California, USA,
http://www.acm.org/sigcomm/sigcomm2001/p21-bansal.pdf
[RTP] http://www.ietf.org/rfc/RFC1889.txt
[RTSP] http://www.ietf.org/rfc/RFC2326.txt
[HTTP] http://www.ietf.org/rfc/RFC2616.txt
[grouping] http://www.ietf.org/rfc/RFC3388.txt
[TFRC] http://www.ietf.org/rfc/RFC3448.txt
[SMIL] http://www.w3.org/TR/smil20/cover.html
[MPEG-4] http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm
[3GPP-alt-attr]
http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_22/Docs/S4-020407.zip
[3GPP-BWS]
http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_25/Docs/S4-030024.zip
[RTSP-MUTE]
http://www.ietf.org/internet-drafts/draft-sergent-rtsp-mute-00.txt
[TRIGTRAN]
http://www.ietf.org/internet-drafts/draft-dawkins-trigtran-probstmt-00.txt
10. Author's Address
Philippe Gentric
Philips Software
51 rue Carnot
92156 Suresnes
France
e-mail: philippe.gentric@philips.com
Appendix: Detailed example
C->S: DESCRIBE rtsp://foo/twister RTSP/1.0
CSeq: 1
The server replies with the full content description: there are 3
video streams, at 200 kb/s, 100 kb/s and 50 kb/s, and 3 audio
streams, at 20 kb/s, 10 kb/s and 5 kb/s.
NB: the example is not strictly valid, since it would normally
require more detail, such as decoder configurations, which are
omitted for the sake of simplicity.
S->C: RTSP/1.0 200 OK
CSeq: 1
Content-Type: application/sdp
Content-Length: xxx
v=0
o=- 2890844256 2890842807 IN IP4 172.16.2.93
s=RTSP Session
i=An Example of RTSP Session Usage for Stream Switching
a=control:rtsp://foo/twister
t=0 0
m=video 7722 RTP/AVP 96
a=rtpmap:96 MP4V-ES/1000
a=control:rtsp://foo/twister/video1
b=AS:200
m=audio 7724 RTP/AVP 97
a=rtpmap:97 mpeg4-generic/44100/2
a=control:rtsp://foo/twister/audio1
b=AS:20
m=video 7726 RTP/AVP 98
a=rtpmap:98 MP4V-ES/1000
a=control:rtsp://foo/twister/video2
b=AS:100
m=audio 7724 RTP/AVP 99
a=rtpmap:99 mpeg4-generic/44100/2
a=control:rtsp://foo/twister/audio2
b=AS:10
m=video 7726 RTP/AVP 100
a=rtpmap:100 MP4V-ES/1000
a=control:rtsp://foo/twister/video3
b=AS:50
m=audio 7724 RTP/AVP 101
a=rtpmap:101 mpeg4-generic/44100/2
a=fmtp:101 streamtype=5; profile-level-id=15; mode=AAC-hbr
a=control:rtsp://foo/twister/audio3
b=AS:5
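A client could collect the alternatives advertised in such a
description with a few lines of Python. This sketch only looks at
the "m=", "a=control:" and "b=AS:" lines and assumes a single
payload type per media line; list_streams is a hypothetical name
and this is not a complete SDP parser.

```python
def list_streams(sdp):
    """Return one dict per media section: media, payload_type, control, kbps."""
    streams = []
    current = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            current = {"media": fields[0], "payload_type": int(fields[3])}
            streams.append(current)
        elif current is not None and line.startswith("a=control:"):
            current["control"] = line[len("a=control:"):]
        elif current is not None and line.startswith("b=AS:"):
            current["kbps"] = int(line[len("b=AS:"):])
    return streams


example = """v=0
a=control:rtsp://foo/twister
m=video 7722 RTP/AVP 96
a=control:rtsp://foo/twister/video1
b=AS:200
m=audio 7724 RTP/AVP 97
a=control:rtsp://foo/twister/audio1
b=AS:20
"""
streams = list_streams(example)
```

The session-level control line is skipped because it appears
before any media section.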
The second step is SETUP, where client and server agree on the
transport parameters (UDP port numbers etc). Note that the client
waits for the reply to the first SETUP in order to have the
session identifier and then sends all the SWITCHSETUPs in rapid
succession, so that this operation takes approximately 2 round
trips independently of the number of streams.
In this example different UDP ports are used but the same port
could also be reused since by rule the switch is either performed
on streams that are of the client-transparent type or that have a
different payload type.
Note the "Switch-control=client-initiated-only" header field
Gentric [page 24]
Internet Draft RTSP Stream Switching January 2004
which signals to the server that it MUST NOT switch on its own
but only upon reception of a SWITCH command.
C->S: SETUP rtsp://foo/twister/audio1 RTSP/1.0
CSeq: 2
Transport: RTP/AVP;unicast;client_port=8000-8001
S->C: RTSP/1.0 200 OK
CSeq: 2
Transport: RTP/AVP;unicast;client_port=8000-8001;
server_port=9000-9001
Session: 12345678
C->S: SETUP rtsp://foo/twister/video1 RTSP/1.0
CSeq: 3
Transport: RTP/AVP;unicast;client_port=8002-8003
Session: 12345678
C->S: SWITCHSETUP rtsp://foo/twister/audio2 RTSP/1.0
CSeq: 4
Transport: RTP/AVP;unicast;client_port=8004-8005
Session: 12345678
Switch-control=client-initiated-only
C->S: SWITCHSETUP rtsp://foo/twister/video2 RTSP/1.0
CSeq: 5
Transport: RTP/AVP;unicast;client_port=8006-8007
Session: 12345678
Switch-control=client-initiated-only
C->S: SWITCHSETUP rtsp://foo/twister/audio3 RTSP/1.0
CSeq: 6
Transport: RTP/AVP;unicast;client_port=8008-8009
Session: 12345678
Switch-control=client-initiated-only
C->S: SWITCHSETUP rtsp://foo/twister/video3 RTSP/1.0
CSeq: 7
Transport: RTP/AVP;unicast;client_port=8010-8011
Session: 12345678
Switch-control=client-initiated-only
S->C: RTSP/1.0 200 OK
CSeq: 3
Transport: RTP/AVP;unicast;client_port=8002-8003;
server_port=9004-9005
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 4
Transport: RTP/AVP;unicast;client_port=8004-8005;
server_port=9006-9007
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 5
Transport: RTP/AVP;unicast;client_port=8006-8007;
server_port=9008-9009
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 6
Transport: RTP/AVP;unicast;client_port=8008-8009;
server_port=9010-9011
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 7
Transport: RTP/AVP;unicast;client_port=8010-8011;
server_port=9012-9013
Session: 12345678
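The pipelined SWITCHSETUP requests of this exchange could be
generated as in the sketch below, keyed by CSeq so that the
replies, which arrive later, can be matched to their requests.
The function name and the port allocation scheme (two consecutive
ports per stream) are assumptions; the Switch-control line is
reproduced as written in the example above.

```python
def build_switchsetups(urls, session, first_cseq, first_port):
    """Return {cseq: request_text} for a batch of SWITCHSETUP requests."""
    requests = {}
    for index, url in enumerate(urls):
        cseq = first_cseq + index
        rtp_port = first_port + 2 * index  # RTP port, RTCP on the next one
        requests[cseq] = (
            f"SWITCHSETUP {url} RTSP/1.0\r\n"
            f"CSeq: {cseq}\r\n"
            f"Transport: RTP/AVP;unicast;"
            f"client_port={rtp_port}-{rtp_port + 1}\r\n"
            f"Session: {session}\r\n"
            "Switch-control=client-initiated-only\r\n\r\n"
        )
    return requests


pending = build_switchsetups(
    [
        "rtsp://foo/twister/audio2",
        "rtsp://foo/twister/video2",
        "rtsp://foo/twister/audio3",
        "rtsp://foo/twister/video3",
    ],
    session="12345678", first_cseq=4, first_port=8004,
)
```

All four requests can be written to the connection without
waiting, and each "200 OK" is then paired with its request
through the CSeq value.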
Then the client decides to start streaming the "default"
configuration at 220 kb/s (note that a non-aggregate PLAY would
also be a possibility).
C->S: PLAY rtsp://foo/twister RTSP/1.0
CSeq: 8
Range: npt=0-
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 8
Session: 12345678
Then the client decides to switch streaming from 220 kb/s to 210
kb/s by switching audio streams
C->S: SWITCH rtsp://foo/twister/audio1 RTSP/1.0
CSeq: 9
Session: 12345678
Replace-with: rtsp://foo/twister/audio2
S->C: RTSP/1.0 200 OK
CSeq: 9
Session: 12345678
Then the client decides to switch streaming from 210 kb/s to 60
kb/s by switching video streams
C->S: SWITCH rtsp://foo/twister/video1 RTSP/1.0
CSeq: 10
Session: 12345678
Replace-with: rtsp://foo/twister/video3
S->C: RTSP/1.0 200 OK
CSeq: 10
Session: 12345678
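The aggregate rates quoted in this example (220, 210 and 60 kb/s)
follow directly from the b=AS values in the description. The
short sketch below reproduces the arithmetic; the dictionary and
function names are illustrative only.

```python
# Bit rates in kb/s, taken from the b=AS lines of the description.
RATES_KBPS = {
    "video1": 200, "video2": 100, "video3": 50,
    "audio1": 20, "audio2": 10, "audio3": 5,
}


def total_rate(active):
    """Sum the bit rates of the currently playing streams."""
    return sum(RATES_KBPS[name] for name in active)


def switch(active, off, on):
    """Return the set of playing streams after replacing off with on."""
    if off not in active:
        raise ValueError(f"{off} is not playing")
    return (active - {off}) | {on}


playing = {"video1", "audio1"}                   # after PLAY
after_audio = switch(playing, "audio1", "audio2")
after_video = switch(after_audio, "video1", "video3")
rates = [total_rate(s) for s in (playing, after_audio, after_video)]
```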