    Internet Engineering Task Force                           MMUSIC WG
    Internet Draft
                                                      Philippe Gentric,
                                                    Philips Electronics
                                                    
                                                           January 2004
                                                      expires July 2004

   draft-gentric-mmusic-stream-switching-01.txt


                      RTSP Stream Switching




STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance 
   with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six 
   months and may be updated, replaced, or obsoleted by other 
   documents at any time.  It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as "work 
   in progress".

   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt

   To view the list of Internet-Draft Shadow Directories, see 
   http://www.ietf.org/shadow.html.


Abstract

   Stream switching is a technique used to change the data rate of 
   media being streamed, typically in order to adapt to the 
   bandwidth effectively available in the network. A backward 
   compatible and independent RTSP "SWITCH" command is proposed in 
   order to enable RTSP-based stream switching.






Gentric                                                      [page 1]

Internet Draft          RTSP Stream Switching           January 2004



1. Introduction

   Stream switching is a technique used to change the data rate of 
   media being streamed, typically in order to adapt to the 
   bandwidth effectively available in the network.
   
   The aim is that a real time streaming system can switch from 
   stream to stream in order to vary the data rate. This requires 
   that the same content be encoded as multiple streams at various 
   bit rates. 
   
   This memo specifies an independent and backward compatible RTSP 
   extension enabling RTSP servers and clients to support stream 
   switching.
   
   Sections 2, 3 and 4 provide a detailed analysis of the problem, 
   section 5 is devoted to the proposed solution, section 6 lists 
   open issues, and section 7 covers security considerations.
   
1.1 Typical usage context

   The typical scenario is video distributed on demand, also known 
   as "Video On Demand" (VOD). The situation is depicted in figure 1. 
   This is the domain of RTSP [RTSP] servers. HTTP is typically used 
   for the service/application, i.e. it provides the entry point, 
   usually an RTSP URL. The media can be pre-recorded on file or can 
   be a "live" source, in which case the RTSP/RTP server acts as a 
   relay.
   
   
   
             *****************                        *****************
             *               *        HTTP            *               *
             *  HTTP Server  *  <------------------>  *  HTTP Client  *
             *               *                        *               *
             *****************                        *****************
                
             *****************                        *****************
             *               *        RTSP            *               *
             *  RTSP Server  *  <------------------>  *  RTSP Client  *
             *               *                        *               *
             *****************                        *****************
                
             *****************                        *****************
             *               *      RTP on UDP UC     *               *
             *  RTP Sender   *  ------------------->  *  RTP Receiver *
             *               *                        *               *
   media     *               *      RTCP SR           *               *
    on   --> *               *  ------------------->  *               *
   file      *               *                        *               *
    or       *               *     RTCP feedback      *               *
   live      *               *  <-------------------  *               *
             *****************                        *****************
   
   Figure 1: video on demand
   
   
1.2 Rate control issues   
   
   The typical usage of stream switching is for adaptation to the 
   effectively available end-to-end bandwidth i.e. rate (or 
   congestion) control.
   
   Specifically the streaming system (i.e. sender and receiver) 
   changes the end-to-end data rate upon detecting variations in 
   the effective bandwidth.
   
   This document does not address rate control algorithms, i.e. the 
   way to compute a target bit rate based on some measurement of the 
   network. Rate control algorithms are orthogonal to the problem 
   addressed here, which is the signaling required inside a 
   streaming system in order to perform switching.
   
   It is assumed that specifications such as TFRC [TFRC] or work in 
   that area (see [Widmer], [Vojnovic], [Bansal]) should be used to 
   deal with this issue. These algorithms need to be adapted because 
   stream switching offers only a limited granularity of available 
   rates; we have experimental evidence that these issues are 
   manageable.
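
   As an illustration of the limited granularity mentioned above, 
   the following minimal sketch maps a target bit rate (as computed 
   by some rate control algorithm, which is out of scope here) onto 
   the closest encoded stream that does not exceed it. The function 
   name and the fallback to the lowest rate are assumptions made 
   for this example; the rates are the video rates used as an 
   example in section 1.7.

```python
from bisect import bisect_right

def select_stream(rates_kbps, target_kbps):
    """Return the highest encoded rate not exceeding target_kbps.

    Assumes rates_kbps is non-empty. Falls back to the lowest rate
    when even that one exceeds the target."""
    ordered = sorted(rates_kbps)
    index = bisect_right(ordered, target_kbps)
    return ordered[index - 1] if index > 0 else ordered[0]

video_rates = [50, 150, 400]  # kb/s
print(select_stream(video_rates, 220))  # prints 150
```

   A real implementation would probably add hysteresis so that 
   small oscillations of the target rate do not cause repeated 
   switches.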

1.3 Content negotiation issues   
   
   Stream switching requires specific content negotiation taking 
   into account the possibility to change the configuration during 
   the session.
   
   Stream switching actually bridges the gap between "traditional" 
   rate control (i.e. quasi-continuous changes in the data rate, as 
   in TCP) and "traditional" content negotiation, where sessions are 
   negotiated for a constant data rate.

1.4 Seamless switching
   
   Seamless stream switching is obtained when the switch is 
   performed in such a fashion that media playback is minimally 
   disturbed. 
   
   A counter-example is traditional content negotiation: a given 
   data rate is negotiated, the session is started and, when it 
   becomes obvious that the chosen data rate is not acceptable 
   (usually because it is too high), a new session is negotiated. 
   The switch is not seamless because it takes time to tear down 
   the session and initialize another one. 
   
   Therefore seamless stream switching consists of preparing a set 
   of sessions (or a set of configurations within a session) and 
   providing a fast signaling mechanism so that the switch is 
   effectively instantaneous.
   
   A side effect of seamless switching (which is a "user" 
   requirement) is to minimally disturb the network: it is a well 
   known congestion control principle that, when congestion occurs, 
   the faster the response, the smaller the perturbation.
   
1.5 Motivation for standardization
   
   Media delivery technologies are based on the availability of 
   extremely optimized constant bit rate encoders. On the other hand 
   IP networks that are being deployed for consumer access do not 
   have stable end-to-end bandwidth. These two facts cause major 
   problems for operators wishing to deploy "best-effort" streaming 
   services in a robust fashion, the current status-quo being that:
   
   . On one hand it is assumed that once a program is requested by a 
   client it will cause the server to send data toward the client at 
   the "nominal" constant rate, regardless of the state of the 
   network on the path from the server to the client. 
   
   . On the other hand there is a common assumption (see for example 
   [3GPP-BWS]) that stream switching is the way to improve this 
   situation; for that reason it is implemented and deployed in non 
   inter-operable ways by many vendors. 
  
   It is desirable to change this status-quo for several reasons. 
   
   Firstly the "constant rate method" causes problems whenever a hop 
   in the path from sender to receiver experiences congestion. These 
   problems are router buffer overflows (and more specifically for 
   wireless networks: base station buffer overflows) and perceptible 
   artifacts during playback, creating 3 populations of unhappy 
   people: network managers, end users watching the video and end 
   users of other traffic traveling on the same path!
   
   Secondly the congestion control algorithms used for proprietary 
   stream switching traffic are not publicly specified, and 
   therefore the deployment of streaming services on large scales 
   (i.e. comparable to TV broadcast scales) is not proven to be 
   safe in terms of network pathologies. 
   
   Thirdly the various parties involved in commercial streaming 
   media deployments, i.e. content providers, network operators and 
   various technology providers, are aware of these issues and are 
   stalling. This compromises the immediate future of media 
   distribution and, as a direct consequence, that of various new 
   consumer high data rate network deployments in both the wired 
   and wireless domains.
   
   Note that wireless network technologies are especially sensitive 
   to these issues because of the inherent variability of the radio 
   bandwidth, which has triggered attention and efforts in 3GPP (see 
   for example [3GPP-alt-attr] and [3GPP-BWS]) on these issues.
   
   In conclusion there is a pressing demand for inter-operable 
   solutions and also a demand for solutions that -being standard-
   have a better defined and/or more transparent behavior. 
   
   In short the goals of a standard framework for stream switching 
   would be:
   
   . To enable the emergence of more advanced inter-operable 
   streaming products.
   
   . To enable the advent of technical and/or commercial 
   specifications for streaming products and services that have a 
   well characterized behavior regarding bandwidth management.
      

1.6 Requirements

   The key requirement is that the user experience should be the 
   best possible, which means that switching must be seamless. This 
   requirement implies very specific timing constraints on the way 
   the switch is performed: typically the sender should stop sending 
   one stream and start sending the other stream at exactly the same 
   media time and the same real time. Otherwise buffering problems 
   will occur at the receiver, since the client buffers must 
   maintain a stable amount of media time (a media decoder is paced 
   in terms of media time, i.e. 1 second of media is decoded in 1 
   second of elapsed time). Specifically in stream switching the 
   challenge is to avoid buffer underflows, where the decoder pauses 
   playback and displays the infamous "re-buffering" message.
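
   The pacing argument above can be made concrete with a small 
   calculation. The sketch below is a simplified model, not part of 
   this specification: it assumes a constant bit rate stream, a 
   constant delivery rate, and no packet loss. The function and 
   parameter names are chosen for this example only.

```python
def seconds_until_underflow(buffered_media_s, stream_kbps, delivered_kbps):
    """Elapsed time before a paced decoder empties its buffer.

    Each elapsed second the decoder consumes 1 s of media while the
    network delivers delivered_kbps / stream_kbps seconds of media."""
    fill_per_second = delivered_kbps / stream_kbps
    if fill_per_second >= 1.0:
        return float("inf")  # buffer is stable or growing
    return buffered_media_s / (1.0 - fill_per_second)

# 5 s of buffered media, 400 kb/s stream, only 300 kb/s delivered
print(seconds_until_underflow(5.0, 400, 300))  # prints 20.0
```

   Under these assumptions the switch to a lower rate stream has to 
   take effect at the decoder within that delay in order to avoid 
   re-buffering.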
   
   The simple consequence is that the streaming data source must be 
   informed as soon as possible that it needs to change its output 
   rate, otherwise it will keep on sending at the same excessive 
   rate, which will result in:
   
   . Filling up more buffers in the network devices upstream from 
   the limiting hop, thereby amplifying the congestion. 
   
   . Causing losses, if one or more network devices reach the point 
   of saturation, not only in the media stream in question but also 
   in other traffic (constant data rate UDP traffic is known to be 
   "aggressive" in this context since TCP traffic will 
   automatically back off).
   
   . Delaying the instant when the new stream would reach the 
   decoder, thereby increasing the chances that the decoder actually 
   runs its buffer down to underflow.
   
   The ability to increase the rate (when the available bandwidth 
   increases, and with due care to congestion control) is also a 
   requirement, but it has much less stringent technical 
   implications: in that case a really seamless switch is always 
   possible.
    
1.7 Vocabulary

   We define a "program" as a set of "tracks"; for example a movie 
   is composed of an audio and a video track. We define a "stream" 
   as an encoded instance of a track. For example the video track of 
   a movie may be encoded at 50 kb/s, 150 kb/s and 400 kb/s using 
   respectively H.263 baseline SQCIF 7.5 fps, MPEG-4 SP@L3 QCIF 15 
   fps and MPEG-4 ASP@L3 CIF 30 fps; the audio track may be encoded 
   at 5 kb/s, 20 kb/s, 48 kb/s and 80 kb/s using respectively AMR, 
   AMR WB, AAC mono and AAC stereo.
   
   We define one "flavor" of a program as a given set of streams 
   (for a movie, usually a pair consisting of one audio and one 
   video stream). For example 400 kb/s video and 80 kb/s AAC stereo 
   audio is the high quality flavor in the example above, which 
   allows 12 different flavors (although some flavors may not 
   always make sense).
      
   We define a "switch-set" as the set of all the streams for a 
   track or a program. A switch-set can be organized either as 
   ordered first by track or first by flavor. Obviously switch-sets 
   are prepared during the content production or deployment phase. 
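
   The vocabulary above can be illustrated with a short sketch that 
   represents a switch-set ordered first by track and enumerates 
   its flavors. The dictionary layout and the function name are 
   assumptions of this example; only the bit rates come from the 
   text above.

```python
from itertools import product

# Hypothetical in-memory switch-set, ordered first by track.
switch_set = {
    "video": [50, 150, 400],   # kb/s
    "audio": [5, 20, 48, 80],  # kb/s
}

def flavors(switch_set):
    """Yield every flavor as a dict mapping track name to stream rate."""
    tracks = list(switch_set)
    for combo in product(*(switch_set[t] for t in tracks)):
        yield dict(zip(tracks, combo))

all_flavors = list(flavors(switch_set))
print(len(all_flavors))  # prints 12
print(max(all_flavors, key=lambda f: sum(f.values())))
```

   The last line selects the high quality flavor (400 kb/s video 
   with 80 kb/s audio). A deployment would normally filter out the 
   flavors that do not make sense.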
  
2. Seamless stream switching technical issues

2.1 Configuration issues

   When streams are switched there are 2 fundamental cases: either 
   the streaming configuration changes or it does not change. 
   
2.1.1 No configuration change
   
   The case when the streaming configuration does not change is the 
   simplest. 
   
   This case can also be described as "nothing changes but the 
   bit rate". Many codecs actually support this "natively"; for 
   example video codecs and recent speech codecs (AMR, EVRC) as well 
   as recent music codecs (AAC in some modes) have the property that 
   they can decode instantly-variable-bit-rate streams. 
   
   One important thing to note is that in RTP terms the Payload Type 
   remains the same. Specifically the RTP session remains the same.
   
   For these reasons this mode is also called "client-transparent" 
   since (in theory) the source can switch without forewarning the 
   client.
   
   Care must be taken however that some player implementations may 
   actually be sensitive to sudden bit rate changes, or may prefer 
   to be warned/notified about them.

2.1.2 Configuration change
   
   The second case, when the streaming configuration changes or even 
   when the codec itself changes is more complex because:
   
   . In the general case one has to assume that the client has to 
   instantiate (or invoke) 2 (or more) completely different 
   (hardware and/or software) codecs, rendering systems and network 
   reception stacks (or at least different payload processors). 
   Obviously this may involve substantial processing and/or 
   buffering resources. These are implementation details out of the 
   scope of this memo; however an important rule derives from this: 
   servers MUST NOT switch streams in a way that involves a codec 
   configuration change, except upon reception of an explicit 
   request from the receiver or with an explicit prior agreement. 
   Also authentication should be used for these requests (see the 
   security considerations section).
   
   . In the general case feeding a codec with a stream for a 
   different codec, or a different configuration, can crash a 
   decoder. Therefore there must be an error-proof way to signal the 
   change at the packet or encoded frame granularity. Fortunately 
   RTP does have such a capability with the Payload Type field.
   
   . In the general case changing codec (for example from AMR to 
   AAC) also involves changing the RTP payload format. Fortunately 
   this is also covered by the RTP Payload Type field.

   The important thing to note is that in SDP and RTP terms the 
   Payload Type has to be different for this type of 
   configuration.
   
   This is called "client non-transparent" stream switching.
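
   As an illustration of per-packet signaling with the Payload Type 
   field, the following sketch reads the Payload Type from the RTP 
   fixed header and reports when it changes between consecutive 
   packets. The class and function names are assumptions of this 
   example, and a real receiver would also have to cope with packet 
   reordering around the switch.

```python
def rtp_payload_type(packet: bytes) -> int:
    """Return the 7-bit Payload Type from a fixed RTP header."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    return packet[1] & 0x7F  # the top bit of this byte is the marker bit

class SwitchDetector:
    """Reports when the Payload Type differs from the previous packet."""

    def __init__(self):
        self.current = None

    def observe(self, packet: bytes) -> bool:
        pt = rtp_payload_type(packet)
        switched = self.current is not None and pt != self.current
        self.current = pt
        return switched

# Two hand-built headers: PT 96, then PT 97 with the marker bit set.
first = bytes([0x80, 96]) + bytes(10)
second = bytes([0x80, 0x80 | 97]) + bytes(10)
detector = SwitchDetector()
print(detector.observe(first), detector.observe(second))  # False True
```

   On a detected change the receiver would route the packet to the 
   payload processor and decoder prepared for the new Payload Type.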

2.1.3 Mixed transparency configurations

   It is typical that a given switch-set mixes both "client 
   non-transparent" and "client-transparent" modes. 
   
   One could wish that the client-transparent mode would be 
   "enough"... however "client-transparent" switches usually do not 
   cover as wide a bandwidth range as "client non-transparent" ones 
   due to the bit rate range of each specific codec.
   
   For example a service deployed for CD-quality music using stereo 
   AAC cannot go below 32 kb/s in the client-transparent mode 
   because AAC does not go below this bit rate. On the other hand a 
   client non-transparent switch involving a speech codec (say AMR) 
   makes it possible to define "fall back" streams with as little 
   as 4 kb/s.

2.2 Codec access points

   For all streams it is possible to "switch out" at any point. 
   However some streams (video is typical) cannot be "switched in" 
   at any point: typically these codecs have several types of frame 
   with regard to random access. Some frames are full random access 
   points (typically I frames, or S frames for recent codecs such as 
   H.264); others are partial random access points, such as frames 
   mixing I macro-blocks and P macro-blocks.
   
   From the point of view of the decoder these can be seen as 
   implementation details, i.e. if a server switches at a non random 
   access point the client should be able to detect it and act 
   according to its ability to handle it. Indeed from the decoder 
   point of view a stream switch at a non random access point is 
   similar to receiving packets after a loss.
   
   It could be useful however for a client to be able to indicate to 
   the server that it prefers a switch at a random access point.
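
   A server honoring such a preference could select the switch 
   point as sketched below. This is only an illustration: it 
   assumes that the server knows the media times of the full random 
   access points of the stream being switched in, and the function 
   and parameter names are hypothetical.

```python
from bisect import bisect_left

def next_switch_point(rap_times_s, requested_s, rap_only=True):
    """Return the media time at which to switch in the new stream.

    rap_times_s must be sorted in ascending order. Returns None when
    RAP-only switching is wanted but no access point remains."""
    if not rap_only:
        return requested_s
    index = bisect_left(rap_times_s, requested_s)
    return rap_times_s[index] if index < len(rap_times_s) else None

raps = [0.0, 2.0, 4.0, 6.0]  # e.g. one I frame every 2 seconds
print(next_switch_point(raps, 2.5))                  # prints 4.0
print(next_switch_point(raps, 2.5, rap_only=False))  # prints 2.5
```

   The price of waiting for a random access point is a delayed 
   switch, which matters most when switching down under congestion.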

2.3 The control issue

   The first key question is to understand if the decision to switch 
   is taken by the receiver or by the sender.

2.3.1 Server initiated switch 

   Server initiated switch has 4 major advantages:
   
   . It can be made to work in a similar fashion for all scenarios.
   
   . It resembles more the TCP situation improving the chances that 
   some of the considerable know-how acquired with TCP in terms of 
   congestion/rate control can be reused.
   
   . It makes sense to have sources of information about the status 
   of the network other than the receiver(s). For example routers on 
   the path may be able to issue congestion notifications much 
   earlier than if one must wait for the perturbation to reach the 
   final destination and for feedback signals to travel back (see 
   also [TRIGTRAN]).
   
   . It allows one very simple "catastrophe prevention" mechanism: 
   since the sender does not need to warn the receiver before 
   switching, the sender can decide to switch down when feedback 
   from the receiver has not been received for a given amount of 
   time (TFRC uses on the order of 4 RTTs).
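
   The "catastrophe prevention" mechanism can be sketched as a 
   simple timer on the sender side. The sketch below is not a rate 
   control algorithm and is not taken from TFRC; it only shows the 
   timeout logic, with a fixed RTT estimate and a one-step switch 
   down, both of which are simplifying assumptions. All names are 
   hypothetical.

```python
class FeedbackWatchdog:
    """Steps down one stream when receiver feedback goes silent."""

    def __init__(self, rates_kbps, rtt_s, timeout_rtts=4):
        self.rates = sorted(rates_kbps)
        self.index = len(self.rates) - 1      # start at the highest rate
        self.timeout_s = timeout_rtts * rtt_s
        self.last_feedback_s = 0.0

    def on_feedback(self, now_s):
        self.last_feedback_s = now_s

    def poll(self, now_s):
        """Return the rate to send at, switching down after a timeout."""
        if now_s - self.last_feedback_s > self.timeout_s and self.index > 0:
            self.index -= 1
            self.last_feedback_s = now_s      # restart the timer
        return self.rates[self.index]

watchdog = FeedbackWatchdog([50, 150, 400], rtt_s=0.1)
watchdog.on_feedback(1.0)
print(watchdog.poll(1.2), watchdog.poll(1.5))  # prints 400 150
```

   Such an unannounced switch is only acceptable in the two cases 
   discussed in the rest of this section.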
      
   As has been discussed above the server can decide to switch 
   without telling the receiver only in 2 cases: 
   
   . if the decoder configuration does not change. In the context of 
   SDP/RTP this means that the Payload Type must not change. For 
   this type of configuration the existing set of IETF 
   specifications is usable in terms of session description and 
   management. Specifically "normal" RTCP can be used to send 
   feedback, and the way the server decides to switch based on 
   client RTCP feedback can be seen as a server implementation 
   issue. There could be a need to document this, maybe not as a 
   standard specification but surely as "practice" in 
   inter-operability forums.
   
   . if there was a prior agreement. In the context of SDP/RTP this 
   means that the client has "instantiated" several "stacks" (one 
   for each flavor of each stream) and is ready to receive data on 
   each of these channels (channels differ in the destination UDP 
   port, the Payload Type, or both). This means that possibly 
   substantial resources must be pre-allocated on the receiver 
   side. This is wasteful if the network behaves so that the 
   session runs entirely with the initial streams or uses only a 
   fraction of these resources. Obviously a little signaling could 
   help here... Note also that although these streams have 
   different Payload Types this signaling may not be early 
   enough...
   
2.3.2 Player initiated switch

   The player can also initiate the switch and using RTSP is the 
   obvious choice.
   
   We will see however that the existing RTSP specification needs to 
   be extended in order to provide seamless stream switching. 

3. Description of the switch-set

   Clearly there is a need to convey a description of the switch-set 
   to the client. There are several ways to do so, which are 
   described below.
   
3.1 SDP Description of the switch-set
  
   One way to describe the alternative flavors for each stream 
   composing a program is to list them using SDP; an example of one 
   such description is given in the Appendix.
   
   For client initiated switches there is a need to convey the 
   bandwidth of a stream, but this is already available.
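
   For instance a client could read the bandwidth of each stream 
   from the standard SDP "b=AS:" line of each media section, as in 
   the sketch below. The sample description is hypothetical and 
   deliberately says nothing about how the streams are marked as 
   alternatives of each other. The "a=control:" names are also 
   invented for the example.

```python
SAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=hypothetical switch-set
t=0 0
m=video 0 RTP/AVP 96
b=AS:150
a=control:video1
m=video 0 RTP/AVP 97
b=AS:400
a=control:video2
"""

def media_bandwidths(sdp_text):
    """Return (media type, control name, kb/s) per media section."""
    sections = []
    for line in sdp_text.splitlines():
        if line.startswith("m="):
            sections.append({"media": line[2:].split()[0],
                             "control": None, "kbps": None})
        elif sections and line.startswith("b=AS:"):
            sections[-1]["kbps"] = int(line[5:])
        elif sections and line.startswith("a=control:"):
            sections[-1]["control"] = line[len("a=control:"):]
    return [(s["media"], s["control"], s["kbps"]) for s in sections]

print(media_bandwidths(SAMPLE_SDP))
```

   The result lists one entry per media section, here 150 kb/s for 
   "video1" and 400 kb/s for "video2".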
   
   Otherwise the exact SDP syntax to use in order to describe that 
   streams are alternatives of a given track (media) is debatable. 
   SDP has several extensions that can be considered [grouping], and 
   new extensions are also a possibility; 3GPP has specified one 
   such SDP syntax for its Release 6 (see [3GPP-alt-attr]).

3.2 SMIL Description of the switch-set
  
   SMIL is a scene description language [SMIL]. In SMIL the "switch" 
   element allows an author to specify a set of alternative elements 
   from which only the first acceptable element is chosen. Actually 
   the SMIL specification specifies that the bit rate is one typical 
   thing that would change among streams in a switch element. 

   In short the SMIL element "switch" provides a standard way to 
   declare to the client all the possible "flavors" of each stream.
   
   However SMIL 2.0 supports only parse-time evaluation, i.e. it 
   basically assumes that the evaluation of which stream to use is 
   done once. Furthermore even if dynamic re-evaluation is 
   specified in future versions, SMIL will typically not specify how 
   the switching should be performed.

   In conclusion the SMIL switch syntax element is a building block 
   that could very nicely complement an IETF specification of how to 
   perform stream switching at the transport (and transport control) 
   level.

3.3 MPEG-4 system Description of the switch-set

   MPEG-4 [MPEG-4] provides a way to describe alternative streams. 
   However since this type of manipulation would be performed from 
   the context of a terminal implementing the MPEG-4 system 
   specification it is a priori out of the scope of this memo.

4. Switching control

4.1 Switching by changing the RTSP session

   One way to perform stream switching is to use RTSP TEARDOWN in 
   order to destroy the session and then start another one. 
   Unfortunately this method involves several round trips, which 
   will typically cause playback to stop; in short it is practically 
   impossible to make it seamless. For that reason this method, 
   although "it works", will not be discussed further.

4.2 Switching within the same RTSP session

   One way to perform switching at the session level is to enable 
   the definition of a "switchable session" i.e. an extended session 
   that is negotiated as containing all alternative streams from the 
   very start. 

   Using RTSP has the following advantages:
   
   . The method is completely independent of the codec capabilities.
   
   . It directly provides both content and capability negotiation as 
   well as control.
   
   . It inherits all RTSP (and therefore HTTP) security features.
   
4.3 Switching using RTSP PLAY/PAUSE

   The usage of the PLAY/PAUSE commands for stream switching would 
   be as follows:
    
   At the time of session negotiation the client and server prepare 
   to stream all the variants in the switch-set but PAUSE all 
   streams except one per media type. Switching is performed by 
   simultaneously issuing a PAUSE command on the stream being 
   switched out and a PLAY command on the stream being switched in. 
   
   Unfortunately this requires the client to specify the pause 
   point (see the RTSP PAUSE specification for details [RTSP]), and 
   finding the appropriate time to use as "pause point" is not a 
   trivial issue at all. For this reason this method cannot be used 
   either.

4.4 Switching using RTSP MUTE/UNMUTE

   An extension to RTSP called MUTE/UNMUTE has been proposed 
   [RTSP-MUTE]. It defines MUTE and UNMUTE as 2 additional optional 
   RTSP commands. MUTE enables a client to request the server to 
   stop sending data for a given stream and in this respect is 
   similar to PAUSE. However UNMUTE requests the server to resume 
   sending data, not at the point in the media where MUTE was 
   issued, but at a point in time synchronous with the media streams 
   that were still being streamed.
   
   The usage of this command for stream switching would be as 
   follows: at the time of session negotiation the client and server 
   prepare to stream all the variants in the switch-set but MUTE all 
   streams except one per media type. Switching is performed by 
   issuing simultaneously a MUTE command on the stream being 
   switched out and an UNMUTE command on the stream being switched 
   in.
   
   The drawback is that for each "atomic" switch two commands have 
   to be issued.
   
   Also this does not cover the need for additional signaling as 
   detailed above.
   
4.5 Switching using RTSP SET_PARAMETER

   SET_PARAMETER and even OPTIONS have been suggested as candidates 
   for client-initiated stream switching (see [3GPP-BWS]).
   
   A possible syntax would be:
   
     C->S: SET_PARAMETER rtsp://foo/twister/audio1 RTSP/1.0
           CSeq: 421
           Content-length: xx
           Content-type: application/stream-switching
           Replace-with: rtsp://foo/twister/audio2

     S->C: RTSP/1.0 200 OK
           CSeq: 421
 
   The motivation is that SET_PARAMETER has been designed to provide 
   some type of extensibility to RTSP; the drawback however is that 
   it is not an explicit command.
 
   Also this does not cover the need for additional signaling as 
   detailed above.

5. Proposed specification

   The proposal is to introduce new RTSP Methods specifically for 
   stream switching.

   As indicated in [RTSP section 1.5] the advantage of a new Method 
   over extending an existing method is that a component that does 
   not know the new method will reply with "501 Not Implemented", 
   which makes backward compatibility issues easy to solve. 
   Furthermore there is a need for additional header fields, as 
   described below, that are best introduced for new Methods.
   
   Also it is desirable that this specification should be as 
   independent as possible of the RTSP specification and of its 
   evolutions (with a required side effect of having backward 
   compatibility with [RTSP]).
   
   For that reason this memo defines stream switching primitives 
   that are orthogonal to the rest of RTSP in terms of state machine 
   and signaling. This specification does not modify the syntax or 
   semantics of any RTSP Method or Header, and the stream switching 
   state machine is defined as being "inside" each state of the RTSP 
   state machines in both the client and server. For example streams 
   can be switched during a PAUSE as well as during a PLAY, etc. 
   (Note that there is an exception to that principle for 
   SWITCHCLOSE issued on a playing stream, see below.)
   
   All the stream switching methods are OPTIONAL but it is 
   RECOMMENDED to implement all of them. In particular implementers 
   should note the usefulness of SWITCHCLOSE.

5.1  SWITCHSETUP

5.1.1 SWITCHSETUP rationale

   Introducing SWITCHSETUP is better than re-using SETUP in that it 
   is explicitly for stream switching purposes. 
   
   It is also highly desirable that a stream-switching enabled 
   player can connect to an "old" RTSP server (one that does not 
   implement stream switching). Therefore it is desirable that the 
   behavior of existing servers is fully defined. For that reason 
   SWITCHSETUP is useful in that an "old" server will refuse it, 
   clearly indicating to the client that it does not support stream 
   switching. As a result SDP files describing switch-sets can also 
   be used with "old" servers.
     
5.1.2 SWITCHSETUP specification

   The SWITCHSETUP Method is similar to SETUP except that it 
   explicitly tells the server that the corresponding stream is 
   part of a switch-set.
   
   For maximum backward compatibility a client MUST use SETUP for 
   the primary streams and SWITCHSETUP for the alternative streams. 
   This way a server that does not support stream switching will 
   reply "501" to SWITCHSETUP but will SETUP the primary streams. 
   (If SETUP were used for all streams, such a server might 
   allocate a lot of resources for a function that it cannot 
   perform!)
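
   The client side of this rule can be sketched as follows. The 
   sketch only builds the list of requests and interprets the 
   status codes; it performs no network I/O, and the function names 
   are assumptions of this example. The URLs reuse those of the 
   SET_PARAMETER example in section 4.5.

```python
def setup_requests(primary_urls, alternative_urls):
    """Return (method, url) pairs: SETUP for primaries, SWITCHSETUP
    for alternatives, in the order they would be sent."""
    plan = [("SETUP", url) for url in primary_urls]
    plan += [("SWITCHSETUP", url) for url in alternative_urls]
    return plan

def switching_supported(responses):
    """responses: list of (method, status_code) as received.
    A 501 on any SWITCHSETUP means the server predates this extension."""
    return not any(method == "SWITCHSETUP" and status == 501
                   for method, status in responses)

plan = setup_requests(["rtsp://foo/twister/audio1"],
                      ["rtsp://foo/twister/audio2"])
print(plan)
print(switching_supported([("SETUP", 200), ("SWITCHSETUP", 501)]))  # False
```

   When switching is not supported the client simply plays the 
   primary streams, as it would with any RTSP server.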
   
   SWITCHSETUP may be issued at any time during an RTSP session.  
   
   SWITCHSETUP issued on a playing stream is similar to SETUP.

5.1.3 SWITCHSETUP "Switch-control" header field
   
   The SWITCHSETUP Method has an OPTIONAL header field: 
   "Switch-control".
   
   The Switch-control request-header field can be used to tell the 
   server which forms of stream switching control the client 
   supports.
   
   The values below are mutually exclusive.
   
   "Switch-control=client-initiated-only": Tells the server that it 
   MUST NOT switch on its own but only upon reception of a client-
   to-server SWITCH command. This is relevant for any type of 
   switch, including client-transparent switches.
   
   "Switch-control=non-transparent-client-initiated-only": Tells the 
   server that it MUST NOT switch on its own but only upon reception 
   of a client-to-server SWITCH command for non-client-transparent 
   switches. Specifically the server CAN switch on its own for 
   client-transparent switches. This is the default i.e. a server 
   MUST assume this value for absent or malformed Switch-control 
   header fields.
   
   "Switch-control=server-initiated-ok": Tells the server that it 
   CAN switch on its own without warning the client first for all 
   types of switches (i.e. the client has allocated all the 
   necessary resources).
   
   "Switch-control=forewarning: 2000": Tells the server that it CAN 
   switch on its own but that then it MUST warn the client by using 
   a SWITCHSIGNAL (see below) and that this forewarning MUST be sent 
   at least 2000 milliseconds before the server performs the switch. 
   This is relevant for any type of switch, including client-
   transparent switches.

   "Switch-control=non-transparent-forewarning: 2000": Tells the 
   server that it CAN switch on its own but that for non-transparent 
   switches it MUST warn the client by using a SWITCHSIGNAL (see 
   below) and that this forewarning MUST be sent at least 2000 
   milliseconds before the server performs the switch.
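
   A minimal Python sketch of how a server might interpret these 
   values follows. It assumes the input is the text found after 
   "Switch-control=" (for example "forewarning: 2000"); the 
   function names and the 'no' / 'yes' / 'warn-first' labels are 
   hypothetical and only serve the illustration.

```python
DEFAULT_MODE = "non-transparent-client-initiated-only"
SIMPLE_MODES = {"client-initiated-only", DEFAULT_MODE, "server-initiated-ok"}
FOREWARNING_MODES = {"forewarning", "non-transparent-forewarning"}


def parse_switch_control(value):
    """Return (mode, forewarning_ms). Absent or malformed values fall
    back to the default mode."""
    if value is None:
        return DEFAULT_MODE, None
    name, sep, arg = value.partition(":")
    name, arg = name.strip(), arg.strip()
    if not sep and name in SIMPLE_MODES:
        return name, None
    if sep and name in FOREWARNING_MODES and arg.isascii() and arg.isdigit():
        return name, int(arg)
    return DEFAULT_MODE, None


def server_initiated_switch(mode, transparent):
    """Return 'no', 'yes' or 'warn-first' for a switch the server
    would like to perform on its own."""
    if mode == "client-initiated-only":
        return "no"
    if mode == "non-transparent-client-initiated-only":
        return "yes" if transparent else "no"
    if mode == "server-initiated-ok":
        return "yes"
    if mode == "forewarning":
        return "warn-first"
    # non-transparent-forewarning: only non-transparent switches
    # require a SWITCHSIGNAL beforehand.
    return "yes" if transparent else "warn-first"
```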


5.1.4 SWITCHSETUP "RAP" header field
  
   The SWITCHSETUP Method has an OPTIONAL header field: "RAP"
   
   The RAP request-header field can be used to specify to the server 
   how the client supports stream switching regarding Random Access 
   Points.
   
   The values below are mutually exclusive.   
   
   "RAP=RAP-only": Tells the server that it MUST switch only on a 
   Random Access Point (in the "new" stream). For SWITCH requests 
   corresponding to a drastic (more than 50%) rate reduction, i.e. 
   when rapid action against congestion is preferable to smoother 
   playback, servers MUST then interrupt the on-going stream 
   immediately and restart streaming at the next available RAP in 
   the new stream (which effectively creates a gap in the stream).
   
   "RAP=indifferent": Tells the server that it CAN switch at any 
   point (in the new stream). This is the default i.e. a server 
   SHOULD assume this value for absent or malformed RAP header 
   fields.
   
   "RAP=if-before:300": Tells the server that it SHOULD wait to 
   switch on a Random Access Point (in the new stream) unless such a 
   point is not available in less than 300 milliseconds of Normal 
   Play Time, in which case the server MAY switch at any point. 
   Servers MUST ignore this recommendation for SWITCH requests 
   corresponding to drastic (more than 50%) rate reduction i.e. in 
   case rapid action against congestion is preferable to smoother 
   playback.
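
   The three RAP behaviors can be compared with a small Python 
   sketch. It assumes the server knows the current position and the 
   position of the next RAP of the new stream, both in milliseconds 
   of Normal Play Time; the function name and its arguments are 
   hypothetical. For "indifferent" the sketch switches immediately, 
   which is only one of the choices the text allows.

```python
def choose_switch_times(rap_mode, now_ms, next_rap_ms, old_kbps, new_kbps):
    """Return (stop_old_ms, start_new_ms) on the Normal Play Time axis."""
    drastic = new_kbps < 0.5 * old_kbps  # more than 50% rate reduction
    if rap_mode == "RAP-only":
        # Drastic reduction: stop at once, resume at the next RAP,
        # which leaves a gap in the stream.
        return (now_ms if drastic else next_rap_ms, next_rap_ms)
    if rap_mode.startswith("if-before:") and not drastic:
        limit = rap_mode.split(":", 1)[1].strip()
        if limit.isascii() and limit.isdigit() and next_rap_ms - now_ms < int(limit):
            return (next_rap_ms, next_rap_ms)
    # "indifferent", unknown values, a drastic reduction with
    # "if-before", or a RAP that is too far away.
    return (now_ms, now_ms)
```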
   
5.2 SWITCH

   The "SWITCH" Method is an OPTIONAL atomic command from the client 
   to the server requesting the server to switch from one stream to 
   another.
   
   The stream to switch off is indicated as a parameter of the 
   Method. The stream to switch on is indicated with the Header 
   Field "Replace-with" as shown in the example below:
   
     C->S: SWITCH rtsp://foo/twister/audio1 RTSP/1.0
           CSeq: 421
           Replace-with: rtsp://foo/twister/audio2
           
     S->C: RTSP/1.0 200 OK
           CSeq: 421
           Range: smpte=0:10:22-;time=19970123T153600Z
           RTP-Info: url=rtsp://foo/twister/audio2;
               seq=12312232;rtptime=78712811

   See the Appendix for a fully detailed example.
   
   The "Replace-with" Header Field may be absent or empty, signaling 
   that the target stream should be stopped with no replacement. A 
   symmetric SWITCH with an empty target can be used to restore the 
   corresponding track. This is useful for temporarily suppressing 
   the video in order to reach a very low bit rate, for example 
   with a news service on a mobile device; in that case SWITCH is 
   equivalent to the MUTE command of [RTSP-MUTE].
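
   A client could serialize such a request along the following 
   lines. This Python sketch is illustrative: the helper name and 
   its parameters are hypothetical, and it simply omits 
   "Replace-with" when no replacement stream is given.

```python
def build_switch_request(stream_url, cseq, replace_with=None, session=None):
    """Serialize a SWITCH request; RTSP lines end with CRLF and an
    empty line terminates the header section."""
    lines = [f"SWITCH {stream_url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        lines.append(f"Session: {session}")
    if replace_with:
        lines.append(f"Replace-with: {replace_with}")
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_switch_request(
    "rtsp://foo/twister/audio1", 421,
    replace_with="rtsp://foo/twister/audio2",
)
```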
   
   SWITCH requests MAY be issued at any time during an RTSP session 
   (including before the acknowledgement of a previous request is 
   received). When receiving several SWITCH requests a server SHOULD 
   ignore or abandon the oldest ones. In all cases a server MUST 
   execute as fast as possible requests producing a smaller data 
   rate (the smallest if several requests are pending). A server MAY 
   delay or deny the execution of requests corresponding to higher 
   data rates, for example if it has reached its maximum capacity. A 
   server SHOULD NOT deny SWITCH requests for smaller rates.
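
   One possible reading of this paragraph, sketched in Python: 
   among the pending requests, any request for a lower rate wins 
   and is urgent; otherwise only the most recent request is kept 
   and the server remains free to delay or deny it. The function 
   name and the representation of requests by their target rate in 
   kb/s are assumptions of the sketch.

```python
def resolve_pending_switches(current_kbps, pending_kbps):
    """pending_kbps lists target rates in arrival order, oldest first.
    Return (target_kbps, urgent), or None when nothing is pending."""
    if not pending_kbps:
        return None
    lower = [rate for rate in pending_kbps if rate < current_kbps]
    if lower:
        # Smaller data rates must be served as fast as possible,
        # the smallest one if several are pending.
        return min(lower), True
    # Only upward requests: keep the newest, drop the older ones.
    return pending_kbps[-1], False
```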
           
   The server response to a SWITCH from a player SHOULD contain the 
   same information as the answer to PLAY. Note for example that the 
   use of RTP-Info as in the above example allows instantaneous 
   lip-sync (the alternative being that the player must wait for 
   the RTCP Sender Report). It may also help the receiver to 
   identify the exact packet corresponding to the new stream 
   (especially in client-transparent cases), which in turn is useful 
   for resetting traffic monitoring computations, etc.
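
   To show how a player might extract the sequence number and RTP 
   timestamp of the first packet of the new stream, here is a small 
   Python sketch. The function name is hypothetical, and the parser 
   is deliberately naive: it assumes the URL itself contains neither 
   "," nor ";".

```python
def parse_rtp_info(value):
    """Parse an RTP-Info header value into a list of dicts with the
    keys 'url', 'seq' and 'rtptime'."""
    streams = []
    for entry in value.split(","):
        fields = {}
        for param in entry.split(";"):
            key, sep, val = param.strip().partition("=")
            if sep:
                fields[key] = val
        if "url" in fields:
            streams.append({
                "url": fields["url"],
                "seq": int(fields["seq"]) if "seq" in fields else None,
                "rtptime": int(fields["rtptime"]) if "rtptime" in fields else None,
            })
    return streams


info = parse_rtp_info(
    "url=rtsp://foo/twister/audio2; seq=12312232;rtptime=78712811"
)
```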
   
5.3 SWITCHSIGNAL

   As its name hints, SWITCHSIGNAL is a "signal" rather than a 
   command. SWITCHSIGNAL is an OPTIONAL server signal to the client 
   that a switch will soon be (or is being) performed. The stream to 
   be switched off is indicated as a parameter of the Method. The 
   stream to be switched on is indicated in RTP-Info as shown in the 
   example below:


     S->C: SWITCHSIGNAL rtsp://foo/twister/audio1 RTSP/1.0
           CSeq: 4213
           Range: smpte=0:10:22-;time=19970123T153600Z
           RTP-Info: url=rtsp://foo/twister/audio2;
             seq=12312232;rtptime=78712811

     C->S: RTSP/1.0 200 OK
           CSeq: 4213

   It is RECOMMENDED that the server issue SWITCHSIGNAL as early as 
   possible before the actual switch and include all available 
   information in it (Range, RTP-Info, etc.), as in a response to 
   PLAY.
   
   For the client-transparent case SWITCHSIGNAL is normally not 
   necessary for the correct behavior of the streaming system, but 
   a client may register the need to receive such a notification 
   (see SWITCHSETUP above). 
   
   For the non-client-transparent case the server MUST respect the 
   instructions provided by the client in the SWITCHSETUP commands 
   about the need to issue SWITCHSIGNAL. Unless 
   "Switch-control=server-initiated-ok" was explicitly signaled, a 
   server-initiated switch without forewarning would typically 
   cause the client to produce degraded playback and could even 
   crash it.
              
5.4  SWITCHCLOSE

5.4.1 SWITCHCLOSE rationale

   It is highly desirable that a stream-switching enabled player can 
   free non-used resources in order to allocate other resources.
   
   A typical example is a session nominally at 10 Mb/s for which a 
   large number of alternative streams are available (say 50 
   different bit rates all the way from high quality HDTV with 5+1 
   music down to stamp-sized video with mono speech "backup" 
   configuration).
   
   In such a case a typical usage would be that the client would 
   SWITCHSETUP only a few alternatives (say 8 Mb/s, 5 Mb/s, 1 Mb/s) 
   which could involve a substantial amount of memory in case these 
   configurations are supported using different codecs, etc.
   
   If the network condition degrades catastrophically this player 
   may need to allocate other resources in order to switch to lower 
   bit rates. In this case it would be highly valuable that it can 
   free (some of) the resources corresponding to the highest bit 
   rates.
     
   It is also highly desirable that a server can free resources 
   implicitly allocated after accepting a SWITCHSETUP (including 
   for DOS resistance); but then it is very useful to tell the 
   player that hypothetical corresponding SWITCH requests would be 
   denied.

5.4.2 SWITCHCLOSE specification
 
   SWITCHCLOSE tears down the resources corresponding to a given 
   SWITCHSETUP identified by the (same) target URL (as used in 
   SWITCHSETUP).

   SWITCHCLOSE is OPTIONAL.

   SWITCHCLOSE can be issued by a server or by a client. 
   
   SWITCHCLOSE MAY be issued at any time during an RTSP session.
      
   SWITCHCLOSE issued on a playing stream causes the corresponding 
   track to be stopped, i.e. only a PLAY can restore this track and 
   a SWITCHSETUP is required to restore the stream as a possible 
   future alternative. A player SHOULD NOT issue SWITCHCLOSE on a 
   playing stream; PAUSE or SWITCH SHOULD first be issued for that 
   stream. However SWITCHCLOSE MAY be used by a server on a playing 
   stream in order to signal that this stream has been terminated 
   and will not be resumed unless the client takes explicit action.
   
   Example:
   
     C->S: SWITCHCLOSE rtsp://foo/twister/audio1 RTSP/1.0
           CSeq: 42134

     S->C: RTSP/1.0 200 OK
           CSeq: 42134
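
   As a sketch of the client-side use case described in 5.4.1, the 
   following Python function picks which prepared alternatives to 
   release first. It assumes streams are identified by their bit 
   rate in kb/s and that the player frees the highest rates first; 
   the function name and arguments are hypothetical.

```python
def alternatives_to_close(prepared_kbps, playing_kbps, count):
    """Return up to `count` prepared rates to SWITCHCLOSE, highest
    first. The playing stream is never selected, since a player
    should PAUSE or SWITCH away from it before closing it."""
    candidates = sorted(
        (rate for rate in prepared_kbps if rate != playing_kbps),
        reverse=True,
    )
    return candidates[:count]


# Player prepared 8, 5 and 1 Mb/s and now plays the 1 Mb/s stream.
to_close = alternatives_to_close([8000, 5000, 1000], playing_kbps=1000, count=2)
```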
   
5.5  SDP rules

   An SDP describing a switch-set MUST use different (dynamic) 
   Payload Types for streams that are not client-transparent 
   switchable.
   
   An SDP describing a switch-set MAY use identical (dynamic) 
   Payload Types for streams that are client-transparent switchable.
   
   An SDP describing a switch-set MAY use identical port numbers for 
   streams that are client-transparent switchable.
   
6. Open issues

6.1 SDP issues

   Is there a need for additional SDP syntax and/or rules to 
   describe the switch-set? (or is the example in Appendix OK?)

   Should it be actually RECOMMENDED (or even a MUST?) to reuse the 
   same (dynamic) payload type for alternate streams of the "client-
   transparent" type? 
   
6.2 Other issues

   Status codes: additional status codes may be necessary(?). For 
   example when switching has not been performed because a more 
   recent request arrived...or because max capacity is reached?
  
   Should stream switching work for RTP interleaved inside RTSP?
   
   Is there an alternative to doing one SETUP per alternate stream? 
   Would it be worth the trouble to define a specific syntax?
   
   In the client-transparent mode assuming neither the payload type 
   nor the port number change it should not be necessary to make 
   one SETUP per stream (right?), shall it be documented/mandated? 

   Are there specific firewall/proxy considerations?
   
6.3 UDP transport of switching command
   
   It is a good idea to also provide a UDP based command. The key 
   motivation of doing that is that UDP feedback may be faster and 
   as mentioned earlier speed is a key factor for optimal congestion 
   control as well as switch seamless-ness.
   
   Should this be done using an RTCP extension? Or use "rtspu"? (but 
   isn't rtspu going to be dropped?)
   
   For security (UDP being easier to spoof than TCP?) this could be 
   restricted to "down" switches, since for congestion control 
   purposes there is never any hurry to switch up. Could it also be 
   restricted to the client-transparent case?
   
6.4 Independence with RTSP parallel evolution

   There is a possible exception to that for SWITCHCLOSE issued on a 
   playing stream. But it looks like a very logical one?

7. Security considerations

   The security issues associated with stream switching are those 
   inherent to the usage of RTP and RTSP plus:

7.1 Induced server misbehavior   

   The following threats can be identified:
   
   . Causing the server to allocate a lot of resources (in preparing 
   to support switching for a large switch-set). Note however that 
   a server can deny SWITCHSETUP requests using for example "503 
   Service Unavailable" (temporary) or "416 Requested Range Not 
   Satisfiable" (permanent) and can issue SWITCHCLOSE at any time. 
   Also the server is often the source of the SDP (via DESCRIBE) 
   and therefore has opportunities there to reduce the diversity.
   
   . Causing the server to switch up toward high bit rate streams 
   can create large amounts of network traffic. Note however that 
   the typical usage of stream switching is anyway to deploy the 
   service with the maximum bit rate as a primary target. With 
   stream switching, streaming servers would actually become 
   bandwidth control tools for operators.
   
   . Causing the server to switch down toward low bit rates causes a 
   degraded service. 
   
   . Causing the server to frequently switch is a source of degraded 
   service but is also a Denial Of Service Attack in the sense that 
   it would typically cause the server to consume substantial 
   resources in switching, thereby reducing the service capacity 
   for example by reducing the maximum number of concurrent streams 
   that the server can serve or the maximum total throughput of the 
   server, etc. The defense of a server is probably to refuse too 
   frequent switches and especially upward switches...
   
   These threats are fended off by applying authentication to the 
   stream switching control messages. RFC2326 section 16 provides 
   guidance on how to perform that with RTSP. 
   
   Also server implementations SHOULD include configurable 
   limitations such as a maximum number of switches per amount of 
   time per media track, a maximum number of alternate streams per 
   client, etc.
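
   One way to implement the first of these limits is a sliding 
   window per media track, sketched below in Python. The class 
   name, its parameters and the use of seconds as time unit are 
   assumptions of the sketch.

```python
from collections import defaultdict, deque


class SwitchRateLimiter:
    """Allow at most `max_switches` switches per `window_s` seconds
    for each media track."""

    def __init__(self, max_switches, window_s):
        self.max_switches = max_switches
        self.window_s = window_s
        self._history = defaultdict(deque)  # track -> accepted times

    def allow(self, track, now_s):
        times = self._history[track]
        # Forget switches that fell out of the window.
        while times and now_s - times[0] >= self.window_s:
            times.popleft()
        if len(times) >= self.max_switches:
            return False
        times.append(now_s)
        return True


limiter = SwitchRateLimiter(max_switches=2, window_s=10.0)
```

   A server would call allow() before executing a SWITCH and refuse 
   or delay the request when it returns False.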
   
7.2 Induced client misbehavior   
   
   One threat is that a server could cause the receivers to 
   misbehave (or crash) for example if the data sent is encoded with 
   a different decoder configuration than the one the player was 
   initialized with.  
   
   For that reason this specification takes special care that 
   server-initiated switches are possible only for agreed upon 
   streams (using SWITCHSETUP) and either for client-transparent 
   switches (and a client can disable these anyway) or in conditions 
   specified by the client with "safe" defaults.
  
8. Acknowledgements

   The author wishes to thank Alain Teil, Kamal Rada, Yves Ramanzin 
   and Nicolas Delahaye for all the fruitful discussions and 
   comments.

9. References
   
   [Widmer]       A survey on TCP-Friendly Congestion Control, J. 
   Widmer, R. Denda, M. Mauve, IEEE Network May-June 2001, 
   http://www.informatik.uni-
   mannheim.de/informatik/pi4/publications/library/Widmer2001a.pdf
   
   [Vojnovic]     On the long-run behavior of equation-based rate 
   control, M. Vojnovic, J.Y. Le Boudec, Proceedings of SIGCOMM'02, 
   August 19-23 2002, Pittsburgh, Pennsylvania, USA, 
   http://www.acm.org/sigcomm/sigcomm2002/papers/equation.pdf
   
   [Bansal]       Dynamic Behavior of Slowly-Responsive Congestion 
   Control Algorithms, D. Bansal, H. Balakrishnan, S. Floyd, S. 
   Shenker, Proceedings of SIGCOMM'01, August 27-31 2001, San Diego, 
   California, USA, http://www.acm.org/sigcomm/sigcomm2001/p21-
   bansal.pdf
   
   [RTP]           http://www.ietf.org/rfc/RFC1889.txt
      
   [RTSP]          http://www.ietf.org/rfc/RFC2326.txt
   
   [HTTP]          http://www.ietf.org/rfc/RFC2616.txt
      
   [grouping]      http://www.ietf.org/rfc/RFC3388.txt

   [TFRC]          http://www.ietf.org/rfc/RFC3448.txt

   [SMIL]          http://www.w3.org/TR/smil20/cover.html
   
   [MPEG-4]        http://mpeg.telecomitalialab.com/standards/mpeg-
   4/mpeg-4.htm
   
   [3GPP-alt-attr] 
   http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_22/Docs/S4-
   020407.zip
   
   [3GPP-BWS] 
   http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_25/Docs/S4-
   030024.zip   

   [RTSP-MUTE]     http://www.ietf.org/internet-drafts/draft-
   sergent-rtsp-mute-00.txt
            
   [TRIGTRAN]      http://www.ietf.org/internet-drafts/draft-
   dawkins-trigtran-probstmt-00.txt
   
10. Author's Address

   Philippe Gentric 
   Philips Software 
   51 rue Carnot 
   92156 Suresnes 
   France 
   e-mail: philippe.gentric@philips.com 

      
   
  
Appendix: Detailed example
   
   C->S: DESCRIBE rtsp://foo/twister RTSP/1.0
           CSeq: 1

   The server replies with the full content description: there are 
   3 video streams, at 200 kb/s, 100 kb/s and 50 kb/s, and 3 audio 
   streams, at 20 kb/s, 10 kb/s and 5 kb/s.

   NB: the example is incomplete in that a real description would 
   require more detail, such as decoder configurations, which is 
   omitted for the sake of simplicity.
   
   S->C: RTSP/1.0 200 OK
           CSeq: 1
           Content-Type: application/sdp
           Content-Length: xxx

           v=0
           o=- 2890844256 2890842807 IN IP4 172.16.2.93
           s=RTSP Session
           i=An Example of RTSP Session Usage for Stream Switching
           a=control:rtsp://foo/twister
           t=0 0
           
           m=video 7722 RTP/AVP 96
           a=rtpmap:96 MP4V-ES/1000
           a=control:rtsp://foo/twister/video1
           b=AS:200

           m=audio 7724 RTP/AVP 97
           a=rtpmap:97 mpeg4-generic/44100/2
           a=control:rtsp://foo/twister/audio1
           b=AS:20

           m=video 7726 RTP/AVP 98
           a=rtpmap:98 MP4V-ES/1000
           a=control:rtsp://foo/twister/video2
           b=AS:100

           m=audio 7724 RTP/AVP 99
           a=rtpmap:99 mpeg4-generic/44100/2
           a=control:rtsp://foo/twister/audio2
           b=AS:10
           
           m=video 7726 RTP/AVP 100
           a=rtpmap:100 MP4V-ES/1000
           a=control:rtsp://foo/twister/video3
           b=AS:50

           m=audio 7724 RTP/AVP 101
           a=rtpmap:101 mpeg4-generic/44100/2
           a=fmtp:101 streamtype=5; profile-level-id=15; mode=AAC-hbr
           a=control:rtsp://foo/twister/audio3
           b=AS:5
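
   A client needs to turn such a description into a list of 
   alternatives per media type. The Python sketch below does this 
   for the three line types used above (m=, a=control and b=AS, 
   the latter being in kb/s). The function name is hypothetical and 
   the parser ignores every other SDP line.

```python
def parse_switch_set(sdp_text):
    """Group media sections by type and sort them by decreasing
    b=AS value. Returns {media: [(kbps, control_url), ...]}."""
    sections = []
    current = None
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            current = {"media": line[2:].split()[0], "control": None, "kbps": 0}
            sections.append(current)
        elif current is not None and line.startswith("a=control:"):
            current["control"] = line[len("a=control:"):]
        elif current is not None and line.startswith("b=AS:"):
            current["kbps"] = int(line[len("b=AS:"):])
    grouped = {}
    for sec in sections:
        grouped.setdefault(sec["media"], []).append((sec["kbps"], sec["control"]))
    for streams in grouped.values():
        streams.sort(reverse=True)
    return grouped
```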

   The second step is SETUP, where client and server agree on the 
   transport parameters (UDP port numbers etc). Note that the client 
   waits for the reply to the first SETUP in order to obtain the 
   session identifier and then sends all the remaining SETUP and 
   SWITCHSETUP requests in rapid succession, so that this operation 
   takes approximately 2 round trips independently of the number of 
   streams.
   
   In this example different UDP ports are used, but the same port 
   could also be reused since by rule the switch is performed either 
   on streams that are of the client-transparent type or on streams 
   that have a different payload type.
   
   Note the "Switch-control=client-initiated-only" header field 
   which signals to the server that it MUST NOT switch on its own 
   but only upon reception of a SWITCH command.

     C->S: SETUP rtsp://foo/twister/audio1 RTSP/1.0
           CSeq: 2
           Transport: RTP/AVP;unicast;client_port=8000-8001

     S->C: RTSP/1.0 200 OK
           CSeq: 2
           Transport: RTP/AVP;unicast;client_port=8000-8001;
                      server_port=9000-9001
           Session: 12345678

     C->S: SETUP rtsp://foo/twister/video1 RTSP/1.0
           CSeq: 3
           Transport: RTP/AVP;unicast;client_port=8002-8003
           Session: 12345678

     C->S: SWITCHSETUP rtsp://foo/twister/audio2 RTSP/1.0
           CSeq: 4
           Transport: RTP/AVP;unicast;client_port=8004-8005
           Session: 12345678
           Switch-control=client-initiated-only

     C->S: SWITCHSETUP rtsp://foo/twister/video2 RTSP/1.0
           CSeq: 5
           Transport: RTP/AVP;unicast;client_port=8006-8007
           Session: 12345678
           Switch-control=client-initiated-only

     C->S: SWITCHSETUP rtsp://foo/twister/audio3 RTSP/1.0
           CSeq: 6
           Transport: RTP/AVP;unicast;client_port=8008-8009
           Session: 12345678
           Switch-control=client-initiated-only

     C->S: SWITCHSETUP rtsp://foo/twister/video3 RTSP/1.0
           CSeq: 7
           Transport: RTP/AVP;unicast;client_port=8010-8011
           Session: 12345678
           Switch-control=client-initiated-only

     S->C: RTSP/1.0 200 OK
           CSeq: 3
           Transport: RTP/AVP;unicast;client_port=8002-8003;
                      server_port=9004-9005
           Session: 12345678

     S->C: RTSP/1.0 200 OK
           CSeq: 4
           Transport: RTP/AVP;unicast;client_port=8004-8005;
                      server_port=9006-9007
           Session: 12345678

     S->C: RTSP/1.0 200 OK
           CSeq: 5
           Transport: RTP/AVP;unicast;client_port=8006-8007;
                      server_port=9008-9009
           Session: 12345678

     S->C: RTSP/1.0 200 OK
           CSeq: 6
           Transport: RTP/AVP;unicast;client_port=8008-8009;
                      server_port=9010-9011
           Session: 12345678

     S->C: RTSP/1.0 200 OK
           CSeq: 7
           Transport: RTP/AVP;unicast;client_port=8010-8011;
                      server_port=9012-9013
           Session: 12345678

   Then the client decides to start streaming the "default" 
   configuration at 220 kb/s (note that a non-aggregate PLAY would 
   also be a possibility).
   
     C->S: PLAY rtsp://foo/twister RTSP/1.0
           CSeq: 8
           Range: npt=0-
           Session: 12345678

     S->C: RTSP/1.0 200 OK
           CSeq: 8
           Session: 12345678

   Then the client decides to switch streaming from 220 kb/s to 210 
   kb/s by switching audio streams

     C->S: SWITCH rtsp://foo/twister/audio1 RTSP/1.0
           CSeq: 9
           Session: 12345678
           Replace-with: rtsp://foo/twister/audio2

     S->C: RTSP/1.0 200 OK
           CSeq: 9
           Session: 12345678
           
   Then the client decides to switch streaming from 210 kb/s to 60 
   kb/s by switching video streams

     C->S: SWITCH rtsp://foo/twister/video1 RTSP/1.0
           CSeq: 10
           Session: 12345678
           Replace-with: rtsp://foo/twister/video3

     S->C: RTSP/1.0 200 OK
           CSeq: 10
           Session: 12345678
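
   The aggregate bit rates quoted in this example (220, 210 and 60 
   kb/s) follow directly from the b=AS values of the description. 
   The short Python check below reproduces them; the stream names 
   are shorthand for the control URLs and the helper functions are 
   hypothetical.

```python
RATES_KBPS = {
    "video1": 200, "video2": 100, "video3": 50,
    "audio1": 20, "audio2": 10, "audio3": 5,
}


def total_kbps(active):
    """Sum the bit rates of the streams currently being played."""
    return sum(RATES_KBPS[name] for name in active)


def switch(active, old, new):
    """Return the active set after replacing stream `old` by `new`."""
    return [new if name == old else name for name in active]


initial = ["video1", "audio1"]
after_audio = switch(initial, "audio1", "audio2")
after_video = switch(after_audio, "video1", "video3")
print(total_kbps(initial), total_kbps(after_audio), total_kbps(after_video))
# prints: 220 210 60
```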