Internet DRAFT - draft-helmy-pim-sm-implem
draft-helmy-pim-sm-implem
Network Working Group Ahmed Helmy
Internet Draft
Expire in six months
draft-helmy-pim-sm-implem-00.txt Jan 19, 1997
Protocol Independent Multicast-Sparse Mode (PIM-SM): Implementation
Document
Status of This Memo
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. (Note that other groups may also distribute
working documents as Internet Drafts).
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a
``working'' draft'' or ``work in progress.''
Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any other
Internet Draft.
Ahmed Helmy [Page 1]
Internet Draft PIM-SM Implementation Jan 1997
Abstract
This document describes the details of the PIM-SM [1,2,3] version 2
implementation for UNIX platforms; namely SunOS and SGI-IRIX. A
generic kernel model is adopted, which is protocol independent,
however some supporting functions are added to the kernel for
encapsulation of data packets at user level and decapsulation of PIM
Registers.
Further, the basic model for the user level, PIM daemon (pimd),
implementation is described.
Implementation details and code are included in supplementary
appendices.
1 Introduction
In order to support multicast routing protocols in a UNIX
environment, both the kernel and the daemon parts have to interact
and cooperate in performing the processing and forwarding functions.
The kernel basically handles forwarding the data packets according to
a multicast forwarding cache (MFC) stored in the kernel. While the
protocol specific daemon is responsible for processing the control
messages from other routers and from the kernel, and maintaining the
multicast forwarding cache from user level through traps and system
calls. The daemon takes care of all timers for the multicast routing
table (MRT) entries according to the specifications of the multicast
protocol; PIM-SMv2 [3]. The details of the implementation are
presented as follows. First, an overview of the system (kernel and
daemon) is given, with emphasis on the basic functions of each. Then,
a structural model is presented for the daemon, outlining the basic
building blocks for the multicast routing table and the virtual
interface table at user space. An illustrative functional description
is given, thereafter, for the daemon-kernel interface, and the
kernel. Finally, supplementary appendices provide more detailed
information about the implementation specifics [*]
_________________________
[*] The models discussed herein are merely illustra-
tive, and by no means are they exhaustive nor authori-
tative.
Ahmed Helmy [Page 2]
Internet Draft PIM-SM Implementation Jan 1997
2 System Overview
The PIM daemon processes all the PIM control messages and sets up an
appropriate kernel environment to deliver multicast data packets. The
kernel has to support multicast packets forwarding (see figure 1).
[Figures are present only in the postscript version]
Fig. 1 System overview
2.1 The kernel level
When the kernel receives an IP packet, it passes it through the IP
handling routine [ip-intr()]. ip-intr dispatches the packet to the
appropriate handling machinery, based on the destination address and
the IP protocol number. Here, we are only concerned with the
following cases:
* If the packet is a multicast packet then it passes through
the multicast forwarding machinery in the kernel [ip-
mforward()]. Subsequently, if there is a matching entry
(source and group addresses), with the right incoming
interface (iif), we get a `cache hit', and the packet is
forwarded to the corresponding outgoing interfaces (oifs)
through the fast forwarding path. Otherwise, if the packet
does not match the source, group, and iif, we get a `cache
miss', and an internal control message is passed up
accordingly to the daemon for processing.
* If the IP packet is a PIM packet (i.e. has protocol number
of IPPROTO-PIM), it passes through the PIM machinery in the
kernel [pim-input()], and in turn is passed up to the
socket queue using the raw-input() call.
* If the IP packet is an IGMP packet (i.e. has protocol
number of IPPROTO-IGMP), it passes through the IGMP
machinery in the kernel [igmp-input()], and in turn is
passed up to the socket queue using the raw-input() call.
2.2 The user level (daemon)
All PIM, IGMP and internal control (e.g. cache miss and wrong
incoming interface) messages are passed to PIM daemon; the
Ahmed Helmy [Page 3]
Internet Draft PIM-SM Implementation Jan 1997
daemon has the complete information to creat multicast routing
table (MRT). It also updates the multicast forwarding cache
(MFC) inside the kernel by using the `setsockopt()' system call,
to facilitate multicast packets forwarding (see figure 2).
[Figures are present only in the postscript version]
Fig. 2 IP-multicast implementation model
The PIM daemon listens to PIM and IGMP sockets and receives both
PIM and IGMP messages. The messages are processed by the
corresponding machinery in the daemon [accept-pim() and accept-
igmp(), respectively], and dispatched to the right component of
the protocol.
Modifications and updates are made to the multicast routing
table (MRT) according to the processing, and appropriate changes
are reflected in the kernel entries using the setsockopt()
system call .
Other messages may be triggered off of the state kept in the
daemon, to be processed by upstream/downstream routers.
3 User-level Implementation (The Multicast Routing Daemon)
The basic functional flow model for the PIM daemon is described
in this section (see figure 3). The user level implementation is
broken up into modules based on functionality, and includes
modules to handle the multicast routing table (MRT) [in mrt.c],
the virtual interface (vif) table [in vif.c], PIM IGMP messages
[in pim.c igmp.c], protocol specific routing functions [in
route.c], timers and housekeeping [in timer.c], kernel level
interface [in kern.c],etc.
[Figures are present only in the postscript version]
Fig. 3 Basic functional flow of the PIM daemon
Following is an explanation of these modules, in addition to the
data structures.
Ahmed Helmy [Page 4]
Internet Draft PIM-SM Implementation Jan 1997
3.1 Data Structures
There are two basic data structures, the multicast routing entry
(mrtentry) and virtual interface entry (vifentry). These
structures are created and modified in the daemon by receiving
the PIM control messages. The definitions of the data structures
are given in `pim.h' file.
3.1.1 Multicast Routing Table
The multicast routing entry is shown below :
Ahmed Helmy [Page 5]
Internet Draft PIM-SM Implementation Jan 1997
struct mrtentry{
struct srcentry *source; /* source */
struct grpentry *group; /* group */
vifbitmap-t outgoing; /* outgoing vifs to downstream */
vifbitmap-t vifmaskoff; /* deleted vifs */
struct mrtentry *srcnext; /* next entry of same source */
struct mrtentry *srcprev; /* prev entry of same source */
struct mrtentry *grpnext; /* next entry of same group */
struct nbrentry *upstream; /* upstream router, needed as in
* RPbit entry, the upstream
* router is different than
* the source upstream router.
*/
u-long pktrate-prev;/* packet count of prev check */
u-long idlecnt-prev;/* pkt cnt to check idle states */
u-int data-rate-timer;/* keep track of the data rate */
u-int reg-rate-timer;/* keep track of Register rate at
* RP
*/
u-int reg-cnt; /* keep track of the Register
* count at RP
*/
u-char *timers; /* vif timer list */
u-short timer; /* entry timer */
u-short join-prune-timer; /* periodic join/prune timer */
u-short flags; /* flags */
u-int assert-rpf-timer;/* Assert timer */
u-short registerBitTimer;/* Register-Suppression timer */
};
struct srcentry {
u-long source; /* subnet source of multicasts */
vifi-t incoming; /* incoming vif */
struct nbrentry *upstream; /* upstream router */
u-short timer; /* timer for recompute incoming */
u-long metric; /* Unicast Routing Metric for source */
u-long preference; /* The metric preference value */
struct mrtentry *mrtlink; /* link to routing entries */
struct srcentry *next; /* link to next entry */
};
struct grpentry {
u-long group; /* subnet group of multicasts */
vifbitmap-t leaves; /* outgoing vif to host */
u-char *timers; /* vif timer list */
struct mrtentry *mrtlink; /* link to routing entries */
struct grpentry *next; /* link to next entry */
Ahmed Helmy [Page 6]
Internet Draft PIM-SM Implementation Jan 1997
struct mrtentry *rp-entry; /* Pointer to the (*,G) entry */
struct rplist *active-rp; /* Pointer to the active RP */
};
The multicast routing table is the collection of all routing
entries, which are organized in a linked list in the daemon.
The overall structure of the multicast routing table in the
daemon is shown in figure 4.
[Figures are present only in the postscript version]
Fig. 4 The multicast routing table overall structure at user level
One of the frequently used fields in the mrtentry is the `flags'
field, where the values assigned to that field can be
one/combination of the following:
#define MRTF-SPT 0x0001 /* shortest path tree bit */
#define MRTF-WC 0x0002 /* wildcard bit */
#define MRTF-RP 0x0004 /* RP bit */
#define MRTF-CRT 0x0008 /* newly created */
#define MRTF-IIF-REGISTER 0x0020 /* iif = reg-vif */
#define MRTF-REGISTER 0x0080 /* oif includes reg-vif */
#define MRTF-KERNEL-CACHE 0x0200 /* kernel cache mirror */
#define MRTF-NULL-OIF 0x0400 /* null oif cache */
#define MRTF-REG-SUPP 0x0800 /* register suppress */
#define MRTF-ASSERTED 0x1000 /* RPF is an assert winner*/
#define MRTF-SG 0x2000 /* pure (S,G) entry */
3.1.2 Virtual Interface List
The virtual interface data structure is shown below :
Ahmed Helmy [Page 7]
Internet Draft PIM-SM Implementation Jan 1997
struct vifentry {
u-short flags; /* VIFF- flags */
u-char threshold; /* min ttl required */
u-long local; /* local interface address */
u-long remote; /* remote address */
u-long subnet; /* subnet number */
u-long subnetmask; /* subnet mask */
u-long broadcast; /* subnet broadcast addr */
char name[IFNAMSIZ]; /* interface name */
u-char timer; /* timer for sending queries */
u-char gq-timer; /* Group Query timer, used by DR*/
struct nbrentry *neighbors; /* list of neighboring routers */
u-int rate-limit; /* max rate */
};
The virtual interface table is the collection of all virtual
interface entries. They are organized as an array
(viflist[MAXVIFS]; MAXVIFS currently set to 32).
In addition to defining `mrtenry', `vifentry' and other data
structures, the `pim.h' file also defines all default timer
values.
3.2 Initialization and Set up
Most of the initialization calls and socket setup are handled
through main() [in main.c]. Basically, after the alarm and other
signals are initialized, PIM and IGMP socket handlers are setup,
then the function awaits on a `select' call. The timer interval
is set to TIMER-INTERVAL [currently 5 seconds], after which the
timer function is invoked, if no packets were detected by
`select'. The timer function basically calls the other timing
and housekeeping functions; age-vifs() and age-routes(). Also,
another alarm is scheduled for the following interval.
3.3 PIM and IGMP message handling
aragraphPIM All PIM control messages (in PIM-SMv2) have an IP
protocol number of IPPROTO-PIM (assigned to 103), and are not
part of IGMP like PIMv1 messages.
Incoming PIM messages are received on the pim socket, and are
dispatched by accept-pim() [in pim.c], according to their type.
PIM types are:
Ahmed Helmy [Page 8]
Internet Draft PIM-SM Implementation Jan 1997
PIM-HELLO 0
PIM-REGISTER 1
PIM-REGISTER-STOP 2
PIM-JOIN-PRUNE 3
PIM-BOOTSTRAP 4
PIM-ASSERT 5
PIM-GRAFT 6
PIM-GRAFT-ACK 7
PIM-CANDIDATE-RP-ADVERTISEMENT 8
Outgoing PIM messages are sent using send-pim() and send-pim-
unicast() [in pim.c].
aragraphIGMP
IGMP messages are dispatched using similar machinery to that of
PIM, only IGMP messages are received on the igmp socket,
dispatched by accept-igmp(), and are sent using send-igmp() [in
igmp.c].
3.4 MRT maintenance
The functions handling the MRT creation, access, query and
update are found in `mrt.c' file.
Major functions include route lookups; as in find-route(),
find-source(), and find-group().
The hash function and group to RP mapping are also performed by
functions in `mrt.c'.
3.5 Protocol Specific actions (Routing)
File `route.c' contains the protocol specific functions, and
processes the incoming/outgoing PIM messages.
Functions processing incoming PIM messages include accept-join-
prune(), accept-assert(), accept-register(), accept-register-
stop()..etc, and other supporting functions.
Functions triggering outgoing PIM messages include event-join-
prune(), send-register(), trigger-assert(), send-C-RP-
Adv()..etc, and supporting functions.
In addition, route.c also handles the internal control messages
through `process-kernelCall()', which dispatches the internal
Ahmed Helmy [Page 9]
Internet Draft PIM-SM Implementation Jan 1997
control messages according to their type.
Currently, there are three types of internal control messages:
IGMPMSG-NOCACHE 1 /* indicating a cache miss */
IGMPMSG-WRONGVIF 2 /* indicating wrong incoming interface */
IGMPMSG-WHOLEPKT 3 /* indicating whole data packet; used
* for registering
*/
These messages are dispatched to process-cacheMiss(), process-
wrongiif(), and process-wholepkt(), respectively.
3.6 Timing
The clock tick for the timing system is set to TIMER-INTERVAL
(currently 5 seconds). That means, that all periodic actions are
carried out over a 5 second granularity.
On every clock tick, the alarm interrupt calls the timer()
function in main.c, which, in turn, calls age-vifs() [in vif.c]
and age-routes() [in timer.c]. In this subsection, the functions
in `timer.c' are explained.
Basically, age-routes() browses through the routing
lists/tables, advancing all timers pertinent to the routing
tables. Specifically, the group list is traversed, and the leaf
membership information is aged by calling timeout-leaf(). Then a
for loop browses through the multicast route entries, aging the
outgoing interfaces [timeout-outgo()], and various related
timers (e.g. RegisterBitTimer, Assert timer..etc) for each
entry, as well as checking the rate thresholds for the Registers
(in the RP), and data (in the DRs). In addition, garbage
collection is performed on the timed out entries in the group
list, the source list and the MRT. The multicast forwarding
cache (MFC) in the kernel is also timed out, and deleted/updated
accordingly. Note that in this model, the MFC is passive, and
all timing is done in at user level, then communicated to the
kernel, see section 4. The periodic Join/Prune messaging is
performed per interface, by calling periodic-vif-join-prune().
Then the RPSetTimer is aged, and periodic C-RP-Adv messages are
sent if the router is a Candidate RP.
Ahmed Helmy [Page 10]
Internet Draft PIM-SM Implementation Jan 1997
3.7 Virtual Interface List
Functions in `vif.c' handle setting up the viflist array in both
the user level and the kernel, through `start-vifs()'. Special
care is given to the reg-vif; a dummy interface used for
encapsulating/decapsulating Registers, see section 7. The reg-
vif is installed by add-reg-vif(), and k-add-vif(reg-vif-num)
calls.
Other, per interface, tasks are carried out by vif.c; like
query-groups(), accept-group-report and query-neighbors(), in
addition to the periodic age-vifs() timing function.
3.8 Configuration
pimd looks for the configuration file at ``/etc/pimd.conf''.
Configuration parameters are parsed by `config.c'. Current PIM
specific configurable parameters include the register/data rate
thresholds, after which the RP/DRs switch to the SPT,
respectively. A candidate RP can be configured, with optional
interface address and C-RP-Adv period, and a candidate bootstrap
router can be configured with optional priority.
Following is an example `pimd.conf':
# Command formats:
#
# phyint <local-addr> [disable] [threshold <t>]
# candidate-rp <local-addr> [time <number>]
# bootstrap-router <local-addr> [priority <number>]
# switch-register-threshold [count <number> time <number>]
# switch-data-threshold [count <number> time <number>]
#
candidate-rp time 60
bootstrap-router priority 5
switch-register-threshold count 10 time 5
3.9 Interfacing to Unicast Routing
For proper implementation, PIM requires access to the unicast
routing tables. Given a specific destination address, PIM needs
at least information about the next hop (or the reverse path
forwarding `RPF' neighbor) and the interface (iif) to reach that
destination. Other information, are the metric used to reach the
destination, and the unicast routing protocol type, if such
information is attainable. In this document, only the RPF and
Ahmed Helmy [Page 11]
Internet Draft PIM-SM Implementation Jan 1997
iif information are discussed.
Two models have been employed to interface to unicast routing in
pimd.
The first model, which requires `routing socket' support, is
implemented on the `IRIX' platform, and other `routing socket'
supporting systems. This model requires no kernel modifications
to access the unicast routing tables. All the functionality
required is provided in `routesock.c'. In this document, we
adopt this model.
The second model, is implemented on `SunOs' and other platforms
not supporting routing sockets, and requires some kernel
modifications. In this model, an `ioctl' code is defined (e.g.
`SIOCGETRPF'), and a supporting function [e.g. get-rpf()] is
added to the multicast code in the kernel, to query the unicast
forwarding information base, through `rtalloc()' call.
Other models may also be adopted, depending on the operating
system capabilities.
In any case, the unicast routing information has to be updated
periodically to adapt to routing changes and network failures.
4 User level - Kernel Interfaces
Communication between the user level and the kernel is
established in both directions. Messages for kernel
initialization and setup, and adding, deleting and updating
entries from the viftable or the mfc in the kernel, are
triggered by the multicast daemon. While PIM, IGMP, and internal
control messages are passed from the kernel to user level.
4.1 User level to kernel messages
Most user level interfacing to the kernel is done through
functions in `kern.c'. Traps used are `setsockopt',
`getsockopt', and `ioctl'. Following is a brief description of
each:
* setsockopt(): used by the daemon to modify and update the
kernel environment; including the forwarding cache, the
viftable.. etc.
Options used with this call are:
Ahmed Helmy [Page 12]
Internet Draft PIM-SM Implementation Jan 1997
MRT-INIT initialization
MRT-DONE termination
MRT-ADD-VIF add a virtual interface
MRT-DEL-VIF delete a virtual interface
MRT-ADD-MFC add an entry to the multicast forwarding cache
MRT-DEL-MFC delete an entry from the multicast forwarding cache
MRT-PIM set a pim flag in the kernel [to stub the pim code]
* getsockopt(): used to get information from the kernel.
Options used with this call are:
MRT-VERSION get the version of the multicast kernel
MRT-PIM get the pim flag
* ioctl(): used by the daemon for 2 way communication with
the kernel.
Used to get interface information [in config.c and vif.c].
`kern.c' uses `ioctl' with option `SIOCGETSGCNT' to get the
cache hit packet count for an (S,G) entry in the kernel.
Also, ioctl may be used to get unicast routing information
from the kernel using the option `SIOCGETRPF', if such
model is used to get unicast routing information, see
section 3.9.
4.2 Kernel to user level messages
The kernel uses two calls to send PIM, IGMP and internal control
messages to user level:
* raw-input(): used by the kernel to send messages/packets to
the raw sockets at user level.
Used by both the pim machinery [pim-input()], and igmp
machinery [igmp-input()], in the kernel, to pass the
messages to the raw socket queue, and in turn to pim and
igmp sockets, to which the pim daemon listens.
Ahmed Helmy [Page 13]
Internet Draft PIM-SM Implementation Jan 1997
* socket-send(): used by the multicast forwarding machinery
to send internal control messages to the daemon.
Used by the multicast code in the kernel [in ip-mroute.c],
to send internal, multicast specific, control messages:
1 ip-mforward(), triggers an `IGMPMSG-NOCACHE' control
message, when getting a cache miss.
2 ip-mdq(), triggers an `IGMPMSG-WRONGVIF' control
message, when failing the RPF check (i.e. getting a
wrong iif).
3 register-send(), relays `IGMPMSG-WHOLEPKT' messages
containing the data packet, when called by ip-mdq() to
forward packets to the `reg-vif'.
5 IP Multicast Kernel Support
The kernel support for IP multicast is mostly provided through
`ip-mroute.c,h', providing the structure for the multicast
forwarding cache (MFC), the virtual interface table (viftable),
and supporting functions.
5.1 The Multicast Forwarding Cache
The Multicast Forwarding Cache (MFC) entry is defined in `ip-
mroute.h', and consists basically of the source address, group
address, an incoming interface (iif), and an outgoing interface
list (oiflist). Following is the complete definition:
struct mfc {
struct in-addr mfc-origin; /* ip origin of mcasts */
struct in-addr mfc-mcastgrp; /* multicast group associated*/
vifi-t mfc-parent; /* incoming vif */
u-char mfc-ttls[MAXVIFS]; /* forwarding ttls on vifs */
u-int mfc-pkt-cnt; /* pkt count for src-grp */
u-int mfc-byte-cnt; /* byte count for src-grp */
u-int mfc-wrong-if; /* wrong if for src-grp */
int mfc-expire; /* time to clean entry up */
};
Ahmed Helmy [Page 14]
Internet Draft PIM-SM Implementation Jan 1997
The multicast forwarding cache table (mfctable), is a hash table
of mfc entries defined as:
struct mbuf *mfctable[MFCTBLSIZ];
where MFCTBLSIZ is 256.
In case of hash collisions, a collision chain is constructed.
5.2 The Virtual Interface Table
The viftable is an array of `vif' structures, defined as
follows:
struct vif {
u-char v-flags; /* VIFF- flags defined above */
u-char v-threshold; /* min ttl required to forward on vif*/
u-int v-rate-limit; /* max rate */
struct tbf *v-tbf; /* token bucket structure at intf. */
struct in-addr v-lcl-addr; /* local interface address */
struct in-addr v-rmt-addr; /* remote address (tunnels only) */
struct ifnet *v-ifp; /* pointer to interface */
u-int v-pkt-in; /* # pkts in on interface */
u-int v-pkt-out; /* # pkts out on interface */
u-int v-bytes-in; /* # bytes in on interface */
u-int v-bytes-out; /* # bytes out on interface */
struct route v-route; /* Cached route if this is a tunnel */
#ifdef RSVP-ISI
u-int v-rsvp-on; /* # RSVP listening on this vif */
struct socket *v-rsvpd; /* # RSVPD daemon */
#endif /* RSVP-ISI */
};
One of the frequently used fields is the `v-flags' field, that
may take one of the following values:
VIFF-TUNNEL 0x1 /* vif represents a tunnel end-point */
VIFF-SRCRT 0x2 /* tunnel uses IP src routing */
VIFF-REGISTER 0x4 /* vif used for register encap/decap */
5.3 Kernel supporting functions
The major standard IP multicast supporting functions are:
Ahmed Helmy [Page 15]
Internet Draft PIM-SM Implementation Jan 1997
* ip-mrouter-init()
Initialize the `ip-mrouter' socket, and the MFC.
Called by setsockopt() with option MRT-INIT.
* ip-mrouter-done()
Disable multicast routing.
Called by setsockopt() with option MRT-DONE.
* add-vif()
Add a new virtual interface to the viftable.
Called by setsockopt() with option MRT-ADD-VIF.
* del-vif()
Delete a virtual interface from the viftable.
Called by setsockopt() with option MRT-DEL-VIF.
* add-mfc()
Add/update an mfc entry to the mfctable.
Called by setsockopt() with the option MRT-ADD-MFC.
* del-mfc()
Delete an mfc entry from the mfctable.
Called by setsockopt() with the option MRT-DEL-MFC.
* ip-mforward()
Receive an IP multicast packet from interface `ifp'. If it
matches with a multicast forwarding cache, then pass it to
the next packet forwarding routine [ip-mdq()]. Otherwise,
if the packet does not match on an entry, then create an
'idle' cache entry, enqueue the packet to it, and send the
header in an internal control message to the daemon [using
socket-send()], indicating a cache miss.
* ip-mdq()
Ahmed Helmy [Page 16]
Internet Draft PIM-SM Implementation Jan 1997
The multicast packet forwarding routine. An incoming
interface check is performed; the iif in the entry is
compared to that over which the packet was received. If
they match, the packet if forwarded on all vifs according
to the ttl array included in the mfc [this basically
constitutes the oif list]. Tunnels and Registers are
handled by this function, by forwarding to `dummy' vifs. If
the iif check does not pass, an internal control message
(basically the packet header) is sent to the daemon [using
socket-send()], including vif information, and indicating
wrong incoming interface.
* expire-upcalls()
Clean up cache entries if upcalls are not serviced.
Called by the Slow Timeout mechanism, every half second.
The following functions in the kernel provide support to PIM,
and are part of `ip-mroute.c':
* register-send()
Send the whole packet in an internal control message,
indicating a whole packet, for encapsulation at user level.
Called by ip-mdq().
* pim-input()
The PIM receiving machinery in the kernel. Check the
incoming PIM control messages and passes them to the daemon
using raw-input(). If the PIM message is a Register
message, it is processed; the packet is decapsulated and
passed to register-mforward(), and header of the Register
message is passed up to the pim socket using raw-input().
Called by ip-intr() based on IPPROTO-PIM.
* register-mforward()
Forward a packet resulting from register decapsulation.
This is performed by looping back a copy of the packet
Ahmed Helmy [Page 17]
Internet Draft PIM-SM Implementation Jan 1997
using looutput(), such that the packet is enqueued on the
`reg-vif' queue and fed back into the multicast forwarding
machinery.
Called by pim-input().
6 Appendix I
The user level code, kernel patches, and change description, are
available in,
http://catarina.usc.edu/ahelmy/pimsm-implem/
or through anonymous ftp from,
catarina.usc.edu:/pub/ahelmy/pimsm-implem/
7 Appendix II: Register Models
The sender model, in PIM-SM, is based on the sender's DR
registering to the active RP for the corresponding group.
Such process involves encapsulating data packets in PIM-
REGISTER messages. Register encapsulation requires
information about the RP, and is done at the user level
daemon. Added functionality to the kernel, is necessary to
pull up the data packet to user level for encapsulation.
Register decapsulation (at the RP), on the other hand, is
performed in the kernel, as the decapsulated packet has the
original source in the IP header, and most operating
systems do not allow such packet to be forwarded from user
level carrying a non-local address (spoofing).
The kernel is modified to have a pim-input() machinery to
receive PIM packets. If the PIM type is REGISTER, the
packet is decapsulated. The decapsulated packet is then
looped back and treated as a normal multicast packet.
The two models discussed above are further detailed in this
section.
Ahmed Helmy [Page 18]
Internet Draft PIM-SM Implementation Jan 1997
7.1 Register Encapsulation
Upon receiving a multicast data packet from a directly
connected source, a DR [initially having no (S,G) cache]
looks up the entry in the kernel cache. When the look-up
machinery gets a cache miss, the following actions take
place (see figure 5):
[Figures are present only in the postscript version]
Fig. 5 At the DR: Creating (S,G) entries for local senders and
1 an idle (S,G) cache entry is created in the kernel, with
oif = null,
2 the data packet is enqueued to this idle entry [a threshold
of 4 packets queue length is currently enforced],
3 an expiry timer is started for the idle queue, and
4 an internal control packet is sent on the socket queue
using socket-send(), containing the packet header and
information about the incoming interface, and the cache
miss code.
[Note that the above procedures occur for the first packet
only, when the cache is cold. Further packets will be either
enqueued (if the cache is idle and the queue is not full),
dropped (if the cache is idle and the queue is full), or
forwarded (if the cache entry is active).]
At user space, the igmp processing machinery receives this
packet, the internal control protocol is identified and the
message is passed to the proper function to handle the
kernelCalls [process-kernelCall()].
The cache miss code is checked, then the router checks to see:
1 if the sender of the packet is a directly connected source,
Ahmed Helmy [Page 19]
Internet Draft PIM-SM Implementation Jan 1997
and
2 if the router is the DR on the receiving interface.
If the check does not pass, no action pertaining to Registers is
taken [*] If the daemon does not activate the idle kernel
cache, the cache eventually times out, and the enqueued packets
are dropped.
If the check passes, the daemon creates an (S,G) entry with the
REGISTER bit set in the `flags' field, the iif set the interface
on which the packet was received, and the reg-vif included in
the oiflist, in addition to any other oifs copied from wild card
entries according to the PIM spec. `reg-vif' is an added `dummy
interface' for use in the register models. Further, the daemon
installs this entry in the kernel cache; using setsockopt() with
the `ADD-MFC' option.
This triggers the add-mfc() function in the kernel, which in
turn calls the forwarding machinery [ip-mdq()]. The forwarding
machinery iterates on the virtual interfaces, and if the vif is
the reg-vif, then the register-send() function is called. The
latter function, is the added function for encapsulation
support, which sends the enqueued packets as WHOLE-PKTs (in an
internal control message) to the user level using socket-send().
The message flows through the igmp, and process-kernelCall
machineries, then the [(S,G) REGISTER] is matched, and the
packet is encapsulated and unicast to the active RP of the
corresponding group.
Subsequent packets, match on (S,G) (with oif=reg-vif) in the
kernel, and get sent to user space directly using register-
send().
7.2 Register Decapsulation
At the RP, the unicast Registers, by virtue of being PIM
messages, are received by the pim machinery in the kernel [pim-
_________________________
[*] Other checks are performed according to the longest
match rules in the PIM spec. Optionally, if no entry is
matched, a kernel cache with oif = null may be in-
stalled, to avoid further cache misses on the same en-
try.
Ahmed Helmy [Page 20]
Internet Draft PIM-SM Implementation Jan 1997
input()]. The PIM type is checked. If REGISTER, pim-input()
checks the null register bit, if not set, the packet is passed
to register-mforward(), which loops it back on the `reg-vif'
queue using looutput(). In any case, the header of the Register
(containing the original IP header, the PIM message header and
the IP header of the inner encapsulated packet) is passed to
raw-input(), and in turn to pim socket, to which the PIM daemon
listens (see figure 6).
[Figures are present only in the postscript version]
Fig. 6 At the RP, receiving Registers, decapsulating and forwarding
At the PIM daemon, the message is processed by the pim machinery
[accept-pim()]. REGISTER type directs the message to the
accept-register() function. The Register message is parsed, and
processed according to PIM-SM rules given in the spec.
If the Register is to be forwarded, the daemon performs the
following:
1 creates (S,G) entry, with iif=reg-vif, and the oiflist is
copied from wild card entries [the data packets are to be
forwarded down the unpruned shared tree, according to PIM-
SM longest match rules], and
2 installs the entry in the kernel cache, using
setsockopt(ADDMFC)
At the same time, the decapsulated packet enqueued at the reg-
vif queue is fed into ip-intr() [the IP interrupt service
routine], and passed to ip-mforward() as a native multicast data
packet. A cache lookup is performed on the decapsulated packet.
If the cache hits and the iif matches (i.e. cache iif = reg-
vif), the packet is forwarded according to the installed
oiflist. Otherwise, a cache miss internal control message is
sent to user level, and processed accordingly.
Note that, a race condition may occur, where the decapsulated
packet reaches ip-mforward() before the daemon installs the
kernel cache. This case is handled in process-cacheMiss(), in
conformance with the PIM spec, and the packet is forwarded
accordingly.
Ahmed Helmy [Page 21]
Internet Draft PIM-SM Implementation Jan 1997
8 Acknowledgments
Special thanks to Deborah Estrin (USC/ISI), Van Jacobson (LBL),
Bill Fenner, Stephen Deering (Xerox PARC), Dino Farinacci (Cisco
Systems) and David Thaler (UMich), for providing comments and
hints for the implementation. An earlier version of PIM version
1.0, was written by Charley Liu and Puneet Sharma at USC.
A. Helmy did much of this work as a summer intern at Silicon
Graphics Inc.
PIM was supported by grants from the National Science Foundation
and Sun Microsystems.
Ahmed Helmy [Page 22]
Internet Draft PIM-SM Implementation Jan 1997
References
[1] D. Estrin, D. Farinacci, A. Helmy, D. Thaler,
S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei.
Protocol Independent Multicast-Sparse Mode (PIM-SM): Motivation and
Architecture.
Experimental RFC, Dec 1996.
[2] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. Liu, L. Wei, P.
Sharma, and A. Helmy.
Protocol Independent Multicast (PIM): Specification.
Internet Draft, June 95.
[3] D. Estrin, D. Farinacci, A. Helmy, D. Thaler,
S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei.
Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification
Experimental RFC, Dec 1996.
Address of Author:
Ahmed Helmy
Computer Science Dept/ISI
University of Southern Calif.
Los Angeles, CA 90089
ahelmy@usc.edu