SIP Call Process | VoIP Standards and Specifications

SIP Addressing

The “objects” addressed by SIP are users at hosts, identified by a SIP URL. The SIP URL takes a form similar to that of an e-mail address: user@host. The user part is a user name or a telephone number. The host part is a domain name or a numeric network address.
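The user@host form described above can be illustrated with a small sketch. The function name `parse_sip_address` and the sample addresses are hypothetical, and the sketch ignores ports, parameters, and other parts a complete SIP URL may carry.

```python
from typing import NamedTuple


class SipAddress(NamedTuple):
    user: str  # user name or telephone number
    host: str  # domain name or numeric network address


def parse_sip_address(url: str) -> SipAddress:
    """Split a 'sip:user@host' string into its user and host parts."""
    # Treat the 'sip:' scheme prefix as optional in this sketch.
    if url.lower().startswith("sip:"):
        url = url[4:]
    user, sep, host = url.partition("@")
    if not sep or not user or not host:
        raise ValueError(f"expected user@host, got {url!r}")
    return SipAddress(user, host)


# Illustrative addresses only: a named user at a domain,
# and a telephone number at a numeric address.
print(parse_sip_address("sip:alice@example.com"))
print(parse_sip_address("sip:+12125551212@192.0.2.10"))
```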

Locating a SIP Server

When a client wishes to send a request, the client sends it either to a locally configured SIP proxy server (as in HTTP), independent of the Request-URL, or to the IP address and port corresponding to the Request-URL. In the latter case, the client must determine the protocol, port, and IP address of the server that is to receive the request.

SIP Transaction

Once the host part has been resolved to a SIP server, the client sends one or more SIP requests to that server and receives one or more responses from the server. A request (and its retransmissions) and the responses triggered by that request make up a SIP transaction.

SIP Invitation

A successful SIP invitation consists of two requests: INVITE, followed by ACK. The INVITE request asks the callee to join a particular conference or establish a two-party conversation. After the callee has agreed to participate in the call, the caller confirms that it has received that response by sending an ACK request. If the caller no longer wants to participate in the call, it sends a BYE request instead of an ACK.

SIP itself only defines the initiation of a session. All other parts of the session are covered by other protocols, some of which come from other applications or functions not necessarily designed for real-time multimedia over IP. Compared with H.323, SIP is less defined and more open, which can result in interworking difficulties because of different implementations of the standard. Every SIP developer can implement a unique version with different extensions that aren’t included in the basic standard.

Although H.323 and SIP handle call set-up, call control, and media in different ways, H.323 defines all of these processes, whereas SIP defines call set-up only and uses other protocols for call control and media. Using SIP, call control and set-up are handled separately from media. This becomes an important issue when interworking with the PSTN, which uses SS7 for signaling, although SS7 can be translated to SIP through a gateway or softswitch. Without such translation, intelligent networking services such as caller ID and call forwarding will not work with SIP.

Because media and signaling are handled separately in SIP, separate media gateways and signaling gateways are required for interoperability with the PSTN. This can create a major problem in the case of common PSTN services like DTMF or touch tones, in which signaling is carried in the media. There is also no SIP equivalent of ISUP message transport from SS7.


SIP

The IETF developed SIP in reaction to the ITU-T H.323 recommendations. The IETF believed H.323 was inadequate for evolving IP telephony requirements because its command structure was too complex and its architecture was too centralized and monolithic. SIP is an application layer control protocol that can establish, modify, and terminate multimedia sessions or calls. SIP transparently supports name mapping and redirection services, allowing the implementation of ISDN and Intelligent Network telephony subscriber services. The early implementations of SIP have been in network carrier IP-Centrex trials. IP-PBX manufacturers are in the process of developing SIP-based versions of their current product offerings.

SIP was designed as part of the overall IETF multimedia data and control architecture that supports protocols such as Resource Reservation Protocol (RSVP), RTP, Real-Time Streaming Protocol (RTSP), Session Announcement Protocol (SAP), and Session Description Protocol (SDP). Figure 1 shows SIP and its associated protocols.

Figure 1: SIP signaling protocols.

SIP provides the necessary protocol mechanisms to support the following basic functions:

  • Name translation and user location—Determination of the end system to be used for communication

  • Feature negotiation—Allows station users involved in a call to agree on the features supported, recognizing that not all features are available to all station users

  • Call participation management—During a call, a station user can conference other station users into the call or cancel connections to conferenced parties; station users can also be transferred or placed on hold

  • Call feature changes—A station user should be able to change the call characteristics during the course of the call; new features may be enabled based on call requirements or new conferenced station users

The two major components in a SIP network are User Agents and Network Servers. A User Agent Client (UAC) initiates SIP requests, and a User Agent Server (UAS) receives SIP requests and returns responses on the user's behalf. A Registration Server receives updates regarding the current user location, and a Proxy Server receives requests and forwards them to a next-hop server, which has more information regarding the called party's location. A Redirect Server receives requests, determines the next-hop server, and returns its address to the client.

SIP request messages consist of three elements: Request Line, Header, and Message Body. SIP response messages consist of three elements: Status Line, Header, and Message Body.
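The three-element layout can be sketched as plain text handling. The helper names `build_request` and `split_message` are hypothetical, the sample headers are illustrative, and the request below is not complete (a real SIP request carries further mandatory headers), so treat this as a sketch of the structure only.

```python
CRLF = "\r\n"


def build_request(method: str, request_url: str,
                  headers: dict[str, str], body: str = "") -> str:
    """Assemble Request Line, Header, and Message Body into one message."""
    request_line = f"{method} {request_url} SIP/2.0"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # An empty line separates the header from the message body.
    return CRLF.join([request_line, *header_lines, "", body])


def split_message(message: str) -> tuple[str, dict[str, str], str]:
    """Split a request or response into first line, header, and body."""
    head, _, body = message.partition(CRLF + CRLF)
    first_line, *header_lines = head.split(CRLF)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return first_line, headers, body


invite = build_request(
    "INVITE",
    "sip:bob@example.com",
    {"From": "sip:alice@example.com", "To": "sip:bob@example.com"},
)
print(split_message(invite))

# A response has the same shape, with a Status Line in first position.
response = CRLF.join(["SIP/2.0 200 OK", "To: sip:bob@example.com", "", ""])
print(split_message(response))
```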

Figure 2 shows the basic steps for a SIP call set-up.

Figure 2: SIP call setup.


Real-Time Transport Control Protocol

The RTCP is based on the periodic transmission of control packets to all participants in the session, with the same distribution mechanism as that for the data packets. The underlying protocol must provide multiplexing of the data and control packets, for example, separate port numbers with UDP.

The format of the header is shown in Figure 1.

Figure 1: Format of the header.

  • Version: Identifies the RTP version, which is the same in RTCP packets and RTP data packets. Version 2 is defined by this specification.

  • P (padding): When set, this RTCP packet contains some additional padding octets at the end, which are not part of the control information. The last octet of the padding is a count of how many padding octets should be ignored. Padding may be needed by some encryption algorithms with fixed block sizes. In a compound RTCP packet, padding should be required only on the last individual packet because the compound packet is encrypted as a whole.

  • Reception report count: The number of reception report blocks contained in this packet. A value of zero is valid.

  • Packet type: Contains the constant 200 to identify this as an RTCP SR packet.

  • Length: The length of this RTCP packet in 32-bit words minus one, including the header and any padding. (The offset of one makes zero a valid length and avoids a possible infinite loop in scanning a compound RTCP packet, and counting 32-bit words avoids a validity check for a multiple of four.)
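A minimal sketch of decoding these fields follows, assuming the standard layout of the first 32-bit word (2-bit version, 1-bit padding, 5-bit report count, 8-bit packet type, 16-bit length). The function names and the sample bytes are hypothetical. The scanning loop shows why the "minus one" length encoding matters: every packet advances the offset by at least four octets.

```python
import struct


def parse_rtcp_header(data: bytes) -> dict:
    """Decode the first 32-bit word of an RTCP packet."""
    if len(data) < 4:
        raise ValueError("RTCP header needs at least 4 octets")
    first, packet_type, length_words = struct.unpack("!BBH", data[:4])
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "report_count": first & 0x1F,
        "packet_type": packet_type,
        # The length field is in 32-bit words minus one.
        "length_octets": (length_words + 1) * 4,
    }


def split_compound(data: bytes) -> list[dict]:
    """Walk a compound RTCP packet using each packet's length field."""
    headers = []
    offset = 0
    while offset + 4 <= len(data):
        header = parse_rtcp_header(data[offset:])
        headers.append(header)
        offset += header["length_octets"]  # always >= 4, so the loop ends
    return headers


# Sample bytes, zero-filled after the header: a 28-octet packet of type 200
# and a 32-octet packet of type 201 carrying one report block.
sr = struct.pack("!BBH", 0x80, 200, 6) + bytes(24)
rr = struct.pack("!BBH", 0x81, 201, 7) + bytes(28)
for header in split_compound(sr + rr):
    print(header)
```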

Figure 2 shows the complete packet header for IP, UDP, and RTP. The headers of the three payload-carrying protocols are sent sequentially before the digitized voice samples, which are the payload of the RTP packet. The result is a 40-octet overhead for every information data packet.

Figure 2: Packet header for IP, UDP, and RTP.

Figure 3 shows an H.323 call setup between two H.323 terminals. The gatekeeper server in the diagram could represent an IP-PBX call telephony server if it were an IP-PBX system, and the H.323 terminals could just as well be IP telephones. The gatekeeper and H.323 terminals reside on a LAN. The first steps in the call set-up process are terminal registration and admission with the gatekeeper. The calling terminal establishes a TCP signaling connection with the called terminal and receives a connection acknowledgment. Bandwidth requirements and management are controlled by TCP-based H.245 signaling. UDP voice packets are transmitted across the LAN between the terminals under the control of RTP and RTCP protocols.

Figure 3: H.323 protocol and call setup.

It shows the precise control messages that are exchanged between terminals from call set-up to call termination. The originating terminal (1) initiates a call to the destination terminal (2) directly, without any intermediate gateway or gatekeeper. H.225 and H.245 messages are indicated. Some messages overlap each other (Messages 4/5 and 9/10). H.225 messages are Messages 1, 2, 3, and 12; the remaining messages are H.245.


Real-Time Transport Protocol

RTP provides end-to-end network transport functions suitable for applications transmitting real-time audio or video packets over multicast or unicast network services. It was developed by the IETF and is used with the H.225 protocols recommended by H.323 to provide reliable communications. RTP by itself does not address resource reservation and does not guarantee QoS for real-time services. The packet transport is supplemented by a control protocol (RTCP) that monitors data delivery in a manner scalable to large multicast networks and provides minimal control and identification functionality. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports the use of RTP-level translators and mixers. The following are the elements of RTP.

RTP Payload

The payload is the data transported by RTP in a packet, such as audio samples or compressed video data. The payload format and interpretation are beyond the scope of this document.

RTP Packet

A packet consists of the fixed RTP header, a possibly empty list of contributing sources (see below), and the payload data. Some underlying protocols may require an encapsulation of the RTP packet to be defined. Typically, one packet of the underlying protocol contains a single RTP packet, but several RTP packets may be contained, if permitted by the encapsulation method.

RTCP Packet

A control packet consists of a fixed header part similar to that of RTP packets, followed by structured elements that depend on the RTCP packet type. Typically, multiple RTCP packets are sent together as a compound RTCP packet in a single packet of the underlying protocol; this is enabled by the length field in the fixed header of each RTCP packet. Figure 1 shows the RTP header.

Figure 1: The RTP header.

The RTP fixed header fields have the following functions:

  • V (version): Identifies the RTP version.

  • P (padding): When set, the packet contains one or more additional padding octets at the end that are not part of the payload.

  • X (extension bit): When set, the fixed header is followed by exactly one header extension with a defined format.

  • CSRC count: Contains the number of CSRC identifiers that follow the fixed header.

  • M (marker): The interpretation of the marker is defined by a profile. It is intended to allow significant events such as frame boundaries to be marked in the packet stream.

  • Payload type: Identifies the format of the RTP payload and determines its interpretation by the application. A profile specifies a default static mapping of payload type codes to payload formats. Additional payload type codes may be defined dynamically through non-RTP means.

  • Sequence number: Increments by one for each RTP data packet sent and may be used by the receiver to detect packet loss and restore packet sequence.

  • Timestamp: Reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculations. The resolution of the clock must be sufficient for the desired synchronization accuracy and for measuring packet arrival jitter (one tick per video frame is typically not sufficient).

  • SSRC (synchronization source): Identifies the synchronization source. This identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier.

  • CSRC (contributing source): Contributing source identifiers list. Identifies the contributing sources for the payload contained in this packet.
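The fields above can be decoded with a short sketch, assuming the standard 12-octet fixed header layout followed by 32-bit CSRC identifiers. The function name `parse_rtp_header` and the sample packet values are hypothetical, and the sketch does not skip a header extension when the X bit is set.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-octet fixed RTP header and the CSRC list."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header needs 12 octets")
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    csrc_end = 12 + 4 * csrc_count
    if len(packet) < csrc_end:
        raise ValueError("packet too short for its CSRC list")
    csrc_list = list(struct.unpack(f"!{csrc_count}I", packet[12:csrc_end]))
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc_list,
        # Offset of the payload when no header extension is present.
        "payload_offset": csrc_end,
    }


# Illustrative packet: version 2, no CSRCs, payload type 0,
# followed by 160 octets of zero-filled payload.
sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x12345678) + bytes(160)
print(parse_rtp_header(sample))
```

A receiver could compare successive `sequence_number` values to detect loss, as the sequence number description suggests.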


The Need for RTP/RTCP

IP is a relatively low-level protocol. It was originally developed for delivery of packets (or datagrams) between host computers across the ARPAnet (Internet) packet network. For an IP telephony application, datagrams are transmitted between desktop voice terminals. IP is a connectionless protocol that does not establish a virtual connection through a network before commencing transmission. Establishing a communications path between endpoints is the responsibility of higher-level protocols.

IP makes no guarantees concerning reliability, flow control, error detection, or error correction. As a result, datagrams could arrive at the destination computer out of sequence, with errors, or not arrive at all. Their transit delay can also vary from one datagram to the next; this variation is known as jitter. IP does succeed in making the network transparent to the upper layers involved in voice transmission through an IP-based network.

VoIP transmission, by definition, uses IP, although it is not well suited for voice transmission. Real-time applications such as voice and video require guaranteed connection with consistent delay characteristics. Higher-layer protocols address these issues. There are two available protocols at the transport layer when transmitting information through an IP network. These are Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). Both protocols enable the transmission of information between the correct processes (or applications) on network endpoints. These processes are associated with unique port numbers.

TCP is a connection-oriented protocol. It establishes a communications path before transmitting data. It handles sequencing and error detection, thus ensuring that a reliable stream of data is received by the destination application. TCP can address real-time voice applications to a certain extent but would require higher-layer functions. Voice applications require that information be received in the correct sequence, reliably, and with predictable delay characteristics. With this in mind, the ITU-T decided that the alternative protocol, UDP, should be used. Like IP, UDP is a connectionless protocol. UDP routes data to its correct destination port but does not attempt to perform any sequencing or ensure data reliability.

To provide feedback on the quality of the transmission link, the RTP/RTCP protocols, developed by the IETF, are used. Real-Time Transport Protocol (RTP) transports the digitized samples of real-time information, and Real-Time Transport Control Protocol (RTCP) provides the mechanism for quality feedback. RTP and RTCP do not reduce the overall delay of the real-time information. Nor do they make any guarantees concerning QoS.

When an IP voice terminal transmits datagrams across the LAN/WAN, the IP, UDP, and RTP headers are followed by the RTP data payload. The data payload comprises digitized voice samples. The duration of speech carried in each payload can vary, but 20 ms is generally considered the maximum for voice. The number of transmitted datagrams varies inversely with the sampling period: the longer the sampling period, the fewer packets are transmitted per second. The selection of the payload duration is a compromise between bandwidth requirements and quality. Smaller payloads demand higher bandwidth per channel because the header length remains at 40 octets. However, if payloads are increased, the overall delay of the system will increase, and the system will be more susceptible to the loss of individual packets by the network.
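The trade-off can be worked through numerically. The sketch below assumes a 64 kbit/s codec (an assumption chosen for illustration; the text does not name a codec), uses the fixed 40-octet IP/UDP/RTP header from the text, and ignores link-layer overhead. The function name `voice_stream` is hypothetical.

```python
HEADER_OCTETS = 40  # combined IP, UDP, and RTP headers, as stated in the text


def voice_stream(codec_bps: int, payload_ms: int) -> tuple[float, float, float]:
    """Return (packets per second, payload octets, total bits per second)."""
    packets_per_second = 1000 / payload_ms
    payload_octets = codec_bps / 8 * payload_ms / 1000
    total_bps = (payload_octets + HEADER_OCTETS) * 8 * packets_per_second
    return packets_per_second, payload_octets, total_bps


# Assumed 64 kbit/s codec at several payload durations.
for ms in (10, 20, 30):
    pps, octets, bps = voice_stream(64_000, ms)
    print(f"{ms} ms: {pps:.1f} packets/s, {octets:.0f} payload octets, "
          f"{bps / 1000:.1f} kbit/s on the wire")
```

With these assumptions, a 20 ms payload is 160 octets sent 50 times per second, so the stream needs 80 kbit/s; halving the payload to 10 ms raises that to 96 kbit/s because the same 40-octet header is sent twice as often.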


H.245 Media Control

H.323 requires that endpoints negotiate compatible settings before audio, video, and/or data communication links can be established. H.245 uses control messages and commands that are exchanged during the call to inform and instruct. The implementation of H.245 control is mandatory in all endpoints. H.245 provides the following media control functionalities:

Capability Exchange

H.323 allows endpoints to have different receive and send capabilities. Each endpoint records its receiving and sending capabilities (media types, codecs, bit rates, etc.) in a message and sends it to the other endpoint(s).
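The outcome of a capability exchange can be pictured as matching what one endpoint can send against what the other can receive. This is only a sketch of that matching logic: real H.245 capability messages are structured protocol messages, not Python lists, and the function name `usable_codecs` and the codec lists are hypothetical.

```python
def usable_codecs(sender_can_send: list[str],
                  receiver_can_receive: set[str]) -> list[str]:
    """Codecs the sender may use, kept in the sender's preference order."""
    return [codec for codec in sender_can_send if codec in receiver_can_receive]


# Each direction is evaluated separately because send and receive
# capabilities may differ per endpoint.
a_send, a_receive = ["G.729", "G.711"], {"G.711", "G.729", "G.723.1"}
b_send, b_receive = ["G.723.1", "G.711"], {"G.711"}

print("A to B:", usable_codecs(a_send, b_receive))
print("B to A:", usable_codecs(b_send, a_receive))
```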

Opening and Closing of Logical Channels

H.323 audio and video logical channels are unidirectional end-to-end links (or multipoint links in the case of multipoint conferencing). Data channels are bidirectional. A separate channel is needed for audio, video, and data communications. H.245 messages control the opening and closing of such channels. H.245 control messages use logical channel 0, which is always open.

Flow Control Messages

These messages provide feedback to the endpoints when communication problems are encountered.

Other Commands and Messages

Several other commands and messages may be used during a call, such as a command to set the codec at the receiving endpoint when the sending endpoint switches its codec. H.245 control messages may also be routed through a gatekeeper if one exists.

Figure 1 shows H.245 signaling. After establishing a control signaling link between two gateways, media bandwidth is negotiated. After the two terminals agree, with acknowledgments, an open link is established for the call.

Figure 1: H.245 signaling.


H.323 Control and Signaling Mechanisms & H.225.0 RAS

The flow of information in an H.323-enabled network consists of a mix of audio, video, data, and control packets. Control information is essential for call set-up and tear-down, capability exchange and negotiation, and administrative purposes. H.323 uses three control protocols: H.245 media control, H.225/Q.931 call signaling, and H.225.0 registration, admission, and status (RAS). The Q.931 protocol was originally developed for ISDN control signaling, and is currently used for inter-PBX networks implementing Qsig standards (see Networking chapter).

Figure 1 shows the H.323 protocol stacks for control and signaling processes.

Figure 1: H.323 system: protocol architecture.

The diagram follows the ISO OSI seven-layer model. H.323-specific protocols are above the transport layer (Layer 4). Real-Time Transport Protocol (RTP)/RTCP, RAS, H.225, and H.245 span across the fifth and sixth layers. Q.931, sometimes included as part of the H.225 protocol set, is at the terminal applications layer (Layer 7). Data communications are supported by multiple T.120 protocols and use TCP/IP transmission protocols, which are different from the H.323 protocols that support real-time audio and video communications requirements.

H.225.0 RAS

H.225.0 RAS messages define communications between endpoints and a gatekeeper. H.225.0 RAS is only required when a gatekeeper exists. Unlike H.225.0 call signaling and H.245, H.225.0 RAS uses unreliable transport (UDP) for delivery, so it relies on timeouts and retransmission of requests rather than on a guaranteed-delivery transport.

Gatekeeper Discovery

Gatekeeper discovery is used by endpoints to find their gatekeeper. An endpoint needing to find the transport address of its gatekeeper(s) will multicast a gatekeeper request (GRQ) message. One or more gatekeepers may reply with a GCF message containing the gatekeeper transport address.

Endpoint Registration

Once a gatekeeper exists, all endpoints must be registered with it. This is necessary because a gatekeeper needs to know the aliases and transport addresses of all endpoints in its zone to route calls.
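The bookkeeping behind registration can be sketched as a table from aliases to transport addresses. The class `ZoneRegistry`, its method names, and the sample aliases and addresses are hypothetical; an actual gatekeeper exchanges RAS messages with endpoints rather than exposing method calls.

```python
from typing import Optional

TransportAddress = tuple[str, int]  # (IP address, port)


class ZoneRegistry:
    """Toy model of the alias table a gatekeeper keeps for its zone."""

    def __init__(self) -> None:
        self._by_alias: dict[str, TransportAddress] = {}

    def register(self, aliases: list[str], address: TransportAddress) -> None:
        # Every alias of the endpoint maps to the same transport address.
        for alias in aliases:
            self._by_alias[alias] = address

    def unregister(self, address: TransportAddress) -> None:
        self._by_alias = {
            alias: addr for alias, addr in self._by_alias.items()
            if addr != address
        }

    def resolve(self, alias: str) -> Optional[TransportAddress]:
        """Return the transport address used to route a call to the alias."""
        return self._by_alias.get(alias)


zone = ZoneRegistry()
zone.register(["alice", "4001"], ("192.0.2.11", 1720))
zone.register(["bob", "4002"], ("192.0.2.12", 1720))
print(zone.resolve("4002"))
print(zone.resolve("carol"))
```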

Endpoint Location

Gatekeepers use location request (LRQ) messages to locate endpoints with a specific transport address. This process is required, for example, when the gatekeeper updates its alias transport address database.

Other Communications

A gatekeeper performs many other management and control duties such as admission control, status determination, and bandwidth management, which are all handled through H.225.0 RAS messages.

Figure 2 shows RAS. The terminal sends a request to the gatekeeper for registration and admission. The gatekeeper acknowledges and confirms the requests. When the call is completed, the terminal notifies the gatekeeper of the call status and receives confirmation that the request to disconnect has been received.

Figure 2: RAS.