Frequently Asked Questions | SIP Architecture

Q: I am used to seeing users that follow the scheme SIP:, but I’ve also seen them with the scheme SIPS: What’s the difference?
A: SIP uses Uniform Resource Identifiers (URIs) for identifying users. A URI identifies resources on the Internet, and those used by SIP incorporate phone numbers or names in the username. Each address begins with the scheme SIP:, which indicates the protocol being used. This is similar to Web site addresses, which begin with HTTP: to indicate the protocol to use when accessing the site. When SIP: is at the beginning of the address, the transmission is not required to be encrypted. Those beginning with SIPS: require encryption for the session.
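As a rough illustration of the scheme check described above, the following Python sketch inspects the start of an address. The function name `requires_encryption` and the example addresses are assumptions made for this illustration, not part of any SIP library.

```python
def requires_encryption(uri: str) -> bool:
    """Return True for SIPS: addresses and False for SIP: addresses."""
    scheme, separator, rest = uri.partition(":")
    if not separator or not rest:
        raise ValueError(f"not a URI: {uri!r}")
    scheme = scheme.lower()  # URI schemes are case-insensitive
    if scheme == "sips":
        return True
    if scheme == "sip":
        return False
    raise ValueError(f"not a SIP address: {uri!r}")


# Hypothetical addresses used only for demonstration.
print(requires_encryption("sip:alice@example.com"))
print(requires_encryption("SIPS:bob@example.com"))
```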
Q: Why do all responses to a request in SIP begin with the numbers 1 through 6?
A: This indicates the category to which the response belongs. There are six categories of responses that may be returned from a request: Informational, Success, Redirection, Client Error, Server Error, and Global Failure.
Q: I received a response that my request was met with a server error. Does this mean I can’t use this feature of my VoIP program?
A: Not necessarily. When a request receives a Server Error response, it means that the server it was sent to met with the error. The request could still be forwarded to other servers. A Global Failure means that it wouldn’t be forwarded because every other server would also have the same error.
Q: I need to use a different computer for VoIP. The software is the same as the one on my computer, but I’m concerned that others won’t be able to see that I’m online because I’m using a different machine.
A: When you start the program and log onto your VoIP account, SIP makes a REGISTER request that provides your SIP address and IP address to a Registrar server. This allows multiple people to use multiple computers. No matter what your location, SIP allows others to find you with this mapping of your SIP address to the current IP address.
Q: Should I always use encryption to protect the data that I’m transmitting over the Internet?
A: Unless you expect to be discussing information or transferring files that require privacy, it shouldn’t matter whether your transmission is encrypted or not. After all, if someone did eavesdrop on an average conversation, would you really care that they heard your opinion on the last movie you watched? If, however, you were concerned that the content of your conversation or other data that was transmitted might be viewed by a third party, then encryption would be a viable solution to protecting your interests. As of this writing however, there are no interoperable, nonproprietary implementations of SIP that use encrypted signaling and media, so you will need to refer to the documentation of the application(s) being used to determine if this is available.


SIMPLE | SIP Architecture

SIMPLE is an extension of SIP, which is used for maintaining presence information and managing the messages that are exchanged between the participants using instant messaging. Just as SIP registers users with a SIP server before they can begin a session, SIMPLE registers presence information. When a user registers through SIMPLE, those with this user in their Buddy List can access information that the user is online. When the people who have the user in their lists are alerted that the user is online, they can initiate a chat. If the user needs to do some work and changes their status to busy, or goes away from their desk and changes their status to being away, then this information is updated in the IM applications that have this person as a contact. Generally, the presence of a user is indicated in these programs through icons that change based on the user’s status.
Because SIMPLE is an extension of SIP, it has the same features and methods of routing messages. The users are registered, and then send text-based requests to initiate a session. The messages are sent between user agents as individual requests between User agent clients and User agent servers. Because the messages are small, they can move between the two User agents quickly with minimal time lag even during peak Internet hours.
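The presence behavior described above can be sketched as a small in-memory model. This is only an illustration of the idea of watchers being told about status changes; the class `PresenceService`, its methods, and the addresses are hypothetical and do not correspond to a real SIMPLE implementation.

```python
from collections import defaultdict


class PresenceService:
    """Toy presence store: tracks each user's status and who is watching."""

    def __init__(self):
        self._status = {}                  # user -> current status
        self._watchers = defaultdict(set)  # user -> set of watchers

    def subscribe(self, watcher, target):
        """Register interest in a user and return that user's current status."""
        self._watchers[target].add(watcher)
        return self._status.get(target, "offline")

    def publish(self, user, status):
        """Record a status change and return the notifications to deliver."""
        self._status[user] = status
        return [(watcher, user, status) for watcher in sorted(self._watchers[user])]


service = PresenceService()
service.subscribe("sip:bob@example.com", "sip:alice@example.com")
notifications = service.publish("sip:alice@example.com", "busy")
print(notifications)
```

Each tuple in the returned list stands for one update that a contact's IM application would receive and reflect in its status icon.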
Although the IETF IM and Presence Protocol Working Group is still developing SIMPLE as a standard, it has been implemented by a number of IM applications. Windows XP was the first operating system to include SIMPLE, which is used by Microsoft Windows Messenger. Numerous other IM applications also use SIMPLE as a standardized method for instant messaging.


Instant Messaging | SIP Architecture

In different variations, instant messaging has been around longer than the Internet has been popular. In the 1970s, the TALK command was implemented on UNIX machines, which invoked a split screen that allowed users of the system to see the messages they typed in individual screens. In the 1980s, Bulletin Board Systems (BBSes) became popular, where people would use a modem to dial into another person’s computer to access various resources, such as message boards, games, and file downloads. On BBSes, the system operator (SYSOP) could invoke a chat feature that allowed the SYSOP to send messages back and forth with the caller on a similar split screen. If the BBS had multiple phone lines, then the callers could exchange instant messages with each other while they were online. As the Internet gained popularity, the ability to exchange messages with other users became a feature that was desired and expected.
Today there are a large number of IM applications that can be used to exchange text messages over the Internet and other IP networks. Although this is nowhere near a complete list, some of the more popular ones include:
  • AIM, America Online Instant Messenger
  • ICQ
  • Yahoo Messenger
  • MSN Messenger
In addition to these, there are also applications that allow communication using VoIP or other multimedia that also provide the ability to communicate using text messages. As seen in Figure 1, Skype provides a chat feature that allows two or more users to communicate in a private chat room. Each message between the participants appears on a different line, indicating who submitted which line of text and optionally the time that each message was sent. This allows participants to scroll back in the conversation to identify previously mentioned statements or topics of discussion. Although the figure depicts instant messaging in Skype, it is a common format that is used in modern IM software.

Figure 1: Instant Messaging through Skype
One of the important features of any IM application is the ability to keep a contact list of those with whom you routinely communicate. In many programs the contact list is also known as a Buddy List. However, even with this listing, it would be impossible to contact anyone if you didn’t know when each contact was available. If a person had a high-speed connection and was always connected to the Internet, then they might always appear online. As such, they would need a way of indicating that they were online but not available, or whether the person was available for one form of communication but not another. The ability to display each contact’s availability in a Buddy List when someone opens an IM application is called presence.


Understanding SIP’s Architecture

Let’s look at how the components of SIP work together to provide communication between two endpoints on a system. In doing so, we can see how the various elements come together to allow single-media and multimedia to be exchanged over a local network or the Internet.

The User agents begin by communicating with various servers to find other User agents to exchange data with. Until they can establish a session with one another, they must work in a client/server architecture, and make requests of servers and wait for these requests to be serviced. Once a session is established between the User agents, the architecture changes. Because a User agent can act as either a client or a server in a session with another User agent, these components are part of what is called a peer-to-peer (P2P) architecture. In this architecture, the computers are equal to one another, and both make and service requests made by other machines. To understand how this occurs, let’s look at several actions that a User agent may make to establish such a session with another machine.

SIP Registration

Before a User agent can even make a request to start communication with another client, each participant must register with a Registrar server. As seen in Figure 1, the User agent sends a REGISTER request to the SIP server in the Registrar role. Once the request is accepted, the Registrar adds the SIP-address and IP address that the User agent provides to the location service. The location service can then use this information to provide SIP-address to IP-address mappings for name resolution.
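The registration step can be sketched as a dictionary that maps SIP addresses to IP addresses. This is a minimal model under stated assumptions: the `Registrar` class, its method names, and the addresses are hypothetical, and a real Registrar exchanges SIP messages rather than method calls.

```python
class Registrar:
    """Toy Registrar that stores SIP-address to IP-address mappings."""

    def __init__(self):
        self.location_service = {}  # SIP address -> current IP address

    def register(self, sip_address, ip_address):
        """Handle a REGISTER request and return a Success (2xx) code."""
        self.location_service[sip_address] = ip_address
        return 200

    def resolve(self, sip_address):
        """Return the registered IP address, or None if unknown."""
        return self.location_service.get(sip_address)


registrar = Registrar()
registrar.register("sip:alice@example.com", "192.0.2.10")
# Registering again from another machine replaces the earlier mapping.
registrar.register("sip:alice@example.com", "192.0.2.55")
print(registrar.resolve("sip:alice@example.com"))
```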

Figure 1: Registering with a SIP Registrar

Requests through Proxy Servers

When a Proxy server is used, requests and responses from User agents initially are made through the Proxy server. As seen in Figure 2, User Agent A is attempting to invite User Agent B into a session. User Agent A begins by sending an INVITE request to User Agent B through a Proxy server, which checks with the location service to determine the IP address of the client being invited. The Proxy server then passes this request to User Agent B, which answers the request by sending its response back to the Proxy server, which in turn passes this response back to User Agent A. During this time, the two User agents and the Proxy server exchange SIP requests and responses, with the details of the session described using SDP. Once these steps have been completed and the response has been acknowledged, a session can be created between the two User agents. At this point, the two User agents can use RTP to transfer media between them and communicate directly.
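The proxy flow can be sketched as a function that looks up the callee and relays the request on the caller's behalf. The names `proxy_invite` and `deliver_to_user_agent`, the dictionary standing in for the location service, and the addresses are all assumptions made for this illustration.

```python
def proxy_invite(location_service, callee, deliver):
    """Look up the callee, forward the INVITE, and relay the response."""
    ip_address = location_service.get(callee)
    if ip_address is None:
        return 404  # Not found: no mapping in the location service
    # The proxy contacts the callee itself and passes the answer back.
    return deliver(ip_address)


location_service = {"sip:bob@example.com": "192.0.2.20"}
contacted = []


def deliver_to_user_agent(ip_address):
    """Stand-in for User Agent B accepting the invitation."""
    contacted.append(ip_address)
    return 200


status = proxy_invite(location_service, "sip:bob@example.com", deliver_to_user_agent)
print(status, contacted)
```

The caller never learns the callee's IP address in this sketch; only the proxy performs the lookup.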

Figure 2: Request and Response Made through Proxy Server

Requests through Redirect Servers

When a Redirect server is used, a request is made to the Redirect server, which returns the IP address of the User agent being contacted. As seen in Figure 3, User Agent A sends an INVITE request for User Agent B to the Redirect server, which checks the location service for the IP address of the client being invited. The Redirect server then returns this information to User Agent A. Now that User Agent A has this information, it can contact User Agent B directly. The INVITE request is then sent to User Agent B, which responds directly to User Agent A. Until this point, SIP requests and responses are used to exchange information, with the details of the session described using SDP. If the invitation is accepted, then the two User agents would begin communicating and exchanging media using RTP.
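For contrast with a proxy, the redirect flow can be sketched as a lookup that hands the address back to the caller, who then sends the INVITE directly. The function names, the dictionary standing in for the location service, and the addresses are hypothetical.

```python
def redirect_invite(location_service, callee):
    """Return a Redirection (3xx) code with the callee's address, or 404."""
    ip_address = location_service.get(callee)
    if ip_address is None:
        return 404, None
    return 302, ip_address  # Moved temporarily: try this address instead


def place_call(location_service, callee, send_invite):
    """Caller side: ask the Redirect server, then contact the callee directly."""
    status, ip_address = redirect_invite(location_service, callee)
    if status != 302:
        return status
    return send_invite(ip_address)


location_service = {"sip:bob@example.com": "192.0.2.20"}
direct_contacts = []


def send_invite(ip_address):
    """Stand-in for User Agent B answering a direct INVITE."""
    direct_contacts.append(ip_address)
    return 200


result = place_call(location_service, "sip:bob@example.com", send_invite)
print(result, direct_contacts)
```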

Figure 3: Request Made through Redirect Server

Peer to Peer

Once the user agents have completed registering themselves, and making requests and receiving responses on the location of the user agent they wish to contact, the architecture changes from one of client/server to that of peer-to-peer (P2P). In a P2P architecture, user agents act as both clients who request resources, and servers that respond to those requests and provide resources. Because resources aren’t located on a single machine or a small group of machines acting as network servers, this type of network is also referred to as being decentralized.
When a network is decentralized P2P, it doesn’t rely on costly servers to provide resources. Each computer in the network is used to provide resources, meaning that if one becomes unavailable, the ability to access files or send messages to others in the network is unaffected. For example, if one person’s computer at an advertising firm crashed, you could use SIP to communicate with another person at that company, and talk to them and have files transferred to you. If one computer goes down, there are always others that can be accessed and the network remains stable.
In the same way, when user agents have initiated a session with one another, they become User agent clients and User agent servers to one another, and have the ability to invite additional participants into the session. As seen in Figure 4, each of these User agents can communicate with one another in an audio or videoconference. If one of these participants ends the session, or is using a device that fails during the communication, the other participants can continue as if nothing happened. This architecture makes communication between User agents stable, without having to worry about the network failing if one computer or device suddenly becomes unavailable.

Figure 4: Once SIP Has Initiated a Session, a Peer-to-Peer Architecture Is Used


Protocols Used with SIP | SIP Architecture

Although SIP is a protocol in itself, it still needs to work with different protocols at different stages of communication to pass data between servers, devices, and participants. Without the use of these protocols, communication and the transport of certain types of media would either be impossible or insecure. In the sections that follow, we’ll discuss a number of the common protocols that are used with SIP, and the functions they provide during a session.


User Datagram Protocol

The User Datagram Protocol (UDP) is part of the TCP/IP suite of protocols, and is used to transport units of data called datagrams over an IP network. It is similar to the Transmission Control Protocol (TCP), except that it doesn’t divide messages into packets and reassemble them at the other end. Because the datagrams don’t support sequencing of the packets as the data arrives at the endpoint, it is up to the application to ensure that the data has arrived in the right order and has arrived completely. This may sound less beneficial than using TCP for transporting data, but it makes UDP faster because there is less processing of data. It often is used when messages with small amounts of data (which require less reassembling) are being sent across the network, or with data that will be unaffected overall by a few units of missing data.
Although an application may have features that ensure that datagrams haven’t gone missing or arrived out of order, many simply accept the potential of data loss, duplication, or errors. In the case of Voice over IP, streaming video, or interactive games, a minor loss of data or error will be a minor glitch that generally won’t affect the overall quality or performance. In these cases, it is more important that the data is passed quickly from one endpoint to another. If reliability were a major issue, then the use of TCP as a transport protocol would be a better choice over hindering the application with features that check for the reliability of the data it receives.
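To make the point about application-level ordering concrete, here is a minimal sketch of how an application might number its own datagrams and detect gaps. The 4-byte sequence prefix and the function names are assumptions chosen for this example, not a standard format.

```python
import struct


def make_datagram(sequence, payload):
    """Prefix the payload with a 4-byte big-endian sequence number."""
    return struct.pack("!I", sequence) + payload


def reassemble(datagrams, expected_count):
    """Reorder received datagrams and report any missing sequence numbers."""
    parts = {}
    for datagram in datagrams:
        (sequence,) = struct.unpack("!I", datagram[:4])
        parts[sequence] = datagram[4:]  # a duplicate simply overwrites
    missing = [s for s in range(expected_count) if s not in parts]
    data = b"".join(parts[s] for s in sorted(parts))
    return data, missing


# Datagrams arriving out of order, with one duplicate and one lost.
received = [
    make_datagram(2, b"C"),
    make_datagram(0, b"A"),
    make_datagram(2, b"C"),
    make_datagram(3, b"D"),
]
data, missing = reassemble(received, expected_count=4)
print(data, missing)
```

An application that can tolerate loss, such as a voice call, might simply skip the missing unit rather than request it again.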

Transport Layer Security

Transport Layer Security (TLS) is a protocol that can be used on top of a reliable transport protocol like TCP to provide security between applications communicating over an IP network. TLS uses encryption to ensure privacy, so that other parties can’t eavesdrop on or tamper with the messages being sent. Using TLS, a secure connection is established by authenticating the client and server, or User Agent Client and User Agent Server, and then encrypting the connection between them.
Transport Layer Security is a successor to Secure Sockets Layer (SSL), which was developed by Netscape. Even though it is based on SSL 3.0, TLS is a standard that has been defined in RFC 2246, and is designed to be its replacement. In this standard, TLS is designed as a multilayer protocol that consists of:
  • TLS Handshake Protocol
  • TLS Record Protocol
The TLS Handshake Protocol is used to authenticate the participants of the communication and negotiate an encryption algorithm. This allows the client and server to agree upon an encryption method and prove who they are using cryptographic keys before any data is sent between them. Once this has been done successfully, a secure channel is established between them.
After the TLS Handshake Protocol is used, the TLS Record Protocol ensures that the data exchanged between the parties isn’t altered en route. This protocol can be used with or without encryption, but TLS Record Protocol provides enhanced security using encryption methods like the Data Encryption Standard (DES). In doing so, it provides the security of ensuring data isn’t modified, and others can’t access the data while in transit.
The Transport Layer Security Protocol isn’t a requirement for using SIP, and generally isn’t needed for standard communications. For example, if you’re using VoIP or other communication software to trade recipes or talk about movies with a friend, then using encryption might be overkill. However, in the case of companies that use VoIP for business calls or to exchange information that requires privacy, then using TLS is a viable solution for ensuring that information and data files exchanged over the Internet are secure.
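As a hedged sketch of what using TLS looks like from an application, the following uses Python's standard `ssl` module to prepare a client configuration that authenticates the server and encrypts the connection. The helper `open_tls_connection` is hypothetical and is defined but not called here, since it would need a reachable server; port 5061 is the port commonly used for SIP over TLS.

```python
import socket
import ssl

# A client-side context that verifies the server's certificate and host name.
context = ssl.create_default_context()
# Refuse older protocol versions; the choice of TLS 1.2 is an assumption.
context.minimum_version = ssl.TLSVersion.TLSv1_2


def open_tls_connection(host, port=5061):
    """Connect over TCP, then perform the TLS handshake on that connection."""
    raw_socket = socket.create_connection((host, port), timeout=5)
    return context.wrap_socket(raw_socket, server_hostname=host)


print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```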

Other Protocols Used by SIP

As mentioned, SIP does not provide the functionality required for sending single-media or multimedia across a network, or many of the services that are found in communications programs. Instead, it is a component that works with other protocols to transport data, control streaming media, and access various services like caller-ID or connecting to the Public Switched Telephone Network (PSTN). These protocols include:
  • Session Description Protocol, which sends information to effectively transmit data
  • Real-Time Transport Protocol, which is used to transport data
  • Media Gateway Control Protocol, which is used to connect to the PSTN
  • Real-time Streaming Protocol, which controls the delivery of streaming media
The Session Description Protocol (SDP) and Real-time Transport Protocol (RTP) are protocols that commonly are used by SIP during a session. SDP is required to send information needed during a session where multimedia is exchanged between user agents, and RTP is used to transport this data. The Media Gateway Control Protocol (MGCP) and Real-time Streaming Protocol (RTSP) commonly are used by systems that support SIP, and are discussed later for that reason.

Session Description Protocol
The Session Description Protocol (SDP) is used to send description information that is necessary when sending multimedia data across the network. During the initiation of a session, SDP provides information on what multimedia a user agent is requesting to be used, and other information that is necessary in setting up the transfer of this data.
SDP is a text-based protocol that provides information in messages that are sent in UDP packets. The text information sent in these packets is the session description, and contains such information as:
  • The name and purpose of the session
  • The time that the session is active
  • A description of the media exchanged during the session
  • Connection information (such as addresses, phone number, etc.) required to receive media
SDP is a standard that was designed by the IETF under RFC 2327.
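A session description is a series of text lines of the form type=value, where the type is a single letter. The sketch below parses a small description into a dictionary; the sample values (names, addresses, port) are invented for illustration, and `parse_sdp` is a hypothetical helper rather than a library function.

```python
EXAMPLE_SDP = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=Example call
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
"""


def parse_sdp(text):
    """Group the values of each single-letter field type, in order."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        field_type, separator, value = line.partition("=")
        if not separator or len(field_type) != 1:
            raise ValueError(f"malformed SDP line: {line!r}")
        fields.setdefault(field_type, []).append(value)
    return fields


description = parse_sdp(EXAMPLE_SDP)
print(description["s"])  # session name
print(description["t"])  # time the session is active
print(description["m"])  # media description
print(description["c"])  # connection information
```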

Real-Time Transport Protocol
The Real-Time Transport Protocol (RTP) is used to transport real-time data across a network. It manages the transmission of multimedia over an IP network, such as when it is used for audio communication or videoconferencing with SIP. Information in the header of the packets sent over RTP tells the receiving user agent how the data should be reconstructed and also provides information on the codec bit streams.
Although RTP runs on top of UDP, which doesn’t ensure reliability of data, RTP does provide some reliability in the data sent between user agents. The protocol uses the RTP Control Protocol (RTCP) to monitor the delivery of data that’s sent between participants. This allows the user agent receiving the data to detect if there is packet loss, and allows it to compensate for any delays that might occur as data is transported across the network.
RTP was designed by the IETF Audio-Video Transport Working Group, and originally was specified as a standard under RFC 1889. Since then, this RFC has become obsolete, but RTP remains a standard and is defined under RFC 3550. In RFC 2508, Compressed Real-time Transport Protocol (CRTP) was specified, allowing the headers of the packets sent between participants to be compressed, so that the packets were smaller and data could be transferred more quickly. However, since CRTP doesn’t function well in situations without reliable, fast connections, RTP is still commonly used for communications like VoIP applications.
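The header information mentioned above can be illustrated with the 12-byte fixed RTP header, which carries the version, payload type, sequence number, timestamp, and a source identifier. This sketch packs and parses only that fixed part; the function names and sample values are assumptions, and optional header fields are ignored.

```python
import struct


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build a fixed RTP header: version 2, no padding, extension, or CSRCs."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, sequence, timestamp, ssrc)


def parse_rtp_header(packet):
    """Extract the fixed header fields from the first 12 bytes of a packet."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


packet = pack_rtp_header(payload_type=0, sequence=1234, timestamp=160, ssrc=0xDEADBEEF)
header = parse_rtp_header(packet + b"audio-bytes")
print(header)
```

A receiver can compare consecutive sequence numbers to notice lost packets and use the timestamps to smooth out variations in delay.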

Media Gateway Control Protocol
The Media Gateway Control Protocol (MGCP) is used to control gateways that provide access to the Public Switched Telephone Network (PSTN), and vice versa. In doing so, this protocol provides a method for communication on a network to go out onto a normal telephone system, and for communications from the PSTN to reach computers and other devices on IP networks. A media gateway is used to convert the data from a format that’s used on PSTN to one that’s used by IP networks that use packets to transport data; MGCP is used to set up, manage, and tear down the calls between these endpoints.
MGCP was defined in RFC 2705 by the IETF. A closely related gateway control protocol is known as Megaco and H.248. The IETF defined Megaco in RFC 3015, and the Telecommunication Standardization Sector of the International Telecommunications Union endorsed it as Recommendation H.248.

Real-Time Streaming Protocol
The Real-Time Streaming Protocol (RTSP) is used to control the delivery of streaming media across the network. RTSP provides the ability to control streaming media much as you would control video running on a VCR or DVD player. Through this protocol, an application can issue commands to play, pause, or perform other actions that affect the playing of media being transferred to the application.
IETF defined RTSP as a standard in RFC 2326, allowing clients to control streaming media sent to them over protocols like RTP.


SIP Requests and Responses | SIP Architecture

Because SIP is a text-based protocol like HTTP, it is used to send information between clients and servers, and User Agent clients and User Agent servers, as a series of requests and responses. When requests are made, there are a number of possible signaling commands that might be used:
  • REGISTER Used when a user agent first goes online and registers their SIP address and IP address with a Registrar server.
  • INVITE Used to invite another User agent to communicate, and then establish a SIP session between them.
  • ACK Used to accept a session and confirm reliable message exchanges.
  • OPTIONS Used to obtain information on the capabilities of another user agent, so that a session can be established between them. When this information is provided a session isn’t automatically created as a result.
  • SUBSCRIBE Used to request updated presence information on another user agent’s status. This is used to acquire updated information on whether a User agent is online, busy, offline, and so on.
  • NOTIFY Used to send updated information on a User agent’s current status. This sends presence information on whether a User agent is online, busy, offline, and so on.
  • CANCEL Used to cancel a pending request without terminating the session.
  • BYE Used to terminate the session. Either the user agent who initiated the session, or the one being called can use the BYE command at any time to terminate the session.
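Because SIP requests are plain text, a request can be sketched as a start line naming the method, followed by header lines and a blank line. The builder below is a simplified illustration: `build_request` is a hypothetical helper, the header values are invented, and the output is not a complete, standards-compliant message.

```python
METHODS = {
    "REGISTER", "INVITE", "ACK", "OPTIONS",
    "SUBSCRIBE", "NOTIFY", "CANCEL", "BYE",
}


def build_request(method, request_uri, headers):
    """Format a text request: start line, header lines, then a blank line."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    lines = [f"{method} {request_uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical header values used only for demonstration.
message = build_request(
    "BYE",
    "sip:bob@example.com",
    {
        "From": "<sip:alice@example.com>",
        "To": "<sip:bob@example.com>",
        "CSeq": "2 BYE",
    },
)
print(message)
```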
When a request is made to a SIP server or another user agent, one of a number of possible responses may be sent back. These responses are grouped into six different categories, with a three-digit numerical response code that begins with a number relating to one of these categories. The various categories and their response code prefixes are as follows:
  • Informational (1xx) The request has been received and is being processed.
  • Success (2xx) The request was acknowledged and accepted.
  • Redirection (3xx) The request can’t be completed and additional steps are required (such as redirecting the user agent to another IP address).
  • Client error (4xx) The request contained errors, so the server can’t process the request.
  • Server error (5xx) The request was received, but the server can’t process it. Errors of this type refer to the server itself, and they don’t indicate that another server won’t be able to process the request.
  • Global failure (6xx) The request was received and the server is unable to process it. Errors of this type refer to errors that would occur on any server, so the request wouldn’t be forwarded to another server for processing.
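Since the category is determined by the first digit of the response code, the mapping in the list above can be sketched in a few lines. The function name `response_category` is an assumption made for this illustration.

```python
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
    6: "Global failure",
}


def response_category(code):
    """Map a three-digit response code to its category by leading digit."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a valid response code: {code}")
    return CATEGORIES[code // 100]


for code in (180, 200, 302, 404, 503, 600):
    print(code, response_category(code))
```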
There are a wide variety of responses that apply to each of the categories. The different responses, their categories, and codes are shown in Table 1.
Table 1: Listing of Responses, Response Codes, and Their Meanings 
Response Code   Response Category   Response Description
181             Informational       Call is being forwarded
300             Redirection         Multiple choices
301             Redirection         Moved permanently
302             Redirection         Moved temporarily
303             Redirection         See other
305             Redirection         Use proxy
380             Redirection         Alternative service
400             Client Error        Bad request
401             Client Error        Unauthorized
402             Client Error        Payment required
403             Client Error        Forbidden
404             Client Error        Not found
405             Client Error        Method not allowed
406             Client Error        Not acceptable
407             Client Error        Proxy authentication required
408             Client Error        Request timeout
409             Client Error        Conflict
410             Client Error        Gone
411             Client Error        Length required
413             Client Error        Request entity too large
414             Client Error        Request-URI too large
415             Client Error        Unsupported media type
420             Client Error        Bad extension
480             Client Error        Temporarily not available
481             Client Error        Call leg/transaction does not exist
482             Client Error        Loop detected
483             Client Error        Too many hops
484             Client Error        Address incomplete
485             Client Error        Ambiguous
486             Client Error        Busy here
500             Server Error        Internal server error
501             Server Error        Not implemented
502             Server Error        Bad gateway
503             Server Error        Service unavailable
504             Server Error        Gateway time-out
505             Server Error        SIP version not supported
600             Global Failures     Busy everywhere
603             Global Failures     Decline
604             Global Failures     Does not exist anywhere
606             Global Failures     Not acceptable