SIP Functions and Features

When SIP was developed, it was designed to support five specific elements of setting up and tearing down communication sessions. These supported facets of the protocol are:
  • User location, where the endpoint of a session can be identified and found, so that a session can be established
  • User availability, where the participant that’s being called has the opportunity and ability to indicate whether he or she wishes to engage in the communication
  • User capabilities, where the media that will be used in the communication is established, and the parameters of that media are agreed upon
  • Session setup, where the parameters of the session are negotiated and established
  • Session management, where the parameters of the session are modified, data is transferred, services are invoked, and the session is terminated
Although these are only a few of the issues involved in connecting parties so they can communicate, they are important ones that SIP is designed to address. Beyond these functions, SIP relies on other protocols to perform the remaining tasks that allow participants to communicate with each other.

User Location

The ability to find the location of a user requires being able to translate a participant’s username to the current IP address of the computer being used. This is important because the user may be using different computers, or (if DHCP is used) the computer may have different IP addresses at different times. The program can use SIP to register the user with a server, providing a username and IP address to the server. Because the server now knows the current location of the user, other users can find that user on the network. Requests are redirected through the proxy server to the user’s current location. By going through the server, other potential participants in a communication can find the user, and establish a session after acquiring their IP address.
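The registration idea described above can be pictured as a simple lookup table. The following is only a minimal sketch: the class name LocationService, its methods, and the sample addresses are hypothetical, and a real SIP server involves far more than a dictionary.

```python
from typing import Dict, Optional


class LocationService:
    """Toy registrar: remembers the address each username last registered from."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def register(self, username: str, ip_address: str) -> None:
        # A later registration replaces the earlier one, for example after
        # the user moves to another computer or DHCP hands out a new address.
        self._bindings[username] = ip_address

    def locate(self, username: str) -> Optional[str]:
        # Returns None when the user has never registered.
        return self._bindings.get(username)


service = LocationService()
service.register("alice", "192.0.2.10")
service.register("alice", "192.0.2.57")  # same user, new address
print(service.locate("alice"))
```

Other participants only need the username; the server supplies whichever address was registered most recently.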

User Availability

The user availability function of SIP allows a user to control whether he or she can be contacted. Users can set themselves as being away or busy, or available for certain types of communication. If available, other users can then invite the user to join in a type of communication (e.g., voice or videoconference), depending on the capabilities of the program being used.

User Capabilities

Determining the user’s capabilities involves determining what features are available on the programs being used by each of the parties, and then negotiating which can be used during the session. Because SIP can be used with different programs on different platforms, and can be used to establish a variety of single-media and multimedia communications, the type of communication and its parameters need to be determined. For example, if you were to call a particular user, your computer might support video conferencing, but the person you’re calling doesn’t have a camera installed. Determining the user capabilities allows the participants to agree on which features, media types, and parameters will be used during a session.
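At its simplest, agreeing on capabilities means finding what both sides have in common. The sketch below illustrates only that idea; the function name negotiate and the media labels are assumptions made for the example, not part of any SIP message format.

```python
def negotiate(caller_media, callee_media):
    """Return the media types both sides support, in the caller's preference order."""
    callee = set(callee_media)
    return [media for media in caller_media if media in callee]


caller = ["video", "audio", "text"]
callee = ["audio", "text"]  # no camera installed on this side
agreed = negotiate(caller, callee)
print(agreed)
```

Here video drops out of the session because only one party can support it, which mirrors the camera example above.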

Session Setup

Session setup is where the participants of the communication connect together. The user who is contacted to participate in a conversation will have their program “ring” or produce some other notification, and has the option of accepting or rejecting the communication. If accepted, the parameters of the session are agreed upon and established, and the two endpoints will have a session started, allowing them to communicate.

Session Management

Session management is the final function of SIP, and is used for modifying the session as it is in use. During the session, data will be transferred between the participants, and the types of media used may change. For example, during a voice conversation, the participants may decide to invoke other services available through the program, and change to video conferencing. During communication, they may also decide to add or drop other participants, place a call on hold, have the call transferred, and finally terminate the session by ending their conversation. These are all aspects of session management, which are performed through SIP.
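Session setup and session management together can be viewed as a small state machine. The sketch below is one possible way to picture it; the state names and event names are hypothetical labels chosen for illustration and are not SIP message names.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("idle", "invite"): "ringing",
    ("ringing", "accept"): "established",
    ("ringing", "reject"): "terminated",
    ("established", "modify"): "established",  # e.g. voice to video, or hold
    ("established", "hang_up"): "terminated",
}


def step(state, event):
    """Apply one event to the session and return the new state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}")


state = "idle"
for event in ["invite", "accept", "modify", "hang_up"]:
    state = step(state, event)
print(state)
```

Modifying an established session leaves it established, while rejecting an invitation or hanging up ends it.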


Because SIP was based on existing standards that had already been proven on the Internet, it uses established methods for identifying and connecting endpoints together. This is particularly seen in the addressing scheme that it uses to identify different SIP accounts. SIP uses addresses that are similar to e-mail addresses. The hierarchical URI shows the domain where a user’s account is located, and a host name or phone number that serves as the user’s account. For example, a URI of the form SIP:myaccount@ followed by a domain name shows that the account myaccount is located at that domain. Using this method makes it simple to connect someone to a particular phone number or username.
Because the addresses of those using SIP follow a username@domainname format, the usernames created for accounts must be unique within the namespace. Usernames and phone numbers must be unique because they identify which account belongs to a specific person, and they are used when someone attempts to send a message or place a call to someone else. Because the usernames are stored on centralized servers, the server can determine whether a particular username is available or not when a person initially sets up an account.
URIs can also contain other information that allows a connection to a particular user, such as a port number, password, or other parameters. In addition, although SIP URIs will generally begin with SIP:, others will begin with SIPS:, which indicates that the information must be sent over a secure transmission. In such cases, the data and messages transmitted are transported using the Transport Layer Security (TLS) protocol.
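To make the parts of such an address concrete, here is a rough sketch of splitting a SIP URI into its scheme, user, optional password, host, optional port, and parameters. It is a simplification rather than a complete parser: the function name parse_sip_uri, the example.com domain, and the sample port and parameter are assumptions, and details such as IPv6 hosts or header fields after a question mark are ignored.

```python
def parse_sip_uri(uri):
    """Split a SIP or SIPS URI into its main parts (simplified)."""
    scheme, _, rest = uri.partition(":")
    scheme = scheme.lower()
    if scheme not in ("sip", "sips"):
        raise ValueError("not a SIP URI")
    # Parameters follow the address, separated by semicolons.
    rest, *param_parts = rest.split(";")
    userinfo, _, hostport = rest.rpartition("@")
    user, _, password = userinfo.partition(":")
    host, _, port = hostport.partition(":")
    params = dict(p.partition("=")[::2] for p in param_parts)
    return {
        "scheme": scheme,
        "secure": scheme == "sips",  # SIPS asks for secure transport
        "user": user or None,
        "password": password or None,
        "host": host,
        "port": int(port) if port else None,
        "params": params,
    }


parsed = parse_sip_uri("sips:myaccount@example.com:5061;transport=tcp")
print(parsed)
```

The resemblance to an e-mail address is visible in the user@host portion; everything else is optional decoration around it.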


Understanding SIP Architecture

SIP was designed to initiate interactive sessions on an IP network. Programs that provide real-time communication between participants can use SIP to set up, modify, and terminate a connection between two or more computers, allowing them to interact and exchange data. The programs that can use SIP include instant messaging, voice over IP (VoIP), video teleconferencing, virtual reality, multiplayer games, and other applications that employ single-media or multimedia. SIP doesn’t provide all the functions that enable these programs to communicate, but it is an important component that facilitates communication between two or more endpoints.
You could compare SIP to a telephone switchboard operator, who uses other technology to connect you to another party, set up conference calls or other operations on your behalf, and disconnect you when you’re done. SIP is a type of signaling protocol that is responsible for sending commands to start and stop transmissions or other operations used by a program. The commands sent between computers are codes that do such things as open a connection to make a phone call over the Internet or disconnect that call later on. SIP supports additional functions, such as call waiting, call transfer, and conference calling, by sending out the necessary signals to enable and disable these functions. Just as the telephone operator isn’t concerned with how communication occurs, SIP works with a number of components and can run on top of several different transport protocols to transfer media between the participants.

Overview of SIP

One of the major reasons that SIP is necessary is found in the nature of programs that involve messaging, voice communication, and exchange of other media. The people who use these programs may change locations and use different computers, have several usernames or accounts, or communicate using a combination of voice, text, or other media (requiring different protocols). This creates a situation that’s similar to trying to mail a letter to someone who has several aliases, speaks different languages, and could change addresses at any particular moment.
SIP works with various network components to identify and locate these endpoints. Information is passed through proxy servers, which are used to register and route requests to the user’s location, invite other users into a session, and make other requests to connect these endpoints. Because there are a number of different protocols available that may be used to transfer voice, text, or other media, SIP runs on top of other protocols that transport data and perform other functions. By working with other components of the network, data can be exchanged between these user agents regardless of where they are at any given point.
It is the simplicity of SIP that makes it so versatile. SIP is an ASCII- or text-based protocol, similar to HTTP or SMTP, which makes it more lightweight and flexible than other signaling protocols (such as H.323). Like HTTP and SMTP, SIP is a request-response protocol, meaning that it makes a request of a server, and awaits a response. Once it has established a session, other protocols handle such tasks as negotiating the type of media to be exchanged, and transporting it between the endpoints. The reusing of existing protocols and their functions means that fewer resources are used, and minimizes the complexity of SIP. By keeping the functionality of SIP simple, it allows SIP to work with a wider variety of applications.
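Because SIP is text-based, a request can be assembled and a response read with ordinary string handling, much as with HTTP. The sketch below shows only the general shape of a request line and a status line; the helper names and addresses are hypothetical, and a real request carries several more required headers than the two shown here.

```python
def build_request(method, uri, headers):
    """Assemble a text request: request line, headers, then a blank line."""
    lines = [f"{method} {uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(line):
    """Split a response status line into version, numeric code, and reason."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


request = build_request(
    "INVITE",
    "sip:myaccount@example.com",
    {
        "To": "<sip:myaccount@example.com>",
        "From": "<sip:caller@example.org>",
    },
)
print(request)
print(parse_status_line("SIP/2.0 200 OK"))
```

A person can read both the request and the response directly, which is a large part of what makes text-based protocols easy to debug.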
The similarities to HTTP and SMTP are no accident. SIP was modeled after these text-based protocols, which work in conjunction with other protocols to perform specific tasks. SIP is also similar to these other protocols in that it uses Uniform Resource Identifiers (URIs) for identifying users. A URI identifies resources on the Internet, just as a Uniform Resource Locator (URL) is used to identify Web sites. The URI used by SIP incorporates a phone number or name, which makes reading SIP addresses easier. Rather than reinventing the wheel, the development of SIP incorporated familiar aspects of existing protocols that have long been used on IP networks. The modular design allows SIP to be easily incorporated into Internet and network applications, and its similarities to other protocols make it easier to use.

RFC 2543 / RFC 3261

The Session Initiation Protocol is a standard that was developed by the Internet Engineering Task Force (IETF). The IETF is a body of network designers, researchers, and vendors that works under the Internet Society and the Internet Architecture Board for the purpose of developing Internet communication standards. The standards they create are important because they establish consistent methods and functionality. Unlike proprietary technology, which may or may not work outside of a specific program, standardization allows a protocol or other technology to function the same way in any application or environment. In other words, because SIP is a standard, it can work on any system, regardless of the communication program, operating system, or infrastructure of the IP network.
The way that the IETF develops a standard is through recommendations for rules that are made through Requests for Comments (RFCs). The RFC starts as a draft that is examined by members of a Working Group, and during the review process, it is developed into a finalized document. The first proposed standard for SIP was produced in 1999 as RFC 2543, but in 2002, the standard was further defined in RFC 3261, which makes RFC 2543 obsolete. Additional documents outlining extensions and specific issues related to the SIP standard have also been released, and these update RFC 3261. The reason for these changes is that as technology changes, the development of SIP also evolves. The IETF continues developing SIP and its extensions as new products are introduced and its applications expand.
Reviewing RFCs can provide you with additional insight and information, answering specific questions you may have about SIP. The RFCs related to SIP can be reviewed by visiting the IETF Web site. Additional materials related to the Session Initiation Protocol Working Group can also be found through the IETF.

SIP and Mbone

Although RFC 2543 and RFC 3261 define SIP as a protocol for setting up, managing, and tearing down sessions, the original version of SIP had no mechanism for tearing down sessions and was designed for the Multicast Backbone (Mbone). Mbone originated as a method of broadcasting audio and video over the Internet. The Mbone is a broadcast channel that is overlaid on the Internet, and allowed a method of providing Internet broadcasts of things like IETF meetings, space shuttle launches, live concerts, and other meetings, seminars, and events. The ability to communicate with several hosts simultaneously needed a way of inviting users into sessions; the Session Invitation Protocol (as it was originally called) was developed in 1996.
The Session Invitation Protocol was a precursor to SIP that was defined by the IETF MMUSIC Working Group, and a primitive version of the Session Initiation Protocol used today. However, as VoIP and other methods of communications became more popular, SIP evolved into the Session Initiation Protocol. Even with added features like the ability to tear down a session, it was still more lightweight than more complex protocols like H.323. In 1999, the Session Initiation Protocol was defined as RFC 2543, and has become a vital part of multimedia applications used today.


In designing the SIP standard, the IETF mapped the protocol to the OSI (Open Systems Interconnect) reference model. The OSI reference model is used to associate protocols to different layers, showing their function in transferring and receiving data across a network, and their relation to other existing protocols. A protocol at one layer uses only the functions of the layer below it, while exporting the information it processes to the layer above it. It is a conceptual model that originated to promote interoperability, so that a protocol or element of a network developed by one vendor would work with others.
As seen in Figure 1, the OSI model contains seven layers: Application, Presentation, Session, Transport, Network, Data Link, and Physical. As seen in this figure, network communication starts at the Application layer and works its way down through the layers step by step to the Physical layer. The information then passes along the cable to the receiving computer, which starts the information at the Physical layer. From there it steps back up the OSI layers to the Application layer where the receiving computer finalizes the processing and sends back an acknowledgement if needed. Then the whole process starts over.

Figure 1: In the OSI Reference Model, Data is Transmitted down through the Layers, across the Medium, and Back up through the Layers
The layers of the OSI reference model have different functions that are necessary in transferring data across a network, and mapping protocols to these layers makes it easier to understand how they interrelate to the network as a whole. Table 1 shows the seven layers of the OSI model, and briefly explains their functions.
Table 1: Layers of the OSI Model 
7: Application
The Application layer is used to identify communication partners, facilitate authentication (if necessary), and allow a program to communicate with lower layer protocols, so that in turn it can communicate across the network. Protocols that map to this layer include SIP, HTTP, and SMTP.
6: Presentation
The Presentation layer converts data from one format to another, such as converting a stream of text into a pop-up window, and handles encoding and encryption.
5: Session
The Session layer is responsible for coordinating sessions and connections.
4: Transport
The Transport layer is used to transparently transfer data between computers. Protocols that map to this layer include TCP, UDP, and RTP.
3: Network
The Network Layer is used to route and forward data so that it goes to the proper destination. The most common protocol that maps to this layer is IP.
2: Data Link
The Data Link layer is used to correct errors that may occur at the physical level, and to provide physical addressing through the use of MAC addresses that are hard-coded into network cards.
1: Physical
The Physical layer defines electrical and physical specifications of network devices, and provides the means of allowing hardware to send and receive data on a particular type of media. At this level, data is passed as a bit stream across the network.
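The trip down the layers on the sending computer and back up on the receiving computer can be mimicked with a toy wrapper. This is purely a conceptual sketch: the bracketed tags stand in for whatever each layer really adds, and the function names send and receive are invented for the example.

```python
LAYERS = [
    "Application", "Presentation", "Session", "Transport",
    "Network", "Data Link", "Physical",
]


def send(payload):
    """Wrap the payload with one tag per layer, working from the top down."""
    for layer in LAYERS:
        payload = f"[{layer}]{payload}"
    return payload


def receive(frame):
    """Strip the tags in the opposite order, working from the bottom up."""
    for layer in reversed(LAYERS):
        tag = f"[{layer}]"
        if not frame.startswith(tag):
            raise ValueError(f"missing {layer} tag")
        frame = frame[len(tag):]
    return frame


on_the_wire = send("hello")
print(on_the_wire)
print(receive(on_the_wire))
```

Each layer on the receiving side removes only what its counterpart on the sending side added, which is why the outermost tag belongs to the lowest layer.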

SIP and the Application Layer
Because SIP is the Session Initiation Protocol, and its purpose is to establish, modify, and terminate sessions, it would seem at face value that this protocol maps to the Session layer of the OSI reference model. However, it is important to remember that the protocols at each layer interact only with the layers above and below them. Programs directly access the functions and supported features available through SIP, disassociating it from this layer. SIP is used to invite a user into an interactive session, and can also invite additional participants into existing sessions, such as conference calls or chats. It allows media to be added to or removed from a session, provides the ability to identify and locate a user, and also supports name mapping, redirection, and other services. When comparing these features to the OSI model, it becomes apparent that SIP is actually an Application-layer protocol.
The Application layer is used to identify communication partners, facilitate authentication (if necessary), and allow a program to communicate with lower layer protocols, so that in turn it can communicate across the network. In the case of SIP, it is setting up, maintaining, and ending interactive sessions, and providing a method of locating and inviting participants into these sessions. The software being used communicates through SIP, which passes the data down to lower layer protocols and sends it across the network.


SIP Architecture

As the Internet became more popular in the 1990s, network programs that allowed communication with other Internet users also became more common. Over the years, a need was seen for a standard protocol that could allow participants in a chat, videoconference, interactive gaming, or other media to initiate user sessions with one another. In other words, a standard set of rules and services was needed that defined how computers would connect to one another so that they could share media and communicate. The Session Initiation Protocol (SIP) was developed to set up, maintain, and tear down these sessions between computers.
By working in conjunction with a variety of other protocols and specialized servers, SIP provides a number of important functions that are necessary in allowing communications between participants. SIP provides methods of sharing the location and availability of users and explains the capabilities of the software or device being used. SIP then makes it possible to set up and manage the session between the parties. Without these tasks being performed, communication over a large network like the Internet would be impossible. It would be like a message in a bottle being thrown in the ocean; you would have no way of knowing how to reach someone directly or whether the person even could receive the message.
Beyond communicating with voice and video, SIP has also been extended to support instant messaging and is becoming a popular choice that’s incorporated in many of the instant messaging applications being produced. This extension, called SIMPLE, provides the means of setting up a session in much the same way as SIP. SIMPLE also provides information on the status of users, showing whether they are online, busy, or in some other state of presence. Because SIP is being used in these various methods of communications, it has become a widely used and important component of today’s communications.


Frequently Asked Questions | H.323 Architecture

Q: I’ve never heard of H.323. What applications do I use that rely on this?

A: Microsoft NetMeeting, for one. Polycom and Tandberg videoconferencing clients are others.

Q: Do H.323 terminals have to explicitly send the H.225 call setup messages to the IP address of the gateway?
A: Yes, an H.323 endpoint must know the transport address—for example, the IP address and port number—for the Q.931 dialogue. Q.931 then provides the transport address for the H.245 control channel. This is how addresses are bootstrapped in H.323.

Q: At which layer of the OSI model does the H.323 standard fit?
A: H.323 doesn’t map to just one layer, but is primarily implemented at layers 3 and 4.

Q: I’ve heard that H.323 uses more than one TCP/UDP port in order to transmit voice, video, and data. Are these ports fixed, or do they vary for each connection?
A: H.323 uses several ports and both TCP and UDP to signal and transport voice. H.225/Q.931 and H.245 use TCP and H.225/RAS and RTP/RTCP use UDP. Ports 1718–1720 are dedicated to H.323 traffic. Several dynamic port combinations are used per session as well.

Q: What is the best VoIP codec?
A: A number of factors go into that kind of determination. Probably the most important is the nature of the network between the two ends. If you are connected over a LAN (high bandwidth, minimal delays, etc.), then G.711 generally provides the best voice quality.

Q: What’s an Application Layer Gateway?
A: ALGs peer more deeply into the packet than packet filtering firewalls but normally do not scan the entire payload. Unlike packet filtering or stateful inspection firewalls, ALGs do not route packets; rather the ALG accepts a connection on one network interface and establishes the cognate connection on another network interface. An ALG provides intermediary services for hosts that reside on different networks, while maintaining complete details of the TCP connection state and sequencing.

Q: What’s better, H.323 or SIP?
A: What’s better, an apple or an orange? Seriously, H.323 is based on SS7 and was designed to internetwork efficiently with the PSTN. SIP is based on HTTP and was not designed with interconnecting to the PSTN in mind. So, major carriers tend to use H.323 because it translates ISDN and SS7 signaling to H.323 VoIP signaling easily. SIP does not. On the other hand, SIP supports IM, is text-based, and is implemented more cheaply than H.323.


H.235 Security Mechanisms

H.235 is expected to operate in conjunction with other H-series protocols that utilize H.245 as their control protocol and/or use the H.225.0 RAS and/or Call Signaling Protocol. H.235’s major premise is that the principal security threat to communications is eavesdropping on the network, or some other method of diverting media streams. The security issues related to DoS attacks are not addressed.
This family of threats relies on the absence of cryptographic assurance of a request’s originator. Attacks in this category seek to compromise the message integrity of a conversation. This threat demonstrates the need for security services that enable entities to authenticate the originators of requests and to verify that the contents of the message and control streams have not been altered in transit.
Authentication is, in general, based either on using a shared secret (you are authenticated properly if you know the secret) or on public key-based methods with certifications (you prove your identity by possessing the correct private key). The basis for authentication (trust) and privacy is defined by the endpoints of the communications channel. For a connection establishment channel, this may be between the caller (such as a gateway or IP telephone endpoint) and a hosting network component (a gateway or gatekeeper). For example, a telephone “trusts” that the gatekeeper will connect it with the telephone whose number has been dialed. The result of trusting an element is the confidence to reveal the privacy mechanism (algorithm and key) to that element. Given the aforementioned information, all participants in the communications path should authenticate any and all trusted elements.
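The shared-secret approach can be illustrated with a keyed hash: only a party that knows the secret can produce a tag that verifies, and any change to the message breaks the tag. The sketch below uses HMAC with SHA-256 from the Python standard library purely for illustration; it is not claimed to be the algorithm or message format that H.235 specifies, and the secret and message contents are made up.

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> bytes:
    """Compute an authentication tag over the message using the shared secret."""
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify(secret: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag; compare_digest avoids leaking timing information."""
    return hmac.compare_digest(sign(secret, message), tag)


secret = b"shared-between-endpoint-and-gatekeeper"  # hypothetical secret
message = b"registration request from endpoint 42"  # hypothetical message
tag = sign(secret, message)

print(verify(secret, message, tag))               # originator knows the secret
print(verify(b"wrong secret", message, tag))      # impostor does not
print(verify(secret, message + b" (edited)", tag))  # altered in transit
```

The same check addresses both concerns raised above: it authenticates the originator and detects messages altered in transit.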
Encryption methods are defined as DES, 3DES, and AES. TLS (Transport Layer Security) and IPSec (IP Security) are recommended to secure layer 4 and layer 3 protocol messages, respectively. IPSec and TLS provide solutions at different levels of the OSI model: IPSec at the Network layer, and TLS at the Transport layer. Both use the same type of negotiation to set up tunnels, but IPSec often encrypts crucial header information, whereas TLS encrypts only the application payload of the packet; thus TLS encryption retains IP addressing.
The scope of the H.235 specification is shown in Figure 1. H.235 addresses the protocols that are shaded in gray.

Figure 1: H.235 Scope
Let’s look at how the H.235 specification interacts with each protocol.
  • H.245: The call signaling channel may be secured using TLS. Users may be authenticated either during the initial call connection, in the process of securing the H.245 channel, and/or by exchanging certificates on the H.245 channel. Media encryption details often are negotiated in private control channels determined by information carried in the OpenLogicalChannel connection.
  • H.225.0/Q.931: Q.931 can be secured via transport-layer security (TLS) or IPSec prior to any H.225.0 message exchange.
  • H.225.0/RAS: During the RAS phase of registering, the endpoint and the gatekeeper can exchange security policies and capabilities to define the security methods to be used in the initiated call session.
  • RTP/RTCP: H.245 signaling messages are used to provide confidentiality for a secured RTP channel. Secured logical channels are opened as part of the H.245 capability exchange phase, using DES, 3DES, or AES. The security capability is exchanged per media stream (RTP channel). The security mechanisms that protect the media streams and any control channels operate in a completely independent manner.
H.235 specifies a number of security profiles. You can think of each security profile as a module consisting of a set of terms, definitions, requirements, procedures, and a profile overview that describe a particular instantiation of security methods. Security profiles, which are optional, may be implemented either selectively or in almost any combination. Endpoints may initially offer multiple security profiles simultaneously using the aforementioned RRQ/GRQ messages. H.235 also explicitly defines particular combinations of profiles that are useful or possible. For example, H.323 shows that the baseline security profile can be combined with SP4–Direct and selective routed call security, SP6–Voice encryption profile with native H.235/H.245 key management, and SP9–Security gateway support for H.323.
Profiles can be differentiated by the spectrum of security services each particular profile supports. The following security services are defined: Authentication, Nonrepudiation, Integrity, Confidentiality, Access Control, and Key Management. For example, the baseline security profile supports the security services shown in Figure 2.

Figure 2: Baseline Security Profile Security Services (H.235.1)
You can see that this profile provides for authentication and integrity of the signaling streams but does not provide support for encryption, nonrepudiation, or access control of these streams. The baseline security profile (H.235.1) specifies the following: Authentication and integrity protection, or authentication-only for H.225/RAS, H.225/Q.931 messages, and tunneled H.245 messages using password-based protection. The security profile is applicable to communications between an H.323 terminal and a gatekeeper, between two gatekeepers, and between an H.323 gateway and a gatekeeper.
The following Security Profiles are defined:
  • 235.1 Baseline security profile
  • 235.2 Signature security profile
  • 235.3 Hybrid security profile
  • 235.4 Direct and selective routed call security
  • 235.5 Framework for secure authentication in RAS using weak shared secrets
  • 235.6 Voice encryption profile with native H.235/H.245 key management
  • 235.7 Usage of the MIKEY key management protocol for the Secure Real Time Transport Protocol
  • 235.8 Key exchange for SRTP using secure signaling channels
  • 235.9 Security gateway support for H.323
Each security profile defines security services in the context of the generic classes of attacks that can be prevented by implementing that particular profile. In the case of the baseline security profile, the following attacks are thwarted.
  • Man-in-the-middle attacks: Application-level hop-by-hop message authentication and integrity protect against such attacks when the man in the middle sits between two application-level hops.
  • Replay attacks: Use of time stamps and sequence numbers prevents such attacks.
  • Spoofing: User authentication prevents such attacks.
  • Connection hijacking: Use of authentication/integrity for each signaling message prevents such attacks.
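The way time stamps and sequence numbers defeat replayed messages can be sketched in a few lines. This is a generic illustration under stated assumptions, not the procedure from the H.235 text: the class name ReplayGuard, the 30-second window, and the sender labels are all hypothetical.

```python
class ReplayGuard:
    """Reject messages that are stale or that reuse a sequence number."""

    def __init__(self, max_age_seconds=30.0):
        self.max_age = max_age_seconds
        self.last_seq = {}  # sender -> highest sequence number accepted so far

    def accept(self, sender, seq, timestamp, now):
        if abs(now - timestamp) > self.max_age:
            return False  # too old, or stamped too far in the future
        if seq <= self.last_seq.get(sender, -1):
            return False  # sequence number already seen from this sender
        self.last_seq[sender] = seq
        return True


guard = ReplayGuard(max_age_seconds=30.0)
print(guard.accept("endpoint-1", 1, timestamp=100.0, now=101.0))  # fresh
print(guard.accept("endpoint-1", 1, timestamp=100.0, now=102.0))  # replayed
print(guard.accept("endpoint-1", 2, timestamp=10.0, now=106.0))   # stale
```

A captured message fails one check or the other when it is sent again: either its sequence number has been used or its time stamp has aged out. Such checks only help when the time stamp and sequence number are themselves covered by message authentication.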
Other threats are not addressed in this profile. For example, the issue of confidentiality via encryption is left to other security profiles. Thus, any H.323 system that uses only this profile will be subject to attacks that rely upon data interception by sniffing traffic. If, however, the endpoints that specify the security profiles available to the system indicate that they support SP6–Voice encryption profile with native H.235/H.245 key management, as well as the baseline security profile, then the threat posed by eavesdropping attacks will be minimized.
The matrix describing the security services provided by security profile H.235.6 is shown in Figure 3.

Figure 3: Voice Encryption Profile with Native H.235/H.245 Key Management
In Figure 3 you can see that the addition of security profile H.235.6 to the baseline security profile adds methods for Diffie-Hellman key management and encryption of the media streams. In this fashion, security profiles can be added to the H.323 entities within your environment so as to provide only the security controls dictated by your security requirements. This approach allows some customization of the H.323 security controls so that, for example, they can be configured to work with your particular existing firewall infrastructure. 
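For readers unfamiliar with Diffie-Hellman, the arithmetic behind it fits in a few lines. The numbers below are deliberately tiny toy values chosen so the calculation can be followed by hand; they are assumptions for illustration only and offer no security, and nothing here reflects how H.235.6 actually carries the values between endpoints.

```python
# Public parameters, known to everyone (toy values; real use needs a large prime).
p, g = 23, 5

# Each side picks a private value and never sends it.
a_private, b_private = 6, 15

# Each side sends only its public value.
a_public = pow(g, a_private, p)
b_public = pow(g, b_private, p)

# Each side combines its own private value with the other's public value.
a_shared = pow(b_public, a_private, p)
b_shared = pow(a_public, b_private, p)

print(a_public, b_public, a_shared, b_shared)
```

Both sides arrive at the same shared value without it ever crossing the network, and that value can then be used to derive the key that encrypts the media streams.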


H.245 Call Control Messages

After a connection has been set up via the call signaling procedure, H.245 messages (there are many of these) are used to resolve the call media type, to exchange terminal capabilities, and to establish the media flow before the call can be established. H.245 also manages call parameters after call establishment. H.245 messages also are encoded in ASN.1 PER syntax. The messages carried include notification of terminal capabilities, and commands to open and close logical channels. The H.245 control channel is permanently open, unlike the media channels.
Table 1 lists various types of messages and the H.323 ports used to transport them.
Table 1: H.323 Ports 
H.245 messages: Dynamically assigned ports
RTP messages: Dynamically assigned ports
UDP Discovery Port 1718
UDP Registration and Status Port 1719
TCP Call Signaling Port 1720
UDP 53
UDP 69
UDP 161, 162
H.245 negotiations usually take place on a separate channel from the one used for H.225 exchanges, but newer applications support tunneling of H.245 PDUs within the H.225 signaling channel. There is no well-known port for H.245. The H.245 transport address always is passed in the call-signaling message. In other words, port information is passed within the payload of the preceding H.225/Q.931 signaling packets. The media channels (those used to transport voice and video) are similarly dynamically allocated. Figure 1 is an example of H.245 call control.
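The split between the fixed ports and the dynamically assigned ones can be captured in a small lookup. The sketch below uses only the three H.323 ports listed in Table 1; the function name classify and the wording of the fallback label are assumptions made for the example.

```python
# (transport, port) -> H.323 function, taken from the fixed entries in Table 1
FIXED_PORTS = {
    ("udp", 1718): "Discovery",
    ("udp", 1719): "Registration and Status",
    ("tcp", 1720): "Call Signaling",
}


def classify(transport, port):
    """Name the H.323 function of a fixed port, or flag the port as not fixed."""
    return FIXED_PORTS.get(
        (transport.lower(), port),
        "not a fixed H.323 port (H.245 and RTP use dynamically assigned ports)",
    )


print(classify("TCP", 1720))
print(classify("udp", 1719))
print(classify("tcp", 40000))
```

Nothing in a static table like this can identify an H.245 or media port in advance, because those ports are only announced inside the signaling messages themselves.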

Figure 1: H.245 Call Control
The called party opens the TCP port for establishing the control channel after extracting the port information from the H.225/Q.931 signaling packet. During this exchange, terminal capabilities such as codec choice and master/slave determination are negotiated. Media channel negotiations begin with the OpenLogicalChannel Request packet. When the called party is ready to talk, it responds with an OpenLogicalChannel Ack, which contains the dynamic port information in the payload. As an aside, this use of dynamic ports makes it difficult to implement security policy on firewalls, NAT, and traffic shaping. In some cases, a special H.323-aware firewall or firewall component called an Application Layer Gateway (ALG) is required to reliably pass H.323 signaling and associated media. Once both RTP/RTCP channels are opened, communication proceeds (see Figure 2).
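The reason an H.323-aware component is needed can be shown with a toy model of what it has to track. This sketch assumes the hard part, decoding the signaling payload to find the announced port, has already been done; the class name PinholeTable and its methods are hypothetical and do not describe any particular firewall product.

```python
class PinholeTable:
    """Toy model: allow the fixed signaling port plus ports learned from signaling."""

    SIGNALING_PORT = 1720  # TCP call signaling port

    def __init__(self):
        self.open_ports = set()

    def on_port_announced(self, port):
        # Called when a signaling message announces a dynamic port.
        self.open_ports.add(port)

    def on_call_end(self, port):
        # Close the opening again once the call is over.
        self.open_ports.discard(port)

    def allows(self, port):
        return port == self.SIGNALING_PORT or port in self.open_ports


table = PinholeTable()
print(table.allows(40000))   # unknown dynamic port: blocked
table.on_port_announced(40000)
print(table.allows(40000))   # learned from signaling: allowed
table.on_call_end(40000)
print(table.allows(40000))   # call finished: blocked again
```

A firewall that cannot read the signaling payload has no way to call on_port_announced, which leaves it a choice between blocking the media or opening a wide range of ports.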

Figure 2: RTP/RTCP Media Streams
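
As a rough illustration of why dynamic ports complicate firewall policy, the following Python sketch models the bookkeeping an H.323-aware component might do. It is only a sketch under stated assumptions: the class name PinholeTable, its methods, and the example port numbers 11000 and 49170 are hypothetical, and no real firewall or ALG API is implied. The static entries are the well-known H.323 ports from Table 1; everything else has to be learned from signaling payloads.

```python
# Hypothetical sketch of ALG-style "pinhole" tracking; not a real firewall API.
# Static entries are the well-known H.323 ports; all others must be learned.
STATIC_H323_PORTS = {("udp", 1718), ("udp", 1719), ("tcp", 1720)}


class PinholeTable:
    def __init__(self):
        # Ports announced inside signaling payloads for the current call.
        self.dynamic = set()

    def learn(self, proto, port):
        """Record a port that was carried in a signaling message payload."""
        self.dynamic.add((proto, port))

    def allows(self, proto, port):
        """Return True if traffic to this port should be passed."""
        key = (proto, port)
        return key in STATIC_H323_PORTS or key in self.dynamic

    def close_call(self):
        """Forget all dynamically learned ports when the call ends."""
        self.dynamic.clear()


alg = PinholeTable()
alg.learn("tcp", 11000)  # example H.245 port taken from an H.225/Q.931 message
alg.learn("udp", 49170)  # example RTP port taken from an OpenLogicalChannel Ack
```

A firewall that inspects only packet headers has no way to populate the dynamic set, which is why the static rules alone cannot pass the H.245 and media channels.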


H.225/Q.931 Call Signaling

Assuming a slow start connection procedure, the H.225 protocol defines two important stages of call setup: call signaling and RAS. Call signaling describes standards for call setup, maintenance and control, and teardown. A subset of Q.931 call signaling messages is used to initiate connections between H.323 endpoints, over which real-time data can be transported. The signaling channel is opened between an endpoint and a gateway, between two gateways, or between a gateway and a gatekeeper prior to the establishment of any other channels. If no gateway or gatekeeper is present, H.225 messages are exchanged directly between the endpoints.

H.225 messages are encoded in binary ASN.1 PER (Packed Encoding Rules) format. Although the H.225.0 signaling channel may be implemented on top of UDP, all entities must support signaling over TCP port 1720.
Signaling traffic is binary encoded using ASN.1 (Abstract Syntax Notation One) syntax and the PER encoding rules. ASN.1 is not a programming language; it is a flexible notation that allows one to define a variety of data types. In theory, ASN.1 allows two or more dissimilar systems to communicate in an unambiguous manner. In practice, this aim is more difficult to achieve than it might seem at first.
ASN.1 encoding rules are sets of rules used to transform data specified in the ASN.1 notation into a standard format that can be decoded on any system that has a decoder based on the same set of rules. The H.323 family of protocols is encoded for transmission on the wire using PER. PER (Packed Encoding Rules) is a binary encoding that is more compact than BER (Basic Encoding Rules) and is suited to limited-bandwidth networks. PER is designed to optimize the use of bandwidth, but the tradeoff is complexity: decoding PER PDUs has led to problems due to a number of factors, including issues with octet alignment (PER encoding can be aligned or unaligned), integer precision (at times, a PER value may not contain a length field), and unconstrained character strings.
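
To make the compactness and the missing length field concrete, here is a minimal Python sketch of how a constrained integer is packed in the unaligned variant of PER: only the offset from the lower bound is sent, in the fewest bits that can cover the allowed range, with no tag and no length. The function name per_unaligned_constrained_int is an assumption made for illustration, and the sketch covers this one case only; it is not a general PER encoder.

```python
def per_unaligned_constrained_int(value, lower, upper):
    """Return the bit string for an INTEGER (lower..upper) in unaligned PER.

    Illustrative only: handles a single constrained whole number and
    returns the bits as a string of '0'/'1' characters.
    """
    if not lower <= value <= upper:
        raise ValueError("value outside the declared constraint")
    span = upper - lower        # largest offset that must be representable
    width = span.bit_length()   # minimum number of bits for offsets 0..span
    if width == 0:
        return ""               # a single permitted value needs no bits at all
    return format(value - lower, "0{}b".format(width))


bits = per_unaligned_constrained_int(5, 0, 7)
```

Because the decoder must already know the constraint to recover the value, any disagreement between encoder and decoder about bounds or alignment silently shifts every following field, which is one reason PER parsers have proved error-prone.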
The H.225 protocol also defines messages used for endpoint-gatekeeper and gatekeeper-gatekeeper communication—this part of H.225 is known as RAS (Registration, Admission, Status), and unlike call signaling, runs over UDP. RAS is used to perform registration, admission control, bandwidth status changes, and teardown procedures between endpoints and gatekeepers. A RAS channel, separate from the call setup signaling channel, is used to exchange RAS messages. This second signaling channel is opened between an endpoint and a gatekeeper prior to the establishment of additional channels.
Establishing a call between two endpoints requires a different connection sequence depending upon what entities are involved in the session. For direct connections between endpoints, two TCP channels are set up between the endpoints: one for call setup (Q.931/H.225 messages) and one for capabilities exchange and call control (H.245 messages). First, an endpoint initiates an H.225/Q.931 exchange on a well-known TCP port (TCP 1720) with another endpoint. Several H.225/Q.931 messages are exchanged, during which time the called phone rings. Successful completion of the call results in an end-to-end reliable channel that supports the first of a number of H.245 messages. At the end of this exchange the called party picks up the receiver.
Note that the first of these signaling messages, the H.225/Q.931 Call Setup message (see Figure 1), has been the focus of extensive security vulnerability studies by the Oulu Secure Programming Group.

Figure 1: H.225/Q.931 Signaling

If a gatekeeper is present between the endpoints (a more common scenario), then H.225 RAS signaling precedes the Q.931 signaling and abides by the sequence diagram shown in Figure 2.

Figure 2: H.225/Q.931 RAS

These messages are used to register with a gatekeeper and to request permission to initiate the call:
  • Gatekeeper Request (GRQ) The GRQ packet is unicast in order to discover whether any gatekeepers exist. This requires that the gatekeeper’s IP address be configured on the endpoint. If it is not configured, the endpoint can fall back to multicast discovery of the gatekeeper.
  • Gatekeeper Confirm or Reject (GCF/GRJ) Reply from the gatekeeper to the endpoint that either confirms or rejects the endpoint’s gatekeeper request. A rejection is often due to configuration problems.
  • Registration Request (RRQ) Request from a terminal or gateway to register with a gatekeeper.
  • Registration Confirm or Reject (RCF/RRJ) Gatekeeper either confirms or rejects.
  • Admission Request (ARQ) Request for access to packet network from terminal to gatekeeper.
  • Admission Confirm or Reject (ACF/ARJ) Gatekeeper either confirms or rejects. If confirmed, the transport address and port to use for call signaling are included in the reply.
There are supplementary messages defined in the H.225/RAS specification that are used to request changes in bandwidth allocation, to reset timers, and for informational purposes. After the gatekeeper confirms the admission request, call signaling can begin. Signaling proceeds in the same manner as in Figure 2.
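
The request and reply pairs listed above can be read as a simple sequence in which each confirm unlocks the next request. The Python sketch below walks through that sequence; the function run_ras and the replies dictionary standing in for a gatekeeper are hypothetical and exist only to illustrate the ordering, not to model real RAS packets or timers.

```python
# Each step: (request, confirm reply, reject reply), in the order described above.
RAS_STEPS = [
    ("GRQ", "GCF", "GRJ"),  # gatekeeper discovery
    ("RRQ", "RCF", "RRJ"),  # registration
    ("ARQ", "ACF", "ARJ"),  # admission
]


def run_ras(replies):
    """Walk the RAS sequence against canned gatekeeper replies.

    `replies` maps a request name to the reply a hypothetical gatekeeper
    returns; a missing entry is treated as a reject. Returns the message
    log and whether call signaling may begin.
    """
    log = []
    for request, confirm, reject in RAS_STEPS:
        reply = replies.get(request, reject)
        log.append((request, reply))
        if reply != confirm:
            return log, False   # stop at the first rejection
    return log, True            # admission confirmed: Q.931 signaling can start


log, admitted = run_ras({"GRQ": "GCF", "RRQ": "RCF", "ARQ": "ACF"})
```

Only when all three exchanges end in a confirm does the endpoint move on to call signaling; a reject at any stage ends the attempt.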
We have found privately that flooding the gatekeeper with multiple malformed GRQ (Gatekeeper Request) packets results in the disconnection of a number of vendors’ IP phones.