NAT as a Topology Shield

NAT provides a security function by segregating private hosts from the publicly routed Internet. Depending upon your addressing requirements, NAT can isolate, to some extent, your VoIP network IP space from the balance of your internal network IP space. The large number of private RFC1918 IP addresses allows system architects to intelligently address hosts and other network elements based upon location, function, or other criteria during the design phase of the VoIP network.
External hosts cannot directly access a particular internal host if a NAT intervenes since the external host has no way of targeting its payload to a chosen IP address. Of course, when addresses are assigned dynamically, it becomes even more problematic for an attacker to point to a specific host within the NAT domain. This may help protect internal hosts from external malicious content. At worst, NAT is an additional layer of security controls that you implement as part of your overall security architecture.
The IPsec model is instructive in that it illustrates a complex interaction between encryption and NAT. However, IPsec is not the only functional or proposed security mechanism for VoIP environments. SSL/TLS, S/MIME, HTTP 1.1 digest, and ZRTP have also been proposed as security instruments. Nor are all environments as simple as the symmetric examples we have seen where one or more devices reside on opposite sides of a NAT device. Asymmetric or hairpin call routing (a call from one phone behind a NAT to another phone behind the same NAT), in an environment where basic NAT and encryption issues have been resolved, can cause communications to fail. The point here is to introduce some of the concepts that you will come across as you design and troubleshoot in this area. We’ll see in the next section how encryption, NAT, and VoIP protocols work (or don’t work) together.


NAT and Encryption

As IPsec VPNs became popular, NAT became an impediment to their initial widespread implementation. I’ll use the IPsec model to develop a description of the interactions between NAT and encryption since it is one of the more popular Internet encryption systems and has potential value in VoIP networks. The IP security (IPsec) protocol was defined by the Internet Engineering Task Force (IETF) to provide security for IP networks. IPsec is a large protocol suite designed to provide the following security services for IP networks: Data Integrity, Authentication, Confidentiality, and Application-transparent Security. IPsec secures packet flows and key transmission. Since we are interested in NAT and encryption, we’ll ignore most of the protocol suite, including key exchange (IKE) and the various hash and encryption algorithms, and focus instead on the protocols that are used to secure packet flows.
The Authentication Header (AH) and the Encapsulating Security Payload (ESP) are the two main network protocols used by IPsec. AH and ESP normally are implemented independently, though it’s possible (but uncommon) to use them both together. Both protocols can operate in two modes. Transport Mode can be visualized simply as a secure connection between two cooperating hosts. In Tunnel Mode, which is more of a “VPN-like” mode, IPsec completely encapsulates the original IP datagram, including the original IP header, within a second IP datagram.
AH provides data origin authentication, message integrity, and protection against replay attacks, but has no provision for privacy: data is not encrypted. The key to the AH authentication process is the inclusion in the AH header of an Integrity Check Value (ICV), a hash based upon a secret key that is calculated over a subset of the original IP header fields, including the source and destination IP addresses. AH guarantees (if implemented correctly) that the data received is identical to the data sent, and asserts the identity of the true sender. AH provides authentication for as much of the IP header as possible, as well as for upper level protocol data. However, some IP header fields (TTL, CHKSUM, and, optionally, TOS, FLAGS, and OPTIONS) change in transit. Such fields are zeroed for the ICV calculation and so are not protected by AH. In transport mode, AH is inserted after the IP header and before the upper layer protocol (TCP, UDP, ICMP, etc.) header. In tunnel mode, the AH header precedes the encapsulated IP header. Figure 1 shows the AH transport and tunnel modes.

Figure 1: Authentication Header: Transport and Tunnel Modes
In Figure 1, sections A and B show the location of the AH header in transport mode. Sections C and D show the location of the AH header in tunnel mode. The data field in all packets is not to scale (indicated by the double slanted lines). You can see from this figure that tunnel mode AH adds an additional 20 bytes to the length of each packet. None of the fields in this figure are encrypted.
The key to the incompatibility of NAT and IPsec AH is the presence of the ICV, whose value depends partially on the values of the source and destination IP addresses and, through the upper-layer data it covers, on the TCP or UDP header checksum. The AH ICV calculation accounts for header fields that change as the packet moves from hop to hop (mutable fields are zeroed, and predictable ones are set to the values expected at the receiver), but because intermediate devices do not share the secret key, they cannot recalculate the correct ICV after NAT has altered the source or destination address.
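The following sketch illustrates why a NAT breaks AH. It is a deliberate simplification, not RFC 4302: the hypothetical `ah_style_icv` function computes an HMAC-SHA-256 over only the source address, destination address, and payload, whereas real AH covers the whole IP header (with mutable fields zeroed) plus the AH header itself. The key and addresses are placeholders.

```python
import hashlib
import hmac
import ipaddress

def ah_style_icv(key: bytes, src: str, dst: str, payload: bytes) -> bytes:
    """Toy ICV: keyed hash over the IP addresses and upper-layer data.

    Simplified on purpose: real AH also covers the rest of the IP header
    (with mutable fields zeroed) and the AH header itself.
    """
    mac = hmac.new(key, digestmod=hashlib.sha256)
    mac.update(ipaddress.IPv4Address(src).packed)
    mac.update(ipaddress.IPv4Address(dst).packed)
    mac.update(payload)
    return mac.digest()[:12]  # truncated to 96 bits for illustration

def ah_verify(key: bytes, src: str, dst: str, payload: bytes, icv: bytes) -> bool:
    """Receiver recomputes the ICV from the header it actually received."""
    return hmac.compare_digest(ah_style_icv(key, src, dst, payload), icv)

key = b"hypothetical-shared-secret"  # known only to the two IPsec peers
payload = b"RTP voice frame"
icv = ah_style_icv(key, "10.1.1.20", "198.51.100.7", payload)

# Packet arrives unmodified: the ICV matches.
print(ah_verify(key, "10.1.1.20", "198.51.100.7", payload, icv))    # True
# A NAT rewrote the source address and cannot recompute the ICV
# without the key, so the receiver rejects the packet.
print(ah_verify(key, "203.0.113.5", "198.51.100.7", payload, icv))  # False
```

Because the NAT device does not hold the key, it has no way to produce a matching ICV for the rewritten source address.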
ESP, on the other hand, was used initially only for encryption; authentication functionality was subsequently added. The ESP header is inserted after the IP header and before the upper layer protocol header (transport mode) or before an encapsulated IP header (tunnel mode).
Figure 2 shows the location of the ESP header in both transport mode (sections A and B) and tunnel mode (sections C and D) for TCP (sections A and C) and UDP (sections B and D). In transport mode, the original IP header is followed by the ESP header. The rightmost field contains the ESP trailer and, optionally, the ESP authentication field. Only the upper-layer protocol header, data, and the ESP trailer are encrypted; the optional ESP authentication field and the IP header are not.

Figure 2: ESP Header: Transport Mode and Tunnel Mode
In tunnel mode, ESP encrypts the entire original packet. This means that the entire original IP datagram, including the original IP and protocol headers, is encrypted. In this mode, when IP traffic moves between gateways, the outer, unencrypted IP header contains the IP addresses of the source and destination gateways, and the inner, encrypted IP header contains the IP source and destination addresses of the true endpoints. However, even though ESP encrypts most of the IP datagram in either transport or tunnel mode, ESP is relatively compatible with NAT, since ESP does not incorporate the IP source and destination addresses in its keyed message integrity check. Still, in transport mode ESP has a dependency on TCP and UDP checksum integrity, because those checksums include the IP addresses through the pseudo-header. As a result, checksums calculated by the sender will be invalidated by passage through a NAT device (except in some cases where the UDP checksum is set to zero).
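A second sketch, under similar simplifying assumptions, contrasts the two checks in transport-mode ESP. The hypothetical `esp_icv` function covers only the SPI, sequence number, and ciphertext, so the IP addresses never enter it; the inner TCP checksum, by contrast, includes both addresses through the pseudo-header. The segment bytes, key, and addresses are placeholders.

```python
import hashlib
import hmac
import ipaddress
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """TCP checksum; the pseudo-header pulls in both IP addresses."""
    pseudo = (ipaddress.IPv4Address(src).packed
              + ipaddress.IPv4Address(dst).packed
              + struct.pack("!BBH", 0, 6, len(segment)))  # zero, protocol 6, length
    return internet_checksum(pseudo + segment)

def esp_icv(key: bytes, spi: int, seq: int, ciphertext: bytes) -> bytes:
    """Toy ESP ICV over SPI, sequence number and ciphertext; no IP addresses."""
    msg = struct.pack("!II", spi, seq) + ciphertext
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]

# Placeholder TCP segment: ports 5060 -> 5060, remaining header bytes
# (including the checksum field) zeroed, followed by a little data.
segment = struct.pack("!HH", 5060, 5060) + bytes(16) + b"INVITE"
key, spi, seq, ciphertext = b"hypothetical-key", 0x1001, 1, b"\x8f\x02\x7a"

sent_csum = tcp_checksum("10.1.1.20", "198.51.100.7", segment)
sent_icv = esp_icv(key, spi, seq, ciphertext)

# After NAT rewrites the source address to 203.0.113.5, the receiver
# checks both values against the header it actually received.
icv_ok = hmac.compare_digest(sent_icv, esp_icv(key, spi, seq, ciphertext))
csum_ok = tcp_checksum("203.0.113.5", "198.51.100.7", segment) == sent_csum
print(icv_ok, csum_ok)  # True False
```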
NAT traversal using ESP leads to a catch-22. NAT must recalculate the TCP header checksums used to verify packet integrity, because, as shown earlier, NAT modifies the IP addresses those checksums cover. If NAT updates the checksum, ESP authentication will fail. If NAT does not update the checksum, TCP verification will fail. One way around this, if the transport endpoint is under your control, is to turn off checksum verification, but I’m not aware of anyone who has done this in production environments. A second, more common approach is to perform NAT before IPsec rather than IPsec before NAT. This can be accomplished by locating the NAT device logically behind the IPsec device. The most common form of NAT traversal used today relies on encapsulating IPsec packets in UDP so that they can pass through NAT devices (standardized for ESP in RFC 3948, normally on UDP port 4500). The IPsec packet is carried inside an outer UDP packet, and that UDP wrapper is stripped off by the receiving IPsec peer after the packet has traversed the NAT device. This enables NAT and IPsec to function together, but none of these is an elegant solution.


NAT Has Three Common Modes of Operation

Depending upon networking and topology requirements, NAT operates in one of three related modes. Static NAT refers to a one-to-one mapping or correspondence between internal and external IP addresses. In this case, the number of internal IP addresses equals the number of external addresses (see Figure 1).
Figure 1: Static NAT
The NAT device maintains a lookup table of internal and external addresses in order to manage translations in a stateless manner. Static NAT has utility in mapping the private internal IP addresses of critical infrastructure servers and network appliances to a unique globally available IP address.
Dynamic NAT in its original form consisted of an outside pool or collection of public IP addresses that were assigned on a first-come, first-served basis (see Figure 2). Any internal address could be mapped to any available member of the outside pool in order to communicate with external Internet hosts. Consequently, the size of the outside pool limited the number of inside users that could connect externally at the same time. A built-in timeout mechanism allowed external pool members to be reused.

Figure 2: Dynamic NAT
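A minimal model of the first-come, first-served pool, assuming a hypothetical `DynamicNat` class with an injectable clock so the timeout can be exercised without waiting; real devices track per-connection state, and their timeout values vary by vendor.

```python
import time

class DynamicNat:
    """Toy dynamic NAT: each active inside host borrows one public address."""

    def __init__(self, pool, timeout=300.0, clock=time.monotonic):
        self.free = list(pool)      # unassigned public addresses, in order
        self.bindings = {}          # inside IP -> (public IP, last-used time)
        self.timeout = timeout
        self.clock = clock

    def _expire_idle(self):
        """Return idle bindings' public addresses to the pool."""
        now = self.clock()
        for inside, (public, last_used) in list(self.bindings.items()):
            if now - last_used > self.timeout:
                del self.bindings[inside]
                self.free.append(public)

    def translate_outbound(self, inside_ip):
        """Public address for this inside host, or None if the pool is empty."""
        self._expire_idle()
        if inside_ip in self.bindings:
            public, _ = self.bindings[inside_ip]
        elif self.free:
            public = self.free.pop(0)   # first come, first served
        else:
            return None                 # pool exhausted
        self.bindings[inside_ip] = (public, self.clock())
        return public

now = [0.0]  # fake clock, in seconds
nat = DynamicNat(["203.0.113.10", "203.0.113.11"], timeout=60, clock=lambda: now[0])
print(nat.translate_outbound("10.0.0.5"))  # 203.0.113.10
print(nat.translate_outbound("10.0.0.6"))  # 203.0.113.11
print(nat.translate_outbound("10.0.0.7"))  # None: both addresses in use
now[0] = 120.0                             # both bindings have now idled out
print(nat.translate_outbound("10.0.0.7"))  # 203.0.113.10
```

The third call shows the limitation described above: with a two-address pool, a third inside host cannot reach the Internet until a binding times out.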
The third and probably most common style of NAT is derived functionally from Dynamic NAT, since it reuses a smaller pool or a single external IP address to proxy for all the internal IP addresses. This NAT is known by a number of names, including Network Address Port Translation (NAPT), Port Address Translation (PAT), Full Cone NAT (from the STUN RFC3489), hiding NAT, and masquerading NAT. This type of NAT (we’ll call it NAPT to keep things organized) preserves state by maintaining a lookup table of source IP, destination IP, source port, and destination port. This 4-tuple is almost always unique within a given conversation stream. You’ll find NAPT operating in almost all home broadband deployments and in most large enterprise networks. Figure 3 shows an example of NAPT.

Figure 3: Network Address Port Translation
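The NAPT lookup table can be sketched as two dictionaries: one used to translate outbound packets and one used to demultiplex replies. This hypothetical `Napt` class keys each binding on the full 4-tuple and allocates public ports sequentially from an arbitrary starting value; real implementations differ in port selection and in how strictly they filter inbound traffic. An inbound packet with no matching binding is simply dropped, which is the topology-shielding behavior discussed earlier.

```python
import itertools

class Napt:
    """Toy NAPT: many inside (IP, port) pairs share one public IP address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = itertools.count(first_port)  # naive sequential allocator
        self.outbound_map = {}  # (in_ip, in_port, dst_ip, dst_port) -> public port
        self.inbound_map = {}   # (public port, dst_ip, dst_port) -> (in_ip, in_port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source of an outbound packet, creating a binding if needed."""
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self.outbound_map:
            port = next(self.next_port)
            self.outbound_map[key] = port
            self.inbound_map[(port, dst_ip, dst_port)] = (src_ip, src_port)
        return self.public_ip, self.outbound_map[key], dst_ip, dst_port

    def inbound(self, src_ip, src_port, dst_port):
        """Inside (IP, port) for a reply, or None if no outbound binding exists."""
        return self.inbound_map.get((dst_port, src_ip, src_port))

nat = Napt("203.0.113.5")
print(nat.outbound("10.0.0.5", 5060, "198.51.100.7", 5060))
# ('203.0.113.5', 40000, '198.51.100.7', 5060)
print(nat.outbound("10.0.0.6", 5060, "198.51.100.7", 5060))
# ('203.0.113.5', 40001, '198.51.100.7', 5060)
print(nat.inbound("198.51.100.7", 5060, 40001))  # ('10.0.0.6', 5060)
print(nat.inbound("192.0.2.99", 5060, 40001))    # None: unsolicited
```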
A typical scenario for moving TCP traffic between two domains, each running NAT at its edge, is shown in Figure 4.
Figure 4: Normal NAT Process with TCP
Section A of Figure 4 shows the TCP/IP packet header prior to NAT. After passing through the first NAT edge device (section B), four header fields are modified: three IP header fields (source address, destination address, and checksum) and the TCP header checksum. After passing through the second NAT edge device, the original header fields are regenerated (section C). The same is true for UDP in this situation, except that if the UDP checksum is zero, it will not be altered.
In addition to these three NAT modes, STUN (we’ll see this later) classifies NAT behavior into types that map more or less to these modes: full cone, restricted cone, port restricted cone, and symmetric NAT. We’ll talk more about these in the section on STUN and TURN.
You may naturally ask by now: why is NAT such an issue for VoIP? Protocols such as H.323 and SIP partition the signaling and media channels and, to make things even more interesting, embed IP addresses in the signaling channel. When we combine NAT with these protocols, it becomes important to understand how, when, and where NAT manipulates these fields. Note also that NAT stores its address mapping information in binding tables, and that these bindings are created only by outbound traffic. As a result, NAT breaks the choreography of SIP session flow, and adding encryption to the mix further complicates these systems.
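To see why embedded addresses matter, consider the SDP body that a SIP INVITE typically carries: its `c=` line names the address where the caller expects to receive RTP, and its `m=audio` line names the port. A NAT that rewrites only IP and transport headers leaves these values untouched. The sketch below, with hypothetical helper names and a made-up SDP body, extracts that media endpoint and flags an RFC1918 address that the far end could not route to; it ignores multicast TTL suffixes and IPv6 for brevity.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def media_endpoint(sdp: str):
    """Return (connection address, audio port) from an SDP body."""
    addr = port = None
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 ") and addr is None:
            addr = line.split()[2]
        elif line.startswith("m=audio ") and port is None:
            port = int(line.split()[1])
    return addr, port

def media_address_is_private(sdp: str) -> bool:
    """True if the advertised RTP address is RFC1918 and so unroutable outside."""
    addr, _ = media_endpoint(sdp)
    return addr is not None and any(
        ipaddress.ip_address(addr) in net for net in RFC1918)

offer = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 10.0.0.5",
    "s=-",
    "c=IN IP4 10.0.0.5",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
])
print(media_endpoint(offer))            # ('10.0.0.5', 49170)
print(media_address_is_private(offer))  # True
```

Even after the NAT rewrites the packet's source address, the far end would still try to send RTP to 10.0.0.5:49170, which is why ALGs, STUN, and related techniques exist.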


How Does NAT Work?

To a system on the Internet, a NAT device appears to be the source/destination for all traffic originating from behind the NAT device. Hosts behind a NAT device do not have true end-to-end Internet connectivity and cannot directly participate in Internet protocols that require initiation of TCP connections from outside the NAT device, or protocols that split signaling and media into separate channels.
A NAT device examines and records certain IP header information from each packet within an active IP connection. It uses these connection data to multiplex or demultiplex traffic depending upon the direction of the traffic flows. Multiplexing, in this case, means that two or more traffic streams are combined into a single outbound channel; demultiplexing refers to the process of separating a complex inbound traffic stream into single traffic streams (see Figure 1).

Figure 1: Multiplexing and Demultiplexing
NAT devices manipulate a subset of the IP header information. In order to comprehend the sometimes complex interaction of NAT, encryption, and VoIP protocols, you will have to understand the IP header fields and how they are altered during the NAT and encryption processes. It is not necessary for you to understand these concepts if you are concerned only with a NAT device’s ability to hide internal network topology from the Internet, but as part of the process of securing VoIP communications, this information is critical. Get to know the header diagrams shown in Figure 2. You’ll be seeing them frequently.

Figure 2: IP, TCP, and UDP Headers
Note that the rest of this section applies only to IPv4 packets. IPv6 resolves most of the following issues, but it just hasn’t caught on yet. The IP header normally consists of 20 bytes of data, and the TCP header has a minimum length of 20 bytes. An options field within each header allows further bytes to be added; IP options are rarely used, although TCP options are common. The UDP header is 8 bytes in length. Both the TCP and UDP headers reside in the data field of an IP packet. In Figure 2, the data field is to the right of the options field for IP and TCP headers and to the right of the CHKSUM field in the case of the UDP header.
NAT devices monitor, record, and alter the source IP address (SIP), destination IP address (DIP), and checksum (CHKSUM) fields within IP headers. NAT also modifies the checksum fields of both TCP and UDP packets, since these checksums are computed over a pseudo-header (the source and destination IP addresses, the protocol number, and the TCP or UDP length) as well as over the TCP or UDP header and data. As for ICMP Query packets, no further changes in the ICMP header are required, as the checksum in the ICMP header does not include the IP addresses. These checksum fields will prove particularly troubling when we send encrypted VoIP packets through NAT.
In response to the pseudo-header complexities, RFC1631 suggests that:
NAT must also look out for ICMP and FTP and modify the places where the IP address appears. There are undoubtedly other places, where modifications must be done. Hopefully, most such applications will be discovered during experimentation with NAT.
Though the authors of RFC1631 were bright individuals, it seems unlikely to me that they imagined their solution would become a major complication for end-to-end application availability on today’s internetworks. Figure 3 shows how NAT alters four header fields.

Figure 3: NAT Alters Four Header Fields
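The checksum behavior described above can be sketched in a few lines. Rather than recomputing a checksum from scratch, NAT implementations commonly adjust it incrementally; the `incremental_update` function below uses equation 3 from RFC 1624 for each 16-bit word of the rewritten address. The function names, ports, payload, and addresses are illustrative placeholders.

```python
import ipaddress
import struct

def csum16(data: bytes) -> int:
    """RFC 1071 Internet checksum: complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def udp_checksum(src: str, dst: str, udp: bytes) -> int:
    """Full computation over pseudo-header + UDP header/data (checksum field zero)."""
    pseudo = (ipaddress.IPv4Address(src).packed
              + ipaddress.IPv4Address(dst).packed
              + struct.pack("!BBH", 0, 17, len(udp)))  # zero, protocol 17, length
    return csum16(pseudo + udp)

def incremental_update(old_csum: int, old_addr: str, new_addr: str) -> int:
    """RFC 1624 eqn. 3, HC' = ~(~HC + ~m + m'), applied to each changed word."""
    old_words = struct.unpack("!HH", ipaddress.IPv4Address(old_addr).packed)
    new_words = struct.unpack("!HH", ipaddress.IPv4Address(new_addr).packed)
    total = ~old_csum & 0xFFFF
    for m, m_new in zip(old_words, new_words):
        total += (~m & 0xFFFF) + m_new
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

payload = b"RTP!"
udp = struct.pack("!HHHH", 16384, 16384, 8 + len(payload), 0) + payload
before = udp_checksum("10.0.0.5", "198.51.100.7", udp)
after = udp_checksum("203.0.113.5", "198.51.100.7", udp)  # NAT changed the source
print(before != after)                                                 # True
print(incremental_update(before, "10.0.0.5", "203.0.113.5") == after)  # True
```

The incremental form only needs the old and new address words, which is why a NAT can fix the transport checksum without reading the whole payload, and also why it cannot do so once that checksum is hidden inside encrypted data.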


NAT and IP Addressing

Network Address Translation (NAT) is a method for rewriting the source and/or destination addresses of IP packets as they pass through a NAT device, which is often a router or firewall that separates two realms or domains on the Internet. NAT was first officially proposed (RFC1631) in 1994 as a temporary solution to the problems of IP address space depletion and the rapidly increasing size of route tables. Addresses, at that time, were divided into two classes: local and global addresses. Today we normally refer to these addresses as either private or public, and the private IP space often is referred to as RFC1918 addresses. Per RFC1918, the Internet Assigned Numbers Authority (IANA) reserved three blocks of the IP address space for private internets:
  • 10.0.0.0 – 10.255.255.255 (10/8 prefix)
  • 172.16.0.0 – 172.31.255.255 (172.16/12 prefix)
  • 192.168.0.0 – 192.168.255.255 (192.168/16 prefix)
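The three reserved blocks can be checked with Python's standard `ipaddress` module. The sketch tests membership against the RFC1918 blocks explicitly, because the module's broader `is_private` property also returns True for other special-purpose ranges such as the documentation networks. The helper name `is_rfc1918` is hypothetical.

```python
import ipaddress

RFC1918_BLOCKS = [ipaddress.ip_network(prefix)
                  for prefix in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(addr: str) -> bool:
    """True if addr falls inside one of the three RFC1918 private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)

for block in RFC1918_BLOCKS:
    print(block, block[0], block[-1], block.num_addresses)
# 10.0.0.0/8 10.0.0.0 10.255.255.255 16777216
# 172.16.0.0/12 172.16.0.0 172.31.255.255 1048576
# 192.168.0.0/16 192.168.0.0 192.168.255.255 65536

print(is_rfc1918("172.31.255.255"), is_rfc1918("172.32.0.1"))  # True False
```

Together the blocks provide 17,891,328 addresses, which is what gives designers room to address VoIP hosts by location or function.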
NAT commonly is used to enable multiple hosts on private networks to access the Internet using a single public (Internet routable) IP address. Note that although NAT most commonly is used to map IP addresses from internal private IP space to the public IP space, NAT can be used to map between any two IP address domains. Additionally, NAT provides a security function by segregating (hiding) private hosts from the publicly routed Internet. This short-term kludge has had an enormous impact on the day-to-day functioning of the Internet, and has special relevance to system administrators who are charged with securely transporting VoIP packet data across network boundaries.


QoS and Traffic Shaping

VoIP has strict performance requirements. The factors that affect the quality of data transmission are different from those affecting the quality of voice transmission. For example, data generally is not affected by small delays. The quality of voice transmissions, on the other hand, is lowered by relatively small amounts of delay. VoIP call quality depends on three network factors, as mentioned earlier:
  • Latency: The time it takes for a voice transmission (or any transmission) to travel from source to destination increases as packets traverse each security node. The primary latency-producing processes are firewall/NAT traversal, processing of long ACLs, and traffic encryption/decryption.
  • Jitter (erratic packet delays): Jitter may increase as security nodes are added, because in many circumstances jitter is a function of hop count.
  • Packet loss: Packet loss can be influenced by the number of non-QoS-aware routers and firewalls that ignore or fail to properly process the Type of Service (ToS) field in the IP header.
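Jitter is usually quantified rather than eyeballed. The sketch below assumes the interarrival jitter estimator from RTP (RFC 3550, section 6.4.1), which smooths the change in one-way transit time between consecutive packets with a gain of 1/16; this is a swapped-in standard measure, not something the text above prescribes. Times are in milliseconds here, whereas RTP implementations use timestamp units.

```python
def interarrival_jitter(send_ms, arrival_ms):
    """RFC 3550 style running jitter estimate: J += (|D| - J) / 16.

    D is the change in one-way transit time between consecutive packets;
    a constant delay, however large, contributes no jitter.
    """
    jitter = 0.0
    prev_transit = None
    for sent, arrived in zip(send_ms, arrival_ms):
        transit = arrived - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter

sends = [0, 20, 40, 60]                                # 20 ms packetization
print(interarrival_jitter(sends, [50, 70, 90, 110]))   # 0.0 (steady 50 ms delay)
print(interarrival_jitter(sends, [50, 75, 90, 115]))   # 0.880126953125
```

The first case shows that latency and jitter are distinct: a long but steady delay added by a security node produces no jitter, while delay that varies from packet to packet does.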
In the absence of QoS or traffic shaping, data networks operate on a best-effort delivery basis, which means that all data traffic has equal priority and an equal chance of being delivered in a prompt manner. However, when network congestion occurs, all data traffic has an equal chance of being dropped and/or delayed. When voice data is introduced into a network, it becomes critical that priority is given to the voice packets to ensure the expected quality of voice calls. The mechanisms used to accomplish this are generically referred to as traffic shaping.
Traffic shaping is an attempt to organize network traffic in order to optimize or guarantee performance and/or bandwidth. Traffic shaping relies upon concepts such as classification, queue disciplines, scheduling, congestion management, quality of service (QoS), class of service (CoS), and fairness.
Common CoS models include the Differentiated Services Code Point (DiffServ or DSCP, defined in RFC 2474 and others) and IEEE 802.1Q/p. DSCP specifies that each packet is classified upon entry into the network. The classification is carried in the IP packet header, using 6 bits from the deprecated IP type-of-service (ToS) field to carry the classification (code point) information, which ranges from 0 through 63. Generally, the higher number equates to higher priority.
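The relationship between a DSCP value and the old ToS byte is a two-bit shift, since the low two bits now carry ECN (RFC 3168). A minimal sketch, using Expedited Forwarding (DSCP 46, RFC 3246), a code point commonly used for voice media; the function names are hypothetical.

```python
EF = 46  # Expedited Forwarding code point (RFC 3246)

def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper six bits of the former ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2  # low two bits are left for ECN (RFC 3168)

def tos_to_dscp(tos: int) -> int:
    """Recover the DSCP from a ToS/Traffic Class byte, ignoring ECN bits."""
    return (tos >> 2) & 0x3F

print(hex(dscp_to_tos(EF)))  # 0xb8
print(tos_to_dscp(0xB9))     # 46 (ECN bits set to 01 are ignored)
```

On Linux, for example, `sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(EF))` asks the kernel to mark a socket's outgoing IPv4 packets with that byte; whether downstream devices honor the marking depends on their QoS policy.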
802.1Q defines the open standards for VLAN tagging. Twelve of the 16 bits within the two Tag Control Information bytes are used to tag each frame with a VLAN identification number. 802.1p uses three of the remaining four bits (the User Priority bits) in the 802.1Q header to assign one of eight different classes of service (0 = low priority; 7 = high priority).
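The Tag Control Information layout can be made concrete with `struct`. The sketch packs the three-bit priority field (PCP, used by 802.1p), the single remaining bit (originally CFI, now the Drop Eligible Indicator), and the 12-bit VLAN ID; the function names and the VLAN number are hypothetical.

```python
import struct

def pack_tci(pcp: int, dei: int, vid: int) -> bytes:
    """Pack 802.1Q Tag Control Information: PCP (3 bits) | DEI (1) | VID (12)."""
    if not (0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 0xFFF):
        raise ValueError("field out of range")
    return struct.pack("!H", (pcp << 13) | (dei << 12) | vid)

def unpack_tci(tci: bytes):
    """Return (pcp, dei, vid) from the two TCI bytes."""
    (value,) = struct.unpack("!H", tci)
    return value >> 13, (value >> 12) & 0x1, value & 0xFFF

tci = pack_tci(pcp=5, dei=0, vid=100)  # e.g. voice traffic on hypothetical VLAN 100
print(tci.hex())        # a064
print(unpack_tci(tci))  # (5, 0, 100)
```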
Quality of Service involves giving preferential treatment to particular classes or flows of traffic, primarily by manipulating queues and scheduling. A service quality is then negotiated.
Examples of QoS are CBWFQ (Class-Based Weighted Fair Queuing), RSVP (Resource Reservation Protocol, RFC 2205), and MPLS (Multiprotocol Label Switching, RFC 3031 and others). CoS, or tagging, is ineffective in the absence of QoS because it can only mark data. QoS relies on those tags or filters to give priority to data streams.
Networks with periods of congestion can still provide excellent voice quality when using an appropriate QoS/CoS policy. The recommendation for switched networks is to use IEEE 802.1p/Q. The recommendation for routed networks is to use DiffServ Code Points (DSCP). The recommendation for mixed networks is to use both.
The main purpose of these technologies is to ensure that application performance remains satisfactory regardless of network conditions. In general, they all work by categorizing traffic into discrete subsets that are processed with different priorities. For this reason, QoS techniques may be useful in protecting VoIP networks from a significant security threat: Denial of Service. A number of authors have shown that some VoIP architecture components, including IP telephones, SIP proxies, and H.323 gateways, may freeze and crash when attempting to process a high rate of packet traffic. QoS can provide some security for these devices during a DoS attack by assigning unauthorized traffic a low priority, VoIP traffic a high priority, or both. This measure (security layer) will mitigate the consequences of a DoS attack on applications that share the same physical bandwidth.
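The sketch below assumes the simplest scheduling discipline, strict priority with per-class tail drop, to show how prioritization shields voice during a flood. It is not CBWFQ: strict priority can starve lower classes entirely, which is one reason production schedulers use weighted or rate-limited variants. The class numbers (0 for voice, 2 for unclassified traffic) and queue depth are hypothetical.

```python
import heapq
import itertools

class PriorityScheduler:
    """Strict-priority scheduler: the lowest class number is always served first."""

    def __init__(self, capacity_per_class=100):
        self.heap = []                 # entries: (class, arrival order, packet)
        self.arrivals = itertools.count()
        self.depth = {}                # packets currently queued per class
        self.capacity = capacity_per_class
        self.dropped = 0

    def enqueue(self, traffic_class: int, packet) -> bool:
        """Queue a packet, tail-dropping it if its class queue is full."""
        if self.depth.get(traffic_class, 0) >= self.capacity:
            self.dropped += 1
            return False
        heapq.heappush(self.heap, (traffic_class, next(self.arrivals), packet))
        self.depth[traffic_class] = self.depth.get(traffic_class, 0) + 1
        return True

    def dequeue(self):
        """Next packet to transmit, or None if every queue is empty."""
        if not self.heap:
            return None
        traffic_class, _, packet = heapq.heappop(self.heap)
        self.depth[traffic_class] -= 1
        return packet

sched = PriorityScheduler(capacity_per_class=100)
for i in range(1000):                 # flood of unclassified packets
    sched.enqueue(2, f"flood-{i}")
sched.enqueue(0, "rtp-voice-1")       # voice arrives during the flood
print(sched.dequeue())                # rtp-voice-1
print(sched.dropped)                  # 900
```

Because each class has its own queue, the flood exhausts only the low-priority queue; the voice packet is neither dropped nor delayed behind it.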
The downside of all this is that traffic shaping is, at times, a stew of poorly interoperable technologies and techniques. This ad hoc nature makes a true end-to-end QoS strategy sometimes difficult to implement. If possible, provide enough bandwidth resources to meet the expected peak demands with a substantial safety margin. Note also that the implementation of some security measures can degrade quality of service.
These security-related complications are bulleted at the beginning of this section, and range from interruption or prevention of call setup by misconfigured firewall rules to encryption-produced latency and delay variation (jitter). There is no single best method at present to optimize traffic shaping on VoIP networks without taking into account the relationship of these technologies with the security measures implemented within your environment.


VLANs and Softphones

Softphones present a security challenge in a VoIP environment, particularly if VLANs are employed as a major security control. Several popular softphones (such as X-Lite) store credentials unencrypted in the Windows registry, even after the program has been uninstalled. Many softphones contain advertising software that attempts to “phone home” with private user information. Host-based IDS or firewall applications have limited use in this situation because softphones require that PC-based firewalls open a number of high UDP ports for the media stream. Additionally, any special permissions that the VoIP application has within the host-based firewall rule set will apply to all applications on that desktop (e.g., peer-to-peer software may use SIP to bypass security policy prohibitions).
The most important rule for securing softphones is to harden the underlying operating system. Malware that affects any other application software on the PC can also interfere with voice communications. The flip side is also true: malware that affects the VoIP software will affect all other applications on the PC and the data services available to that PC (a separate VoIP phone would not require access to file services, databases, etc.). Softphones that contain any type of advertising software must be banned in a secure environment. Candidate softphones should be tested before deployment, and those that do not encrypt user credentials should be prohibited.
Because PC workstations necessarily reside on the data network, a softphone deployment conflicts with the requirement to separate voice and data networks: the PC must reside in both domains, which defeats their logical separation. One solution is the dual-homed workstation, which dedicates one NIC to the data domain and one NIC to the voice domain. This arrangement still allows for possible routing of information between domains via the workstation. Cisco recently has introduced a Certificate Trust List (CTL) that contains, among other information, the IP addresses of trusted VoIP peers. However, this feature is available only in selected IP phones and requires, for the most part, setup and maintenance of a complex certificate infrastructure. Additionally, unless complex host firewall rules are implemented, non-VoIP related data can enter the voice domain from workstations. Frankly, there is no single good security solution to the issue of softphones on workstations in split voice/data environments. In a highly secure environment, your best choice is to ban them via policy and monitor for illicit usage via IDS or IPS.


VLAN Security

VLAN and layer 2 security is a complex topic, partially because of the uneven support by switch vendors for appropriate datalink safeguards and because many of the exploitable vulnerabilities arise due to misconfiguration of available safeguards. The single most important rule with regard to this topic is to absolutely ensure that unauthorized individuals do not have access to the switch console. Additionally, terminal access to the console should either require strong authentication (RADIUS or AAA) and be restricted to a small set of management PCs, or should be eliminated altogether.
VLAN function depends upon the presence or absence of tag information. If the integrity of the tag information is assured, then the logical security afforded by VLANs is as legitimate as physical security. The key is to verify that tag information originates from the appropriate hosts and is unchanged in transit. A number of controls exist to verify this information, such as ARP inspection, DHCP snooping, VACLs (VLAN ACLs), private and dynamic VLANs, port security, and 802.1X admission controls, but implementation of these is vendor specific and beyond the scope of this section. Additionally, the IEEE 802.1 Working Group has established drafts, particularly 802.1aj, that address security when two related MACs are in a relay configuration.


VLANs

Logical separation of voice and data traffic via VLANs is recommended in order to prevent data network problems from affecting voice traffic and vice versa. In a switched network environment, VLANs create a logical segmentation of broadcast domains that can span multiple physical network segments. VLANs remove the need to organize and manage PCs or softphones based upon physical location, and can be used to arrange endpoints based upon function, class of service, class of user, connection speed, or other criteria. The separation of broadcast domains reduces traffic to the balance of the network. Effective bandwidth is increased due to the elimination of latency from router links. Additional security is realized if access to VLAN hosts is limited to hosts on specific VLANs, excluding those that originate from other subnets beyond the router.
VLANs, or virtual LANs, can be thought of as logically segmented networks mapped onto physical hardware. One or more VLANs can coexist on a single physical switch. The predominant VLAN flavor is IEEE 802.1Q. Prior to the introduction of 802.1Q, Cisco’s ISL (Inter-Switch Link) was one of several proprietary VLAN protocols. ISL is now deprecated in favor of 802.1Q. VLANs operate at layer 2 of the OSI model. However, a VLAN often is configured to map directly to an IP network or subnet, which gives the appearance that it is involved at layer 3.
VLANs can be configured in various ways: by protocol (IP or IPX, for example) or based on MAC address, subnet, or physical port. They can be static, dynamic, or port-centric. Mechanistically, VLANs are formed by either frame-tagging or frame-filtering. Frame-tagging, the more common mechanism, requires adding and removing a unique L2 frame identifier (in 802.1Q, a 4-byte tag whose 2-byte Tag Control Information field carries the VLAN ID) so that switches may appropriately send and receive their cognate VLAN traffic. Frame-filtering relies upon the participating switches building and communicating a filtering database in order to forward traffic to its correct VLAN.
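Frame-tagging itself amounts to inserting four bytes after the source MAC address and removing them again at the egress port. A minimal sketch on raw bytes, assuming an untagged Ethernet II frame without its frame check sequence (a real switch must also recompute the FCS); the function names, MAC addresses, and VLAN number are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that announces an 802.1Q tag

def tag_frame(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MACs."""
    if not (1 <= vid <= 4094 and 0 <= pcp <= 7):
        raise ValueError("VID must be 1..4094 and PCP 0..7")
    tag = struct.pack("!HH", TPID_8021Q, (pcp << 13) | vid)
    return frame[:12] + tag + frame[12:]

def untag_frame(frame: bytes):
    """Return (vid, untagged frame), or (None, frame) if there is no tag."""
    (tpid,) = struct.unpack("!H", frame[12:14])
    if tpid != TPID_8021Q:
        return None, frame
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF, frame[:12] + frame[16:]

dst_mac = bytes.fromhex("ffffffffffff")
src_mac = bytes.fromhex("001122334455")
frame = dst_mac + src_mac + struct.pack("!H", 0x0800) + bytes(46)  # IPv4, padded

tagged = tag_frame(frame, vid=100, pcp=5)
print(len(frame), len(tagged))          # 60 64
print(untag_frame(tagged)[0])           # 100
print(untag_frame(tagged)[1] == frame)  # True
```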
In Figure 1, dotted lines represent VLAN 2 and solid lines represent VLAN 10. The presence of the two lines that form a trunk between the top level switches should not be taken to indicate that there are two physical connections. Servers and workstations are logically isolated based upon their physical location. If a New York workstation requires the services of a Los Angeles server, then those data are routed between the top level switches.

Figure 1: Location-Based VLANs
In Figure 2, dotted lines represent VLAN 2, solid lines represent VLAN 10, and dash-dot lines represent VLAN 100. The presence of the three lines that form a trunk between the top level switches should not be taken to indicate that there are three physical connections. In the network shown in Figure 2, broadcast traffic in the telephone subnet will not be seen by hosts in the workstation subnet.

Figure 2: Function-Based VLANs
VLANs provide some security and create smaller broadcast domains by creating logically separated subnets. Broadcasts are a common, sometimes noisy phenomenon in data networks. Creating a separate VLAN for voice reduces the amount of broadcast traffic (and unicast traffic on a shared LAN) the telephone will receive. Separate VLANs can result in more effective bandwidth utilization, and reduce the processor burden on IP telephones and PCs by freeing them from having to analyze irrelevant broadcast packets. Management traffic can be segregated on a management VLAN so that SNMP and syslog traffic do not interfere with data traffic. This also has the benefit of adding a layer of security to the management network. Additionally, VLANs can be used in conjunction with various quality of service mechanisms (see the section on QoS and Traffic Shaping) to further isolate and prioritize voice traffic.
The consequences of DoS attacks can be mitigated by logically separating voice and data segments into discrete VLANs. Traffic moving between segregated VLANs must pass through a Layer 3 device, where it can be inspected at the ACL level, so VLAN segregation forces any DoS packets through the ACLs on that device. The use of packet filtering or stateful firewall inspection at these junctions also is recommended. As a side note, requiring users to authenticate before they access the telephony device also will reduce the possibility of internal DoS attacks.
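ACL syntax is vendor specific, but many router ACLs share first-match semantics with an implicit deny at the end. The sketch below models only that behavior; the `Rule` fields, the voice and data subnets, and the call server address are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    action: str                       # "permit" or "deny"
    src: str                          # source network in CIDR form
    dst: str                          # destination network in CIDR form
    proto: str = "any"                # "tcp", "udp", or "any"
    dports: range = range(0, 65536)   # destination ports matched

def evaluate(acl, src_ip, dst_ip, proto, dport) -> str:
    """First matching rule wins; a packet matching nothing is denied."""
    src, dst = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    for rule in acl:
        if (src in ipaddress.ip_network(rule.src)
                and dst in ipaddress.ip_network(rule.dst)
                and rule.proto in ("any", proto)
                and dport in rule.dports):
            return rule.action
    return "deny"  # implicit deny at the end of the list

# Hypothetical layout: data VLAN 10.10.0.0/16, voice VLAN 10.20.0.0/16,
# call server 10.20.1.10 accepting SIP on UDP 5060.
acl = [
    Rule("permit", "10.10.0.0/16", "10.20.1.10/32", "udp", range(5060, 5061)),
    Rule("deny", "10.10.0.0/16", "10.20.0.0/16"),
]
print(evaluate(acl, "10.10.4.7", "10.20.1.10", "udp", 5060))   # permit
print(evaluate(acl, "10.10.4.7", "10.20.3.3", "udp", 16384))   # deny (rule 2)
print(evaluate(acl, "192.0.2.50", "10.20.1.10", "udp", 5060))  # deny (implicit)
```

Rule order matters: swapping the two rules would deny the SIP traffic as well, because the broader deny would match first.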


Logically Segregate Network Traffic

One of the principal advantages of converging voice and data is to save money and to simplify administration and management by running both types of traffic over the same physical infrastructure. With this in mind, it is ironic that most of the engineering effort expended during the VoIP architecture design phase focuses on logically separating this same voice and data traffic.
Packetized voice is indistinguishable from any other packet data at Layers 2 and 3, and thus is subject to the same networking and security risks that plague data-only networks. The general idea that motivates the logical separation of data from voice is the expectation that network events such as broadcast storms and congestion, and security-related phenomena such as worms and DoS attacks, that affect one network will not impact the other. This is the principal consequence of compartmentalization.
In practice, system and security administrators have a number of options to realize this logical division. Packet headers can be manipulated in order to separate datagrams and datastreams at Layer 2, to provide certain classes of packets with preferential treatment or more bandwidth, and to alter source and destination IP addresses. Firewalls (particularly VoIP-aware firewalls), application layer gateways (ALGs), routers, and switches are inserted in the datapath to monitor and control traffic streams. Many devices now support robust access control lists (ACLs) that are used to fine-tune network and application access. Encryption often is used to ensure data and signal channel authentication, integrity, and privacy, but the encryption process results in subtle and not-so-subtle interactions with the methods that manipulate packet headers.
Maintaining and securing contemporary data and voice networks is complex stuff—something not recommended for naïve system administrators. Gone are the days when networks could be pieced together in an ad hoc fashion in order to support gopher, e-mail, and ftp. Modern VoIP/data networks must be designed to support a sometimes bewildering array of applications—all with their own unique service requirements and SLAs—in an open, yet secure environment.
To this end, in this chapter we look at the methods used to segregate voice and data into logically isolated networks that run over the same physical infrastructure. Figure 1 shows the components of this architecture. The primary elements of the security architecture are VLANs, QoS scheduling, firewalls, NAT and intelligent IP address space management, and ACLs. Encryption also plays a role in this. We will look at each of these technologies in more detail in the following sections.
Figure 1: Converged Reference Network
Figure 1 is a diagram of a VoIP/data reference network that illustrates the major security components involved in logical segregation of network traffic types. At the border between the Internet and the internal network, firewalls, ALGs, and router-based ACLs provide the first line of defense or security layer against illicit traffic and attackers. Within the internal domains, VLANs, QoS, private IP addresses, and NAT segregate VoIP traffic from other data network traffic, and VoIP-aware firewalls and router-based ACLs manage traffic between the two domains. Softphones may or may not span both domains depending upon an organization’s sensitivity to risk.


Active Security Monitoring

An appropriate firewall policy can minimize the exposure of your internal networks. However, attackers are evolving their attacks and network subversion methods. These techniques include e-mail-based Trojan horses, stealth scanning techniques, and attacks which bypass firewall policies by tunneling access over allowed protocols such as ICMP, HTTP, or DNS. Attackers are also getting better at using the ever-growing list of application vulnerabilities to compromise the few services that are allowed through a firewall.
Firewalls and Access Control Lists are requisite security controls in any enterprise, but they are not sufficient in contemporary networks. Active monitoring of the network and attached devices not only provides one or more additional layers of defense but also supplies data that may have forensic value. Active monitoring consists of the following types of activities: network monitoring, network intrusion detection, host-based intrusion detection, and syslog and SNMP logging. Penetration and vulnerability testing, in turn, monitors and validates existing security controls.
On enterprise networks, network monitoring is typically managed by a comprehensive tool suite such as OpenView. Traffic patterns and quantities, and device state are common measurements. These tools supply data that can be useful to security administrators, particularly when combined with the results of recent penetration/vulnerability tests or with NIDS/HIDS data. Unfortunately, the correlation of these data is difficult even when using tools such as SMARTS (a root-cause correlation engine), because of the overwhelming amount of data that must be organized.
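One reason correlation pays off is that it shrinks the data set: an intrusion alert aimed at a host that a recent vulnerability scan found to be exposed deserves more attention than the same alert aimed at a patched host. The sketch below is a deliberately naive join on (host, vulnerability ID). It is not how SMARTS or any commercial correlation engine works, and the record layout and IDs such as VULN-17 are hypothetical.

```python
def correlate(alerts, findings):
    """Keep only alerts whose target host has an open finding for the same vulnerability."""
    # Index scan findings by (host, vulnerability ID) for constant-time lookups.
    exposed = {(f["host"], f["vuln_id"]) for f in findings}
    return [a for a in alerts if (a["dst"], a["vuln_id"]) in exposed]


# Hypothetical data: one scan finding and three NIDS alerts.
findings = [{"host": "10.10.0.5", "vuln_id": "VULN-17"}]
alerts = [
    {"dst": "10.10.0.5", "vuln_id": "VULN-17"},  # exposed host, matching attack
    {"dst": "10.10.0.6", "vuln_id": "VULN-17"},  # same attack, host not exposed
    {"dst": "10.10.0.5", "vuln_id": "VULN-42"},  # exposed host, unrelated attack
]
print(correlate(alerts, findings))  # prints [{'dst': '10.10.0.5', 'vuln_id': 'VULN-17'}]
```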
NIDS and HIDS are complementary intrusion detection technologies. NIDS monitors the network for malicious or unauthorized traffic and HIDS monitors critical servers for changes to significant files and directories. Both relay event data to a central management console for logging and visualization. Most current NIDSs use a combination of signature (pattern or regex) and anomaly-based detection. Both of these methods have benefits and drawbacks. Signature-based detection is quick, effective, and popular, but it won’t catch attacks that don’t have signatures. Anomaly detection is theoretically a better method for detecting attacks, but suffers from the basic problem that it is difficult to define “normal” traffic on a network.
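The difference between the two detection styles can be shown in a few lines. In this sketch, signature matching is a set of regular expressions applied to packet payloads, and anomaly detection flags a traffic measurement (for example, SIP requests per minute) that exceeds the baseline mean by more than k standard deviations. The patterns, the baseline figures, and the 3-sigma threshold are all hypothetical; choosing the baseline is exactly the "what is normal?" problem described above.

```python
import re
import statistics

# Hypothetical signatures; production rule sets are far larger and more precise.
SIGNATURES = {
    "scanner-user-agent": re.compile(rb"User-Agent:\s*friendly-scanner", re.IGNORECASE),
    "path-traversal": re.compile(rb"\.\./\.\./"),
}


def signature_match(payload: bytes) -> list[str]:
    """Return the names of all signatures whose pattern appears in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]


def is_anomalous(baseline: list[float], current: float, k: float = 3.0) -> bool:
    """Flag a measurement more than k population standard deviations above the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline)
    return current > mean + k * stdev


# Hypothetical baseline: SIP requests per minute observed on a quiet network.
baseline = [100, 110, 90, 105, 95]
print(signature_match(b"OPTIONS sip:a@b SIP/2.0\r\nUser-Agent: friendly-scanner\r\n"))
print(is_anomalous(baseline, 104), is_anomalous(baseline, 500))
```

The signature check misses any attack it has no pattern for, and the anomaly check will flag a legitimate surge in calls just as readily as a flood.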
Although functionally dissimilar, SNMP and syslog both transport event messages over the network from agents or endpoints to a centralized information repository. SNMP is a highly structured, binary-formatted message type, while syslog messages are ASCII-based and relatively arbitrary within the confines of three defined fields. In their commonly deployed forms (SNMPv1/v2c and traditional UDP syslog), neither protocol is encrypted. Thus, SNMP and syslog messages should always be confined to a restricted management network.
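To see how little protects a syslog message in transit, consider the PRI field at the start of a BSD-style (RFC 3164) message: a number in angle brackets that encodes facility * 8 + severity. The sketch below decodes it; the parse_pri function is a hypothetical helper, and the sample line is the example message given in RFC 3164. Everything after the PRI travels as readable text, which is one more reason to keep syslog on the management network.

```python
import re

# Severity names in numeric order 0-7, as used by syslog.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
PRI_FIELD = re.compile(r"<(\d{1,3})>(.*)", re.DOTALL)


def parse_pri(message: str) -> tuple[int, str, str]:
    """Split a BSD-style syslog line into (facility number, severity name, remainder)."""
    match = PRI_FIELD.match(message)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # facility 23 * 8 + severity 7 = 191 is the largest valid value
        raise ValueError("PRI value out of range")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity], match.group(2)


line = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
facility, severity, rest = parse_pri(line)
print(facility, severity)  # prints 4 crit (facility 4 is auth)
```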
Penetration and vulnerability testing is both art and science. These assessments are only as good as the people and tools used to perform them. In today’s environment most types of penetration/vulnerability assessment have been commoditized due to the ready availability of scanning and vulnerability assessment tools.
Some tools, such as Nessus (which until recently was open source), make it possible for naïve administrators to perform at least baseline vulnerability scans on their networks. In that case, we recommend bringing in an experienced security analyst to interpret the data, since every vulnerability scanner reports some false positives. Note also that the results of a test reflect the security status only during the testing period. Even minor administrative or architectural changes made moments after a penetration test can alter the system’s security profile.


Methodology | Penetration/Vulnerability Test

The team should thoroughly investigate target systems and networks in a structured manner, documenting their findings as they proceed. The goal is to identify all the significant vulnerabilities on the network, including their location and implications, and to provide recommendations for securing the affected systems. Testing results in a comprehensive, operational review or “snapshot” of the state of the network. Testing should include an analysis of the external network from the perspective of an outside hacker, and/or a review of the internal network from the perspective of a disgruntled employee or contractor.


The discovery process takes advantage of publicly available information that relates to your organization. Internet search engines, Whois databases, network registrars, DNS servers, and company Web sites are all sources of information. This phase can yield data that your organization might wish to protect. Table 1 lists a number of recommended tools used during the discovery phase. All of these are either native UNIX tools or are freeware, with the exception of WSPingPro.
Table 1: Common Security Testing Tools 
Vulnerability Assessment
SQLPing 2
ISS Internet Scanner
@stake Proxy
John the Ripper


Scanning or fingerprinting uses a variety of automated, non-intrusive scans. Nmap is a recommended tool for this step, and Foundstone’s SuperScan is also useful at this stage. These scans should be monitored continuously to keep their bandwidth consumption in check and to ensure that the scanning process does not cause any networked device to lose connectivity. If a device fails under this type of scanning, that is a finding in itself.
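Nmap remains the right tool for this step, but its simplest scan type, the TCP connect scan (nmap -sT), is easy to sketch. Writing it out shows where the bandwidth concern comes from: every probe is a full connection attempt. The hypothetical connect_scan function below probes one port at a time and sleeps between probes; its timeout and delay defaults are assumptions, loosely analogous to Nmap's timing templates (-T0 through -T5). Run anything like this only against hosts you are authorized to test.

```python
import socket
import time


def connect_scan(host: str, ports, timeout: float = 0.5, delay: float = 0.05) -> list[int]:
    """Return the ports on host that complete a TCP handshake, probing one port at a time."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising on refusal.
            if probe.connect_ex((host, port)) == 0:
                open_ports.append(port)
        time.sleep(delay)  # pacing keeps the probe rate, and the bandwidth used, low
    return open_ports


# Example (documentation address): connect_scan("192.0.2.10", [22, 80, 443, 5060])
```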
It may be useful to emulate specific IP phones when testing VoIP gateways. For testing H.323 gateways or gatekeepers, the OpenH323 project offers OpenPhone, which has a GUI for Windows clients and command-line options for Linux distributions.
For testing SIP proxies, registrars, and gateways, several open-source SIP clients, such as sipXphone and YATE, are quite configurable. SJ Labs’ SJphone softphone is also useful for testing in a VoIP environment and is free for 30 days. SIPsak and SIPbomber are also useful SIP proxy testing tools. Callflow can be very useful for examining and understanding the alterations in calling message sequences that can result when performing SIP testing.
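Much SIP testing starts with an OPTIONS request, which asks a proxy, registrar, or phone to describe its capabilities without setting up a call. The sketch below builds a minimal OPTIONS request containing the header fields RFC 3261 requires in every request (Via, Max-Forwards, To, From, Call-ID, CSeq) and sends it over UDP. The function names and defaults are hypothetical, and the example addresses come from the documentation ranges (example.com, 192.0.2.0/24).

```python
import socket
import uuid


def build_options(target_uri: str, local_ip: str, local_port: int = 5060,
                  user: str = "tester") -> str:
    """Build a minimal SIP OPTIONS request for sending over UDP."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch values start with this magic cookie
    lines = [
        f"OPTIONS {target_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{target_uri}>",
        f"From: <sip:{user}@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        "Accept: application/sdp",
        "Content-Length: 0",
    ]
    # Header lines end in CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def send_options(proxy: str, target_uri: str, local_ip: str, timeout: float = 2.0) -> str:
    """Send one OPTIONS request to proxy:5060 and return the reply's status line."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((local_ip, 0))  # let the OS pick a source port
        sock.settimeout(timeout)  # raises socket.timeout if nothing answers
        port = sock.getsockname()[1]  # advertise the real source port in Via
        sock.sendto(build_options(target_uri, local_ip, port).encode("ascii"), (proxy, 5060))
        reply, _ = sock.recvfrom(65535)
        return reply.decode("utf-8", "replace").split("\r\n", 1)[0]
```

For example, send_options("192.0.2.20", "sip:proxy.example.com", "192.0.2.10") would return a status line such as "SIP/2.0 200 OK" from a responsive proxy. Varying the headers from this baseline is the kind of message alteration a tool like Callflow helps you examine.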
As an indication of the maturity of this field, SiVuS has been released. SiVuS is the first publicly available vulnerability scanner for VoIP networks that use the SIP protocol.

Vulnerability Assessment

Vulnerability assessment, one of the most important phases of penetration testing, occurs when your team maps the profile of the environment to publicly known or, in some cases, unknown vulnerabilities. Tools such as Nessus, Retina, and ISS Internet Scanner are all good choices at this stage. An excellent listing of the top 75 security tools can be found at
When you are vulnerability testing VoIP networks, it is not necessary to test every IP phone. Because IP phones are often deployed in large numbers, testing all of them has the potential to generate enough network traffic to degrade voice quality. Testing one IP phone per vendor is often adequate, since their configurations should be functionally identical.
In most VoIP environments, it is possible to identify IP phones by their SNMP signature. Calling the IP phone directly—thus, bypassing any gateways or gatekeepers—can sometimes yield interesting information.
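These two points can be combined. The standard sysObjectID object (OID 1.3.6.1.2.1.1.2.0) returns a value under the private-enterprise arc 1.3.6.1.4.1, and the number that follows identifies the vendor. Assuming the sysObjectID values have already been collected (with Net-SNMP's snmpget, for example), the sketch below picks one phone per vendor for testing. The inventory and the enterprise numbers 11111 and 22222 are placeholders, not meant to identify real vendors.

```python
from typing import Optional

ENTERPRISE_ARC = "1.3.6.1.4.1."  # iso.org.dod.internet.private.enterprises


def enterprise_number(sys_object_id: str) -> Optional[int]:
    """Return the enterprise number from a sysObjectID value, or None if it is outside the arc."""
    if not sys_object_id.startswith(ENTERPRISE_ARC):
        return None
    return int(sys_object_id[len(ENTERPRISE_ARC):].split(".")[0])


def one_per_vendor(inventory: dict[str, str]) -> dict[int, str]:
    """Map each enterprise number to the first host in the inventory that reports it."""
    chosen: dict[int, str] = {}
    for host, oid in inventory.items():
        vendor = enterprise_number(oid)
        if vendor is not None and vendor not in chosen:
            chosen[vendor] = host
    return chosen


# Placeholder inventory: host address -> sysObjectID value.
inventory = {
    "10.10.1.11": "1.3.6.1.4.1.11111.5.1",
    "10.10.1.12": "1.3.6.1.4.1.11111.5.1",
    "10.10.2.20": "1.3.6.1.4.1.22222.3.7",
}
print(one_per_vendor(inventory))  # prints {11111: '10.10.1.11', 22222: '10.10.2.20'}
```

Grouping by the full sysObjectID instead of the enterprise number would select one phone per model, which is the safer choice when a vendor's models run different firmware.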


The exploitation phase begins once the target system’s vulnerabilities are mapped. The testers will attempt to gain privileged access to a target system by exploiting the identified vulnerabilities. This may take the form of running an exploit tool such as scalp.c or iis5hack.c, or launching a password guessing attack using THC-Hydra, a network authentication cracker. (An excellent resource of known/default accounts and associated passwords is located at


Throughout the testing, the team should maintain a detailed journal of activities to account for the effects and results of the testing procedures. This record will serve to distinguish the test team’s activities from any other anomalies that occur during the penetration test. Techniques for capturing these data include the use of echo and logging, and screen captures may also be useful where appropriate. The final report should include:
  • Detailed results of the testing performed
  • What the results indicate
  • Recommendations on types of corrective actions
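A test journal is most useful when every entry carries a precise timestamp that can be lined up against firewall, IDS, and syslog records, which is what lets the team separate its own activity from unrelated anomalies. As a sketch under stated assumptions (the log_activity name, the tab-separated layout, and the field order are hypothetical choices), the function below appends one UTC-stamped line per action, recording who ran it, from which machine, against which target, and with what result.

```python
import datetime
import socket


def log_activity(journal_path: str, tester: str, target: str, action: str, result: str) -> str:
    """Append one timestamped, tab-separated entry to the test journal and return it."""
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    entry = "\t".join([stamp, tester, socket.gethostname(), target, action, result])
    # Append mode never overwrites earlier entries.
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(entry + "\n")
    return entry


# Example (hypothetical values):
# log_activity("pentest-journal.tsv", "alice", "10.10.0.5", "nmap -sT -p 5060", "port 5060 open")
```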
One internal measure that can be used to quantify a particular vulnerability is a “Threat Index.” This index is based upon two independent metrics: perceived risk (Table 2) and estimated frequency (Table 3). Combining the two results yields a two-part identifier, such as H1 or M3, that occupies one cell of the 3 x 3 matrix in Table 4. The Threat Index (TI) has several purposes. First, it is used to rapidly prioritize a discovered vulnerability. Severe or high TIs (see Table 4) require immediate attention and may also require more in-depth analysis by testers. Second, the TI can be used to rapidly code particular vulnerabilities. For example, if a newly discovered vulnerability is ranked with a TI of H1, all members of the team immediately understand that this is a severe problem that requires immediate action, while a TI of L3 indicates an insignificant issue.
Table 2: Risk Categories
High Risk (H): Loss of critical proprietary information, system disruption, or severe environmental damage
Medium Risk (M): Loss of proprietary information, severe occupational illness, or major system or environmental damage
Low Risk (L): Minor system or environmental damage
Table 3: Modified Department of Defense Frequency Categories
Frequent (1): Likely repeated occurrences
Occasional (2): Possibility of repeated occurrences
Improbable (3): Practically impossible
Table 4: Threat Index
                  High Risk (H)   Medium Risk (M)   Low Risk (L)
Frequent (1)      H1              M1                L1
Occasional (2)    H2              M2                L2
Improbable (3)    H3              M3                L3
You can apply these criteria in whatever way suits your organization. The point is to establish a method, as objective as possible, for prioritizing threats against your infrastructure. You may even use different rankings for different portions of the network infrastructure. For example, threats to data integrity may matter most when testing data services, whereas threats that degrade availability may be critical for voice services.
In Table 4, any vulnerability with a threat index of H1, H2, M1, M2, and L1 requires immediate attention.
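The Threat Index scheme translates directly into a small triage routine. In this sketch, the immediate-attention set is taken from the sentence above; the function names, the tie-breaking order within each group (risk first, then frequency), and the sample findings are assumptions for illustration.

```python
RISK_RANK = {"H": 0, "M": 1, "L": 2}
# Threat indices that require immediate attention.
IMMEDIATE = {"H1", "H2", "M1", "M2", "L1"}


def threat_index(risk: str, frequency: int) -> str:
    """Combine a risk category (Table 2) and a frequency category (Table 3) into a TI code."""
    if risk not in RISK_RANK or frequency not in (1, 2, 3):
        raise ValueError("risk must be H, M, or L and frequency must be 1, 2, or 3")
    return f"{risk}{frequency}"


def triage(findings):
    """Order (name, risk, frequency) findings: immediate TIs first, then by risk and frequency."""
    scored = []
    for name, risk, frequency in findings:
        ti = threat_index(risk, frequency)
        # False sorts before True, so immediate-attention items come first.
        scored.append((ti not in IMMEDIATE, RISK_RANK[risk], frequency, name, ti))
    scored.sort()
    return [(name, ti, not deferred) for deferred, _, _, name, ti in scored]


# Hypothetical findings from a test.
findings = [
    ("default SNMP community string on phones", "M", 1),
    ("outdated web administration interface", "H", 3),
    ("weak voicemail PINs", "L", 1),
]
for name, ti, urgent in triage(findings):
    print(ti, "immediate" if urgent else "scheduled", name)
```

Note that H3 sorts after L1 here: a high-risk but improbable issue is scheduled, while a low-risk but frequent one gets immediate attention, exactly as the Table 4 criteria specify.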