The Hyper Text Transfer Protocol (HTTP) is one of the most well known, and well used, protocols on the Internet. It is the protocol by which Web pages are transmitted from Web servers to clients, but it is also used by many other applications to send data between computers. For example, many peer-to-peer clients make use of the solid structure of HTTP to transfer data segments of shared files between peers. HTTP can be used to transmit both ASCII and binary data between computers.
HTTP is commonly used in the VoIP community as a way for administrators to remotely administer and configure devices. Many network management devices offer a Web-based administration panel by which the device can be altered and configured for a particular environment. Many such devices also require user authentication to be able to fully access the configuration data.
HTTP was first described in RFC 1945 at HTTP 1.0 by its founder, Tim Berners-Lee. Currently, RFC 2616 is used to describe the HTTP 1.1 protocol; however, various other RFCs describe additional extensions and uses for the HTTP protocol. These include HTTP Authentication (RFC 2617), Secure HTTP (RFC 2660), and CGI (RFC 3875).
HTTP Protocol
The function of HTTP and its protocol was designed to be very straightforward and usable by many applications. When a client wishes to request a file from an HTTP server, it simply creates a TCP session with the server and transmits a GET command with the name of the requested file and the HTTP protocol version (for example, GET /index.html HTTP/1.1). The HTTP server then responds back with the appropriate data. The response from the server will be either the data requested by the client, or an error message describing why it cannot send the data. All of the commands within the HTTP protocol are sent in regular ASCII text, with each line followed by a carriage return/line feed (CR/LF). In network logs, the CR/LF appear as hexadecimal 0x0D0A.
HTTP Client Request
For a client to retrieve data from an HTTP server, it must know the exact filename and location to construct an appropriate file request. For most purposes, this information is supplied in the form of a uniform resource locator (URL), which specifies a particular HTTP server, directory path, and file name (for example, www.digg.com/faq/index.php). When a client wishes to view this specific page, index.php, it must first make a connection to www.digg.com. This is performed by resolving the domain name to an IP through DNS, which results in the IP address of 64.191.203.30. The client then initiates a TCP connection to 64.191.203.30 and makes a request of GET /faq/index.php HTTP/1.1. This request also includes other information about the client, some of which may be required for HTTP 1.1, such as the host value. An example of a full HTTP GET request is shown next:
GET /download.html HTTP/1.1 Host: www.ethereal.com User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113 Accept: \ text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain; \ q=0.8,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip, deflate Accept-Charset: ISO-8859–1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive Referer: http://www.ethereal.com/development.html
HTTP Server Response
Upon receiving a GET request from a client, a server first ensures that the file requested does exist. If it does, the data is then sent back to the requesting client. If not, an error message is sent. Regardless of the action, a specific server response is sent back to the client that includes a status code. This status code informs the client of the response type. The most common is a 200 code, which informs the client that the file was found and will be sent. It is transmitted in the form of HTTP/1.1 200 OK, which specifies the HTTP protocol version, the status code, and a brief description of the code. Other common status codes include “404 Not Found,” which indicates that the requested file could not be located by the server, and “500 Internet Server Error” which indicates that there is a problem with the HTTP server. The following is an example of an HTTP response:
HTTP/1.1 200 OK Date: Thu, 13 May 2004 10:17:12 GMT Server: Apache Last-Modified: Tue, 20 Apr 2004 13:17:00 GMT Accept-Ranges: bytes Content-Length: 18070 Keep-Alive: timeout=15, max=100 Connection: Keep-Alive Content-Type: t ext/html; charset ++ ISO-8859–1
Security Implications for HTTP
Due to the simple design of HTTP, and the early state of the Internet when it was unveiled, security wasn’t a high priority in the protocol. All data sent through HTTP was sent as clear text, which allowed any person to be able to sniff the traffic flowing across the wire and parse out sensitive data, such as usernames, passwords, and network configuration data. This is particularly dangerous since many VoIP and network management devices use HTTP as a means to allow administrators to check the status of the device and to configure additional settings. A person with malicious intent on the same network segment as the device could pick out various usernames and passwords that may work on additional computers or devices.
HTTP also supports multiple forms of authentication, which is a means by which the HTTP server can verify a user’s identity. The two authentication forms currently used are basic and digest authentications. When a server supports authentication, it sends a 401 “Authentication Required” response to clients that request sensitive data. This response will also include a “realm” (a name associated with the Web site) that notifies the user what they are accessing. When a client receives such a response, it will provide a log-in window to the user to input a valid user name and password. These values will then be transmitted back to the requesting server for verification. Because of HTTP’s design, though, these credentials will have to be constantly transmitted to the server for every further data transmission. Each of these transactions will transmit the user name and password in the clear.
Another form of authentication supported by modern HTTP clients and servers is digest authentication, which is described in depth in RFC 2617. Digest authentication has an advantage over basic authentication in that it does not send a clear password over the network. Instead, an MD5 (Message Digest) value of the password is transmitted to the requesting server. The server then uses this digest value for password comparisons. However, digest authentication is not fully supported in many older Web browsers. It also does not fully protect a user’s credentials. The user name and other information about the user are still transmitted in the clear. And, even though the password is obfuscated, a skilled, malicious user can still capture the MD5 value and use it for future transactions with that particular server to use another person’s account.
Many devices have recently provided support for HTTPS to overcome the openness of the HTTP protocol. HTTPS is a modification of HTTP wherein all data between a client and server are encrypted using the Secure Sockets Layers (SSL). In order for HTTPS to function, both the server and the client must be able to support it, and it must be specifically chosen as the form of communication in the URL. For example, instead of http://www.foo.com, a secure connection would use https://www.foo.com.