Is Security Holding VoIP Back?
phoneboy writes "Voxilla is running a piece I wrote on security issues present in Voice over IP. While an increasing number of people are ditching their ILEC in favor of using Voice over IP from companies like Vonage, VoicePulse, Packet8, and Broadvox Direct, there are a number of potential security issues to be aware of. Is VoIP secure enough to replace the PSTN as we know it?"
Today's Firewalls dynamically open and close multiple ports as required by VoIP signaling protocols such as SIP, they remain ineffective in securely supporting unsolicited incoming connections. NAT prevents two way voice and multimedia communication, because the private addresses and ports inserted by the client devices (SIP phones, video conferencing etc.) in the packet payload are unable to be routed in public networks. Therefore, incoming calls that are in any service intended to replace the PSTN just are not possible with todays existing NAT/Firewalls.
Voice over IP actually creates some particularly hairy security problems that traditional approaches really, really don't manage well. Some disclosure: I work for Avaya, one of the big vendors of large scale VoIP systems, though much more for the enterprise market than for anything to do with the public space (Vonage, Packet8, etc).
Lets start by looking at the wire protocols. We have two separate domains within which VoIP operates: Signaling, which determines where a call should route, and traffic, which is the actual stream of speech that needs to arrive at its destination in under a tenth of a second. These are very different protocols. Signaling was originally implemented using H.323, which can be basically thought of as a port of the existing telephony protocols (SS7) to IP.
H.323 is...well...not entertaining to work with. It's a very messy protocol. To a first level of approximation, H.323 is being reimplemented with SIP, which applies the semantics of HTTP to VoIP signaling. SIP is still complicated, but in a more manageable way.
Whether one is using H.323 or SIP to route calls, the actual traffic is moved over a relatively simple protocol entitled RTP. RTP basically involves chunking compressed audio into small packets, attaching a timestamp and a codec identifier, and throwing the packet at the appropriate host. UDP Port selection is managed dynamically by whatever signaling protocol is being used, meaning a firewall either needs to open the entire range of ports that VoIP might use (not small) or it needs to directly parse the signaling traffic to determine what ports to open.
Remember how both SIP and H.323 are both very complex protocols? Add in that complex protocols can hide many security vulnerabilities, and put that complexity in the firewall: Mistakes are made. (That's not theoretical -- a recent mass audit of H.323 exposed holes not merely in VoIP endpoints, but VoIP-aware firewalls. Microsoft, who actually has a pretty impressive firewall solution, was hit pretty bad.)
It's now that we can start discussing the differences between Enterprise VoIP and the kind of PSTN-Bridge VoIP that Vonage sells. Phones in enterprises receive connections from every other potential phone -- in other words, there's generally no central proxy that copies all the traffic towards where it needs to be. In the enterprise world, there's relatively few firewalls inside the corporate network, those that are deployed can be made VoIP aware, and the "central gatekeepers" really only manage directory services (go to this IP for this extension), conference-call mixing, and in the Avaya case, encryption keys.
You don't have that situation in the public realm. Firewalls -- which are everywhere, as deployed through NAT -- simply won't accept incoming connections from hosts that a backend client wasn't communicating with in the first place. But that's almost OK, because the only host a Vonage box needs to communicate with is Vonage itself. So if you actually examine the Motorola device that Vonage is presently deploying, you'll see that it itself accepts almost no incoming connectivity of any form that doesn't appear to come from Vonage itself (just DHCP and ARP, basically). The public providers basically proxy all traffic, because they have to: Nodes on the public PSTN network (normal phone lines) can't be told to just send IP packets at the Motorola device. So the proxying is basically mandatory.
It's ironic that, at least at the moment, PSTN integration carries with it an architecture that's infinitely more wiretap-friendly than what VoIP could eventually become. Tapping a complex mesh where any node often communicates with every other node is difficult-to-impossible to do, at least with any form of reliability. Create a finite number of junction points that must be passed through in order for connectivity to be established, however, and tapping becomes feasible.
AOL Instant Messenger is the most interesting va