Hiding Messages In VoIP Packets

Posted by samzenpus on Wednesday November 16, 2011 @12:21PM from the obscure-message dept.

Orome1 writes "A group of researchers from the Warsaw University of Technology has devised a relatively simple way of hiding information within VoIP packets exchanged during a phone conversation. They called the method TranSteg, and they demonstrated its effectiveness with a proof-of-concept implementation that allowed them to send 2.2MB (in each direction) during a 9-minute call. IP telephony allows users to make phone calls over data networks that use the IP protocol. The actual conversation consists of two audio streams, and the Real-Time Transport Protocol (RTP) is used to transport the voice data required for the communication to succeed. But RTP can transport different kinds of data, and the TranSteg method takes advantage of this fact."

Would this really work? by BenGL · 2011-11-16 12:41 · Score: 5, Interesting

From what I understand, steganography works if an observer (Carl) cannot tell that transmission of covert data is taking place between Alice and Bob. The proposed method results in an RTP bitstream that does not hold the payload advertised in its headers -- the audio is compressed using a more efficient codec than advertised in the packet headers, and the extra space is used to carry the "hidden" payload; Alice and Bob agree beforehand on the audio codec to use.
Now if Carl wants to eavesdrop on the conversation by hijacking (or owning) an intermediary network node, he would get corrupted audio data when trying to decode the packets with the (fake) advertised codec. Wouldn't this be a strong indication that covert communication is taking place?
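To put rough numbers on the scheme described above, here is a back-of-the-envelope sketch. The codec rates (64 kbit/s advertised, 32 kbit/s actually used) and the 20 ms packet interval are assumptions chosen for illustration, not figures from the summary, and the function names are hypothetical.

```python
# Rough capacity estimate for a "smaller codec than advertised" scheme.
# ASSUMPTIONS (not from the story): 20 ms packets, a 64 kbit/s advertised
# codec and a 32 kbit/s codec actually carrying the audio.

def payload_bytes(kbps: int, packet_ms: int = 20) -> int:
    """Audio payload size in bytes for one packet at the given bitrate."""
    return kbps * 1000 * packet_ms // (8 * 1000)

def hidden_bytes_per_packet(overt_kbps: int, covert_kbps: int,
                            packet_ms: int = 20) -> int:
    """Bytes left over in each packet once the smaller codec is used."""
    return payload_bytes(overt_kbps, packet_ms) - payload_bytes(covert_kbps, packet_ms)

def call_capacity_bytes(overt_kbps: int, covert_kbps: int,
                        seconds: int, packet_ms: int = 20) -> int:
    """Total hidden bytes in one direction over a call of the given length."""
    packets = seconds * 1000 // packet_ms
    return packets * hidden_bytes_per_packet(overt_kbps, covert_kbps, packet_ms)

per_packet = hidden_bytes_per_packet(64, 32)   # 160 - 80 bytes
total = call_capacity_bytes(64, 32, 9 * 60)    # 9-minute call
print(per_packet, total)
```

Under those assumed rates the result is about 2.16 million bytes per direction, which is in the same ballpark as the 2.2MB quoted in the summary.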
1. Re:Would this really work? by wierd_w · 2011-11-16 15:01 · Score: 4, Interesting
  
  The better approach would be to preprocess the audio signal of the conversation through another device (such as the handset itself) which normalizes the audio in a fashion tailored to the advertised codec. The idea is that the resulting bitstream will obey certain predictable rules. (You need to have very detailed knowledge of the codec used, but that shouldn't be seen as a barrier.) Your steganographic payload makes subtle but permitted changes to the encoded audio data to disrupt this predictable ruleset. Your message is thus folded into the bitstream using the mathematically freed bandwidth of the "noisy" audio channel. (Once you remove the normal audio signal, the difference bits are the secret message.) To the interceptor, the stream uses the correct bandwidth, uses the advertised codec, and is easily played by that codec.
  For a simplified example, say we have gzip'ed PCM audio, in the 44.1 kHz, 16-bit, stereo flavor. The preprocessor makes all the PCM samples even (a multiple of 2). This frees up a portion of the channel for data, by having an understood second codec that encodes, say, RLL data into a series of single-bit additions to the samples (making them odd values instead of even ones).
  The PCM decoder will play the steganographed audio file without any noticeable artifact (single-bit manipulations are too small to be detected by human ears). The secret message codec looks at all the samples, records a bit pattern of even or odd, and then decodes the resulting RLL pattern, recovering the message.
  More sophisticated codecs would require more sophisticated preprocessing of the raw audio, but the idea is still potentially employable.
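  The even/odd trick from the simplified example can be sketched roughly as follows. This assumes raw signed 16-bit PCM samples held in a Python list and writes the message bits directly into the sample parity; the RLL coding step is left out, and all function names are made up for illustration.

```python
# Minimal sketch of parity (least-significant-bit) embedding in PCM samples.
# ASSUMPTIONS: samples are signed 16-bit integers; message bits are stored
# directly (no RLL coding); names are hypothetical.

def bytes_to_bits(data: bytes) -> list:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(b >> i) & 1 for b in data for i in range(7, -1, -1)]

def bits_to_bytes(bits: list) -> bytes:
    """Pack a list of bits (length a multiple of 8) back into bytes."""
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )

def embed(samples: list, bits: list) -> list:
    """Force every sample even, then make a sample odd where the bit is 1."""
    if len(bits) > len(samples):
        raise ValueError("not enough samples to carry the message")
    out = [s & ~1 for s in samples]   # preprocessor: clear the lowest bit
    for i, bit in enumerate(bits):
        out[i] |= bit                 # odd sample = 1, even sample = 0
    return out

def extract(samples: list, nbits: int) -> list:
    """Read the even/odd pattern back out of the first nbits samples."""
    return [s & 1 for s in samples[:nbits]]

cover = [100, -3, 7, 32767, -32768, 0, 1, 2] * 2   # 16 toy samples
secret = bytes_to_bits(b"Hi")                      # 16 bits
stego = embed(cover, secret)
recovered = bits_to_bytes(extract(stego, len(secret)))
print(recovered)
```

  Each sample moves by at most one step, which is why the change is hard to hear. It only survives if the audio is carried losslessly, so a lossy codec would need the more sophisticated preprocessing mentioned above.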
Speaking of which... by ADRA · 2011-11-16 12:45 · Score: 4, Interesting

I was thinking of a way to send hidden messages between two locations (assuming a reasonably reliable network): one could encode messages by controlling the rate of the replies in a predictable manner (using ECC and varying transition timings to compensate for the error rate).
Another simple one would be to force out-of-order TCP/UDP packets to represent positive/negative bits, with correction routines similar to the above.
Both hidden message systems are slow to send any substantial amount of information, but I can't see a reasonable approach to intercepting them without a full dump of the entire packets and timestamps, which is more laborious than just capturing the session data contents (assuming one is a man in the middle). Further security can be added to the payload as necessary, but the transmission of the message itself is hard to detect.
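
A toy sketch of the reply-timing idea, without any real networking: bits become short or long gaps between packets, and a simple repetition code stands in for the ECC. The gap lengths, jitter range, and function names are all assumptions picked for illustration.

```python
import random

# Toy timing channel: a bit is a short or long gap between packets.
# ASSUMPTIONS: gap values, jitter range and the 3x repetition code are
# illustrative choices; no packets are actually sent.
SHORT, LONG = 0.020, 0.060          # seconds
THRESHOLD = (SHORT + LONG) / 2

def repeat_encode(bits, n=3):
    """Very simple ECC stand-in: send every bit n times."""
    return [b for b in bits for _ in range(n)]

def repeat_decode(bits, n=3):
    """Majority vote over each group of n received bits."""
    return [1 if sum(bits[i:i + n]) * 2 > n else 0
            for i in range(0, len(bits), n)]

def encode_delays(bits):
    """Map each bit to the gap the sender waits before the next packet."""
    return [LONG if b else SHORT for b in bits]

def decode_delays(delays):
    """Classify each observed gap as 0 or 1 by comparing to the midpoint."""
    return [1 if d > THRESHOLD else 0 for d in delays]

message = [1, 0, 1, 1, 0, 0, 1, 0]
sent = encode_delays(repeat_encode(message))

# Simulate network jitter of up to 15 ms on every gap.
rng = random.Random(1)
observed = [d + rng.uniform(-0.015, 0.015) for d in sent]

received = repeat_decode(decode_delays(observed))
print(received == message)
```

With three gaps per bit and tens of milliseconds per gap, the throughput is only a few bits per second, which matches the point that these channels are slow.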

--
Bye!
It would work... but there are better ways by Anonymous Coward · 2011-11-16 13:07 · Score: 4, Interesting

Most widely used codecs have some internal ECC, so RTP packets filled with your own data will be easily recognized.
Another approach would be to run an FFT on the decoded audio. Codecs tend to produce wideband noise when fed random data, and that is very different from the usual frequency response of speech.
A much better method would be to use the least significant bits of the codec parameters to transfer the message. It would result in slight differences in pitch or other parameters, but it would be almost undetectable.
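
A rough sketch of the frequency-domain check mentioned above. It uses a naive DFT and a spectral flatness score (geometric mean over arithmetic mean of the power spectrum); random samples stand in for "decoded random data" and a sum of two sine waves is a crude stand-in for voiced speech. The signals, sample rate, and function names are assumptions for illustration only.

```python
import cmath
import math
import random

# ASSUMPTIONS: 8 kHz sample rate, 256-sample window, synthetic signals.

def power_spectrum(x):
    """Naive DFT power for bins 1 .. n/2 - 1 (fine for short windows)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))) ** 2
        for k in range(1, n // 2)
    ]

def spectral_flatness(x):
    """Close to 1 for noise-like spectra, close to 0 for tonal ones."""
    p = [v + 1e-12 for v in power_spectrum(x)]   # avoid log(0)
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic

RATE, N = 8000, 256
rng = random.Random(0)

noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]
tone = [math.sin(2 * math.pi * 200 * t / RATE)
        + 0.5 * math.sin(2 * math.pi * 400 * t / RATE)
        for t in range(N)]

noise_flatness = spectral_flatness(noise)
tone_flatness = spectral_flatness(tone)
print(noise_flatness > tone_flatness)
```

Real speech is far messier than two sine waves, so any threshold would have to be tuned on actual recordings.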
Re:A sad necessity by betterunixthanunix · 2011-11-16 13:20 · Score: 5, Interesting

Steganography is already widely used by the movie industry. Movies sent to movie theaters have robust watermarks hidden in them, which helps the MPAA identify the theaters where unauthorized recordings of movies are being made. Steganography is also used in laser printers, to help the FBI identify the origin of printed documents.

Like cryptography, steganography is not just limited to keeping your information private or to fighting censorship.

--
Palm trees and 8
Re:A sad necessity by EdIII · 2011-11-16 14:01 · Score: 5, Interesting

Except this is not steganography. Not exactly. It is a lot more complicated and highly unlikely to work.
RTP streams can carry multiple data streams. That's how audio and video can be sent in the same connection. The summary implies that additional RTP streams are added, which is not steganographic at all. The additional streams are easily detected. It is about as steganographic as alternate data streams are in Windows files.
However, reading the article indicates something completely different from the summary. This method is not taking advantage of alternate/additional RTP streams at all. It is choosing different codecs based on a complex mapping pattern known only to the sender and receiver. The difference must allow the newly compressed, and transcoded, stream to contain extra hidden data without altering the expected size.
1) Not all VoIP systems use different codecs. It is not really required. My own systems use G.729 exclusively from the handsets/deskphones/softphones all the way to termination and origination providers. Without a robust codec library the number of variations here is pretty low. Not to mention both sides would have to support it.
2) This assumes the RTP traffic is encrypted. Which means you are only using steganography as an additional layer of security.
3) If the RTP traffic is in plain text... this makes it that much easier to defeat. If you were expecting a JPEG file but, upon inspection, found a BMP file, would you not suspect something? This method seems to rely on saying you are using one codec but choosing another one. That would seem to be trivial to verify for a third party intercepting packets.
The whole idea is not very workable since the value of codecs is their ability to preserve audio quality, work around iffy connections, and achieve a smaller transmission footprint.
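
For reference, a minimal sketch of how an interceptor might read the advertised codec out of an unencrypted RTP packet. It assumes the fixed 12-byte header layout from RFC 3550 with no header extension, and only knows a few static payload types from RFC 3551; the function name and the sample packet are hypothetical. It reads the label only; checking whether the payload really decodes as that codec is a separate step not shown here.

```python
import struct

# A few static RTP payload types (RFC 3551); not a complete table.
STATIC_PAYLOAD_TYPES = {0: "PCMU", 8: "PCMA", 18: "G729"}

def parse_rtp(packet: bytes) -> dict:
    """Parse the fixed RTP header and report the advertised payload type.
    ASSUMPTION: header extensions are not handled in this sketch."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 & 0x10:
        raise ValueError("header extensions not handled in this sketch")
    csrc_count = b0 & 0x0F
    header_len = 12 + 4 * csrc_count      # each CSRC entry is 4 bytes
    payload_type = b1 & 0x7F              # low 7 bits; top bit is the marker
    return {
        "version": b0 >> 6,
        "payload_type": payload_type,
        "codec": STATIC_PAYLOAD_TYPES.get(payload_type, "unknown/dynamic"),
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload_len": len(packet) - header_len,
    }

# Hypothetical packet: version 2, payload type 0, 160 bytes of payload.
sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + bytes(160)
info = parse_rtp(sample)
print(info["codec"], info["payload_len"])
```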