Codec2 — an Open Source, Low-Bandwidth Voice Codec
Bruce Perens writes "Codec2 is an Open Source digital voice codec for low-bandwidth applications, in its first Alpha release. Currently it can encode 3.75 seconds of clear speech in 1050 bytes, and there are opportunities to code in additional compression that will further reduce its bandwidth. The main developer is David Rowe, who also worked on Speex. Originally designed for Amateur Radio, both via sound-card software modems on HF radio and as an alternative to the proprietary voice codec presently used in D-STAR, the codec is probably also useful for telephony at a fraction of current bandwidths. The algorithm is based on papers from the 1980s, and is intended to be unencumbered by valid unexpired patent claims. The license is LGPL2. The project is seeking developers for testing in applications, algorithmic improvement, conversion to fixed-point, and coding to be more suitable for embedded systems."
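For a sense of scale, the figures in the summary can be turned into a bit rate. This is only a back-of-the-envelope sketch: the 1050 bytes and 3.75 seconds come from the summary, the 16-bit, 8 kHz raw format is the test format mentioned further down the thread, and mono audio is an assumption.

```python
# Figures quoted in the story summary.
ENCODED_BYTES = 1050
DURATION_S = 3.75

# Raw test format mentioned in the thread: 16-bit samples at 8 kHz.
# Mono is an assumption; the thread does not say.
RAW_BITS_PER_SECOND = 16 * 8000


def bitrate_bps(num_bytes, seconds):
    """Average bit rate of num_bytes of data covering the given duration."""
    return num_bytes * 8 / seconds


codec_bps = bitrate_bps(ENCODED_BYTES, DURATION_S)
compression_ratio = RAW_BITS_PER_SECOND / codec_bps

print(f"encoded stream: {codec_bps:.0f} bit/s")
print(f"raw stream:     {RAW_BITS_PER_SECOND} bit/s")
print(f"ratio:          about {compression_ratio:.0f}:1")
```

That works out to 2240 bit/s for the encoded stream, before any error correction or packet overhead is added.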
I'll be presenting on Codec2 at the ARRL/TAPR Digital Communications Conference this weekend in Vancouver, Washington, near Portland. I'll try to get the video online.
Bruce Perens.
The original rationale for Codec2 is at Codec2.org. I've been promoting this issue for about four years, as I was bothered by the proprietary nature of the AMBE codec in D-STAR. But I didn't have the math, etc., to do the work myself. It was really fortunate that David became motivated to do the work without charge. He has a Ph.D. in voice coding. By the way, look over his web site rowetel.com for the other work he's done: two really nice Open Hardware projects - a PBX and a mesh telephony device, an Open Source echo canceler for digital telephony, used in Asterisk and elsewhere, and his own electric car conversion. He'd be my nomination for the MacArthur grant.
Bruce Perens.
I assume it's acceptable... but it angers me that someone thought it was relevant to give the exact number of bytes for a seemingly arbitrary 3.75 seconds of audio, but failed to say how long it takes to encode that 3.75 seconds of audio, or what average latency can be expected after buffer conditions are met.
It is a real-time codec on my workstation and is intended to be a real-time codec on embedded DSP. It's currently all floating point and does things it should not, like malloc of multiple buffers per sample.
Download the code and build it. It's "just type make" on Linux. The raw (uncompressed) sample format we've used for testing is 16-bit samples at 8 kHz, and there are some tools to play those, and some pre-recorded samples. Not too much trouble to figure out.
Bruce Perens.
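For anyone who wants to poke at raw files in that format, here is a minimal sketch of handling headerless 16-bit, 8 kHz audio in Python. The little-endian byte order, the mono layout, and the helper names `decode_raw` and `duration_seconds` are assumptions for illustration, not part of the Codec2 tools. A synthetic tone stands in for a real recording so the snippet runs on its own.

```python
import math
import struct

SAMPLE_RATE_HZ = 8000   # from the thread
BYTES_PER_SAMPLE = 2    # 16-bit samples, from the thread


def decode_raw(buf):
    """Unpack headerless signed 16-bit samples (little-endian assumed)."""
    count = len(buf) // BYTES_PER_SAMPLE
    return list(struct.unpack("<%dh" % count, buf[:count * BYTES_PER_SAMPLE]))


def duration_seconds(buf):
    """Playing time of a raw mono buffer at the assumed sample rate."""
    return len(buf) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


# Synthetic stand-in for a recording: 3.75 s of a 440 Hz tone.
tone = [int(8000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ))
        for n in range(30000)]
raw = struct.pack("<%dh" % len(tone), *tone)

print(len(raw), "bytes,", duration_seconds(raw), "seconds")
```

To inspect a real file, replace the synthetic buffer with the bytes read from it, for example `open(path, "rb").read()`.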
As a newly licenced ham in an area where D-STAR repeaters are everywhere (VK), and a free software advocate, I have recently become aware of the issues with D-STAR and have been reading about this work, so it's quite surreal to have it pop up on /. in the week where I get my licence.
I haven't had a chance to read the D-STAR specifications, but I am wondering whether the voice codec is flagged in the D-STAR digital stream, and whether it would be possible to create translating repeaters:
that is, dual-output repeaters with differently coded data streams. It would take more spectrum, but would also allow for a migration path (at least for repeater users?).
Jean-Marc Valin is on the project mailing list and David is another Speex developer and the person Jean-Marc recommended to me. We are trying for an improvement over Speex at low rates.
Bruce Perens.
Speex isn't great in this application, because at low bitrates there is a significant delay through the codec and the output stream requires far too much bandwidth to be useful. Consider that digital speech systems like MOTOTRBO, TETRA, P25 and Iridium typically have less than 6 kbps throughput once you've taken FEC into account.
I think you could cut the sample rate in half and get acceptable performance, but I've not tried. Currently I think it's 25 millisecond frames, and each frame has one set of LSPs and two sets of voicing information, so it's interpolated into 12.5 millisecond frames. Those lower bandwidth codecs do 50 millisecond frames. Go forth and hack upon it if you'd like to see. Also, there are some optimizations that are obvious to David and Jean-Marc (and which I barely understand) that haven't been added yet. One is that the LSPs are monotonic and nothing has been done to remove that redundancy. Delta coding or vector quantization might be ways to do that. I understand delta coding but would not be the one to do VQ. Another is that there is a lot of correlation of the LSPs between adjacent frames, so you don't necessarily have to send the entire LSP set every frame. And there is probably lots of other opportunity for compression that I have no concept of.
Bruce Perens.
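To illustrate the delta coding idea mentioned above: because LSPs within a frame are monotonically increasing, each one can be sent as its distance from the previous one, and those distances span a smaller range than the absolute values. The sketch below is not Codec2's quantizer. The ten LSP values are made up, and the fixed-width bit count is a deliberately naive estimate.

```python
def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    deltas = []
    previous = 0
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


def fixed_width_bits(numbers):
    """Bits needed to send every number at the width of the largest one."""
    width = max(numbers).bit_length()
    return width * len(numbers)


# Made-up LSP frequencies in Hz for one frame (illustrative only).
lsps_hz = [310, 520, 880, 1250, 1610, 2040, 2390, 2760, 3120, 3480]

deltas = delta_encode(lsps_hz)
direct_bits = fixed_width_bits(lsps_hz)
delta_bits = fixed_width_bits(deltas)

print("deltas:", deltas)
print("direct:", direct_bits, "bits; delta:", delta_bits, "bits")
```

With these made-up numbers the frame drops from 120 bits to 90 bits. A real scheme would quantize the values first, and could also exploit the frame-to-frame correlation described above.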
You'll be happy to know that it's a fellow Australian ham developing this Codec2: David Rowe, VK5DGR. Here's a link to David's development page.
I didn't see it mentioned when quickly scanning TFA, but how does this codec handle packet loss?
It is all well and good to develop a codec that crams as much speech as possible into as few bits as possible, but in this case, one lost packet could mean a gap of several seconds. The success of a low-bandwidth codec, at least when it comes to IP telephony, also depends on how well it can handle lost packets. Low-bandwidth codecs are usually used on low-bandwidth networks, such as the internet, and that is where packet loss is highest.
Same goes for delay and jitter, by the way. If a stream of packets is delayed, and more voice is crammed in fewer bits, then the delays in the voice stream will get longer too.
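How big the gap from one lost packet is depends on how much speech the sender puts in each packet. A rough sketch using only the figures from the summary (1050 bytes for 3.75 seconds); the payload sizes and packet intervals below are hypothetical examples, not anything Codec2 or a VoIP stack prescribes.

```python
# Figures from the story summary.
ENCODED_BYTES = 1050
CLIP_MILLISECONDS = 3750


def speech_ms_in_payload(payload_bytes):
    """Milliseconds of speech that fit in a payload at the quoted rate."""
    return payload_bytes * CLIP_MILLISECONDS / ENCODED_BYTES


def payload_bytes_for(speech_ms):
    """Encoded bytes needed to carry speech_ms of speech at the quoted rate."""
    return speech_ms * ENCODED_BYTES / CLIP_MILLISECONDS


# Hypothetical case 1: fill a 1400-byte payload before sending.
print(speech_ms_in_payload(1400) / 1000, "seconds lost per dropped packet")

# Hypothetical case 2: send a packet every 40 ms, as many VoIP setups do.
print(payload_bytes_for(40), "bytes of speech per 40 ms packet")
```

So a gap of several seconds only happens if packets are filled close to a typical MTU. Sending small packets often keeps each loss short, at the cost of header overhead that dwarfs the speech payload.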
It encoded those 3.75 seconds in 0.06 seconds and decoded in 0.04 seconds on my AMD Phenom 9750 2.4 GHz, one core only, compiled with GCC and the -O3 switch. That's all of the overhead of the program starting and exiting, too. It's using floating, not fixed point.
This, it seems, bodes well for low latency of the final implementation on a DSP chip.
Bruce Perens.
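Those timings can be expressed as a multiple of real time. A small sketch using only the numbers quoted above; note that it measures throughput on one workstation, not end-to-end latency, which also depends on frame size and buffering.

```python
# Figures quoted in the comment above (program start-up and exit included).
CLIP_SECONDS = 3.75
ENCODE_SECONDS = 0.06
DECODE_SECONDS = 0.04


def realtime_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are processed per second of CPU time."""
    return audio_seconds / processing_seconds


encode_factor = realtime_factor(CLIP_SECONDS, ENCODE_SECONDS)
decode_factor = realtime_factor(CLIP_SECONDS, DECODE_SECONDS)
cpu_share = (ENCODE_SECONDS + DECODE_SECONDS) / CLIP_SECONDS

print(f"encode: {encode_factor:.1f}x real time")
print(f"decode: {decode_factor:.1f}x real time")
print(f"one core busy {cpu_share:.1%} of the time for a full-duplex call")
```

Since the measurements include process overhead, the codec itself is somewhat faster than these factors suggest.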
At such high compression rates, one could wonder if the optimizations to transmit clear speech make assumptions about the language used. Does it work well with French? Arabic? Chinese?
The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
One of the fastest ways to ensure its testing and distribution is to use it in Mumble - the low latency voice chat software (with an iPhone client as well).
Mumble is typically used by gaming clans for their chat rooms, so Codec2 would be tested in real-life conditions.
The DSP Innovations codec manages decent speech quality at 600 bps, god knows how (proprietary closed source). I think this is the state of the art in low-bitrate codecs just now.
"I bless every day that I continue to live, for every day is pure profit."
I have been working on mod_codec2.c for FreeSWITCH, which is committed in a WIP module. The library for codec2 isn't a library at all just yet. I'm working with David and Bruce to make sure we can get a working libcodec2 in place ASAP so we have a real VoIP demo that people can compile, call and test against. /b