Codec2 — an Open Source, Low-Bandwidth Voice Codec

Posted by Soulskill on Monday September 20, 2010 @06:09PM from the sound-of-efficiency dept.

Bruce Perens writes "Codec2 is an Open Source digital voice codec for low-bandwidth applications, in its first Alpha release. Currently it can encode 3.75 seconds of clear speech in 1050 bytes, and there are opportunities to code in additional compression that will further reduce its bandwidth. The main developer is David Rowe, who also worked on Speex. Originally designed for Amateur Radio, both via sound-card software modems on HF radio and as an alternative to the proprietary voice codec presently used in D-STAR, the codec is probably also useful for telephony at a fraction of current bandwidths. The algorithm is based on papers from the 1980s, and is intended to be unencumbered by valid unexpired patent claims. The license is LGPL2. The project is seeking developers for testing in applications, algorithmic improvement, conversion to fixed-point, and coding to be more suitable for embedded systems."

15 of 179 comments

what about LATENCY? by Kristopeit,MichaelDa · 2010-09-20 18:25 · Score: 4, Interesting

why is seemingly the most important aspect of communication technology so often overlooked?
i assume it's acceptable... but it angers me that someone thought it was relevant to give the exact number of bytes for a seemingly arbitrary 3.75 seconds of audio, but failed to say how long it takes to encode that 3.75 seconds of audio, or what average latency can be expected after buffer conditions are met.
1. Re:what about LATENCY? by Kristopeit,MichaelDa · 2010-09-20 18:58 · Score: 2, Interesting
  
  it could take 16MB/s and still function in real time over the internet for me... my problem isn't that the latency wasn't shown, it was that the bitrate WAS shown BUT the latency wasn't shown.
  also, considering the advantages of using lower bitrate voice codecs, the ability to implement the encoder and decoder algorithms directly in very low transistor count custom hardware would appeal to the same crowd... so not just latency in terms of x86 instructions per second, but the ability to implement those instructions in hardware.
  i am concerned about bruce's use of the term "real time"... either he is implying there is no noticeable latency to him, (which is irrelevant to me as numerous others claim skype video chat is "real time", and also impossible given the implicit time consuming process of encoding), or he's cleverly stating that the time it takes to encode is the real time it takes to encode. it's not the fake time. it's real time.
  again, i assume, and it seems i'm correct to do so, that the codec is "very usable"... i won't be trying it as i have no need for it.
2. Re:what about LATENCY? by Kristopeit,MichaelDa · 2010-09-20 19:21 · Score: 2, Interesting
  
  yes, of course... but "refining" a codec for hardware implementation is doing the exact opposite to the quality of the signal.
  why not refine a DSP chip architecture until it works well with the original codec? i know masks are expensive... but why not do it all the way?
Serendipity. by firstnevyn · 2010-09-20 18:41 · Score: 3, Interesting

As a newly licensed ham in an area where D-STAR repeaters are everywhere (VK) and a free software advocate, I have recently become aware of the issues with D-STAR and have been reading about this work, so it's quite surreal to have it pop up on /. in the week where I get my licence. I haven't had a chance to read the D-STAR specifications, but I am wondering if the voice codec is flagged in the D-STAR digital stream, and if it would be possible to create translating repeaters, i.e. dual-output repeaters with differently coded data streams. It'd take more spectrum but would also allow for a migration path (at least for repeater users?)
Re:Great news by Bruce Perens · 2010-09-20 19:10 · Score: 4, Interesting

I think you could cut the sample rate in half and get acceptable performance, but I've not tried. Currently I think it's 25 millisecond frames, and each frame has one set of LSPs and two sets of voicing information so it's interpolated into 12.5 millisecond frames. Those lower bandwidth codecs do 50 millisecond frames. Go forth and hack upon it if you'd like to see. Also, there are some optimizations that are obvious to David and Jean-Marc (and which I barely understand) that haven't been added yet. One is that the LSPs are monotonic and nothing has been done to remove that redundancy. Delta coding or vector quantization might be ways to do that. I understand delta coding but would not be the one to do VQ. Another is that there is a lot of correlation of the LSPs between adjacent frames, so you don't necessarily have to send the entire LSP set every frame. And there are probably lots of other opportunities for compression that I have no concept of.

--
Bruce Perens.
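
The delta coding idea mentioned above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Codec2's quantizer: the LSP values below are made-up frequencies in Hz, and the function names are hypothetical. Because the LSPs in a frame are monotonically increasing, each one can be sent as the (always positive) gap from the previous one, and those gaps span a smaller range than the absolute values.

```python
def delta_encode(lsps):
    """Replace each value with its difference from the previous one."""
    deltas = []
    previous = 0
    for value in lsps:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


def bits_needed(values):
    """Bits required to hold the largest value as an unsigned integer."""
    return max(values).bit_length()


# Hypothetical, monotonically increasing LSP frequencies in Hz
# (illustrative numbers only, not taken from Codec2).
lsps_hz = [250, 480, 910, 1300, 1750, 2100, 2500, 2850, 3200, 3600]

deltas = delta_encode(lsps_hz)
print("deltas:", deltas)
print("bits per absolute value:", bits_needed(lsps_hz))
print("bits per delta:", bits_needed(deltas))
```

A real codec would quantize the gaps rather than send them exactly, and would have to watch for error accumulation in the running sum; the sketch only shows where the redundancy comes from.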
Packet loss? by Amarantine · 2010-09-20 19:30 · Score: 3, Interesting

I didn't see it mentioned when quickly scanning TFA, but how does this codec handle packet loss?
It is all nice and well to develop a codec to cram as much speech as possible in as few bits as possible, but in this case, one lost packet could mean a gap of several seconds. The success of a low-bandwidth codec, at least when it comes to IP telephony, also depends on how well it can handle lost packets. Low bandwidth codecs are usually used in low bandwidth networks, such as the internet, and there packet loss is highest.
Same goes for delay and jitter, by the way. If a stream of packets is delayed, and more voice is crammed in fewer bits, then the delays in the voice stream will get longer too.
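
To put rough numbers on that concern, here is a small Python sketch of how much speech one lost packet would take with it. It assumes the 2240 bps rate implied by the summary (1050 bytes for 3.75 seconds); the payload sizes are hypothetical choices for illustration and say nothing about how Codec2 is actually packetized.

```python
def speech_seconds(payload_bytes, bitrate_bps):
    """Duration of speech carried by one packet payload at a given bit rate."""
    return payload_bytes * 8 / bitrate_bps


# Implied by the summary: 1050 bytes * 8 bits / 3.75 s = 2240 bits per second.
CODEC_BPS = 2240

# Hypothetical payload sizes, from "the whole sample in one packet" downwards.
for payload_bytes in (1050, 280, 28):
    gap = speech_seconds(payload_bytes, CODEC_BPS)
    print(f"{payload_bytes:5d}-byte payload lost -> {gap:.2f} s of speech missing")
```

The gap scales with the payload size, so the "several seconds" case only arises if the sender chooses very large packets; small payloads lose less speech per dropped packet at the cost of more per-packet header overhead.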
Re:Original Rationale by slimjim8094 · 2010-09-20 19:33 · Score: 2, Interesting

Looks really cool. I haven't messed around with D-STAR since I don't like the idea of being tied into a specific system (seems to contravene the point of amateur radio). I'll definitely be keeping an eye on this to see where it heads.
I had a really awesome idea just now for transmitting this at 1200bps using AFSK Bell 202 (like APRS) and hacking up live voice using entirely existing equipment (TNCs, etc). But the given example of 1050 bytes/3.75s works out by my math to 2240bps. I guess you could run it over 9600bps packet, with room to spare (text chat?)
73,
KC2YWE

--
I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
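
The arithmetic in the comment above checks out, and can be reproduced with a short Python sketch. The function names are hypothetical, and the comparison looks at raw bit rates only: it ignores packet framing and any other link overhead, which would reduce the real headroom.

```python
def bitrate_bps(num_bytes, seconds):
    """Average bit rate of a stream that packs num_bytes into seconds."""
    return num_bytes * 8 / seconds


def headroom_bps(link_bps, stream_bps):
    """Spare capacity on the link; negative means the stream does not fit."""
    return link_bps - stream_bps


# Figures from the story summary: 1050 bytes for 3.75 seconds of speech.
codec_bps = bitrate_bps(1050, 3.75)
print(f"codec rate: {codec_bps:.0f} bps")

# Raw link rates mentioned in the comment: 1200 bps AFSK and 9600 bps packet.
for link_bps in (1200, 9600):
    spare = headroom_bps(link_bps, codec_bps)
    verdict = "fits" if spare >= 0 else "does not fit"
    print(f"{link_bps} bps link: {verdict}, headroom {spare:.0f} bps")
```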
English only ? by Yvanhoe · 2010-09-20 19:37 · Score: 4, Interesting

At such high compression rates, one could wonder if the optimizations to transmit clear speech make assumptions about the language used. Does it work well with French ? Arabic ? Chinese ?

--
The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
1. Re:English only ? by Bruce Perens · 2010-09-20 19:56 · Score: 4, Interesting
  
  The basic assumptions are based on the mechanics of the vocal tract, and I suspect not high-level enough to differ across languages, but obviously it would be nice to hear from speakers of other languages who test it. We could also use a larger corpus of spoken samples for testing.
  
  --
  Bruce Perens.
2. Re:English only ? by Anonymous Coward · 2010-09-20 21:52 · Score: 1, Interesting
  
  One of the earlier observations in this field was that a low-bandwidth filter specifically hurt languages with hissing sounds, and I presume you'd have similar problems with click sounds.
  As the GP indicated, Chinese (Mandarin) would be an important addition as it's a tonal language, and compression should not blur that distinction. Languages such as Arabic have emphatic consonants, which aren't that common in Western languages either. But French? That's for all practical purposes identical to English. Finnish would make more sense. As for the click sounds, try Zulu. It apparently has 15 distinct click sounds, and there should be enough speakers online.
Mumble integration ? by Anonymous Coward · 2010-09-20 19:48 · Score: 4, Interesting

One of the fastest ways to ensure its testing and distribution is to use it in Mumble - the low latency voice chat software (with an iPhone client as well).
Mumble is typically used by gaming clans for their chat rooms and Codec2 would be tested in real-life conditions.
1. Re:Mumble integration ? by Bruce Perens · 2010-09-20 20:17 · Score: 4, Interesting
  
  "One of the fastest ways to ensure its testing and distribution is to use it in Mumble - the low latency voice chat software (with an iPhone client as well). Mumble is typically used by gaming clans for their chat rooms and Codec2 would be tested in real-life conditions."
  Is there an existing Mumble developer whom we could get interested in this? It might be that we should take some of the Alpha-isms out of the code first.
  
  --
  Bruce Perens.
Re:Original Rationale by the way · 2010-09-20 20:48 · Score: 2, Interesting

By the way, look over his web site rowetel.com for the other work he's done: two really nice Open Hardware projects - a PBX and a mesh telephony device, an Open Source echo canceler for digital telephony, used in Asterisk and elsewhere, and his own electric car conversion.
I've got one of his little ip01 telephony boxes, and it is quite fantastic - a tiny, cheap, fanless, (embedded) Linux computer with plenty of memory and CPU grunt, and of course telephony hardware on board. It also has a package manager, with quite a few pieces of software available, and regular firmware updates. It's much more powerful than the various Linux-based consumer routers that are available - it's a great option if you're looking for a small Linux server to run Asterisk, a little web site, DNS server, SSH, etc...
(I'm not affiliated with David or Rowetel in any way - just a happy customer, who is in awe of the amazing things this guy has achieved in such a wide variety of areas).
How does it handle background noise? by wowbagger · 2010-09-21 01:14 · Score: 2, Interesting

Bruce, have you guys done any testing of performance in the presence of background noise? I know that in the PMR area, there are a lot of firemen who are very unhappy with what happens to AMBE when background noise (e.g. saws, Personal Alert Safety System, fire) gets into the mike - while AMBE does ok at encoding just speech, throw the noise of a saw in the background and all you get is garbage.
While the initial application of CODEC2 is hams in their shacks with their noise-canceling mikes, It Would Be Nice If the vocoder didn't curl up its toes and die in a noisy environment.
See "Urgent Communications", September 10th edition, page 10, "Round 2 of digital radio fireground tests held", and the test plan.

--
www.eFax.com are spammers
Re:Wonderful Name by Hatta · 2010-09-21 03:47 · Score: 2, Interesting

But there it's called AMBE, not Codec. Codec2 is a bad name for a codec for the same reasons that Variable2 is a bad name for a variable. If this is supposed to supplant AMBE, why not AMBE2 or S(uper)AMBE?

--
Give me Classic Slashdot or give me death!