Slashdot Mirror


Codec2 — an Open Source, Low-Bandwidth Voice Codec

Bruce Perens writes "Codec2 is an Open Source digital voice codec for low-bandwidth applications, in its first Alpha release. Currently it can encode 3.75 seconds of clear speech in 1050 bytes, and there are opportunities to code in additional compression that will further reduce its bandwidth. The main developer is David Rowe, who also worked on Speex. Originally designed for Amateur Radio, both via sound-card software modems on HF radio and as an alternative to the proprietary voice codec presently used in D-STAR, the codec is probably also useful for telephony at a fraction of current bandwidths. The algorithm is based on papers from the 1980s, and is intended to be unencumbered by valid unexpired patent claims. The license is LGPL2. The project is seeking developers for testing in applications, algorithmic improvement, conversion to fixed-point, and coding to be more suitable for embedded systems."

50 of 179 comments

  1. Presentation this week. by Bruce Perens · · Score: 4, Informative

    I'll be presenting on Codec2 at the ARRL/TAPR Digital Communications Conference this weekend in Vancouver, Washington, near Portland. I'll try to get the video online.

    1. Re:Presentation this week. by Anonymous Coward · · Score: 2, Funny

      But will you be presenting IN Codec2?
      That would be very impressive.

    2. Re:Presentation this week. by Bruce Perens · · Score: 4, Informative

      I am bringing the materials for a demo table with two laptops and real-time encode-decode, so people can try it themselves.

  2. Original Rationale by Bruce Perens · · Score: 5, Informative

    The original rationale for Codec2 is at Codec2.org. I've been promoting this issue for about four years, as I was bothered by the proprietary nature of the AMBE codec in D-STAR. But I didn't have the math, etc., to do the work myself. It was really fortunate that David became motivated to do the work without charge. He has a Ph.D. in voice coding. By the way, look over his web site rowetel.com for the other work he's done: two really nice Open Hardware projects - a PBX and a mesh telephony device, an Open Source echo canceler for digital telephony, used in Asterisk and elsewhere, and his own electric car conversion. He'd be my nomination for the MacArthur grant.

    1. Re:Original Rationale by Yaur · · Score: 4, Informative
      In a nutshell, it looks like the rationale for not just using Speex is:
      • better resilience to bit errors
      • better performance at ultra low bitrates
    2. Re:Original Rationale by Bananatree3 · · Score: 4, Informative

      that is basically it. Speex is built (as I understand it) for lossless transmission methods with little/no error correction needed. Radio, by its very nature is a very lossy medium, so something with better error tolerance is needed. Hence, Codec2 provides a nice route.

    3. Re:Original Rationale by slimjim8094 · · Score: 2, Interesting

      Looks really cool. I haven't messed around with D-STAR since I don't like the idea of being tied into a specific system (seems to contravene the point of amateur radio). I'll definitely be keeping an eye on this to see where it heads.

      I had a really awesome idea just now for transmitting this at 1200bps using AFSK Bell 202 (like APRS) and hacking up live voice using entirely existing equipment (TNCs, etc). But the given example of 1050 bytes/3.75s works out by my math to 2240bps. I guess you could run it over 9600bps packet, with room to spare (text chat?)

      73,
      KC2YWE

      --
      I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
    4. Re:Original Rationale by Bruce Perens · · Score: 4, Informative

      Sound-card modem implementations over SSB would be practical. See FDMDV. We're still a little wide for that, but we'll get there.

    5. Re:Original Rationale by adolf · · Score: 2, Informative

      (Stating the obvious for those with sufficiently low UIDs and/or those who remember VAXen, or similar, or at least those with a proper beard...)

      that is basically it. Speex is built (as I understand it) for lossless transmission methods with little/no error correction needed. Radio, by its very nature is a very lossy medium, so something with better error tolerance is needed. Hence, Codec2 provides a nice route.

      that is basically it. Speex is built (as I understand it) for lossless transmission methods with little/no error correction needed. UDP, by its very nature is a very lossy medium, so something with better error tolerance is needed. Hence, Codec2 provides a nice route.

      (There. Extrapolated that for you. Doubly-so, perhaps.)

    6. Re:Original Rationale by Bruce Perens · · Score: 4, Informative

      How does it compare to CELT?

      So far, we've really only compared it to G.729, and it does OK against that. CELT starts at 32 kilobits per second and we're at 2 kilobits, so it's not really for the same application. But I noticed that the Alpha, all-floating-point implementation with some known low-performance code encoded the 3.75 seconds in 0.06 seconds, and decoded them in 0.04, on my 2.4 GHz processor. I would think that a polished implementation could achieve low delay on a DSP chip or some flavors of embedded CPU.

    7. Re:Original Rationale by Gordonjcp · · Score: 3, Informative

      You could just about squeeze it into 2400bps. It would probably be possible to get that out of existing AFSK modems without needing to go down the route of discriminator taps and such. Using a hardware GMSK modem like the FX589 chip would give you 9600 baud with the option of interoperating with existing D-Star modems, and interfacing an FX589 is going to be easier to implement than a G3RUH modem.

    8. Re:Original Rationale by the way · · Score: 2, Interesting

      By the way, look over his web site rowetel.com for the other work he's done: two really nice Open Hardware projects - a PBX and a mesh telephony device, an Open Source echo canceler for digital telephony, used in Asterisk and elsewhere, and his own electric car conversion.

      I've got one of his little ip01 telephony boxes, and it is quite fantastic - a tiny, cheap, fanless, (embedded) Linux computer with plenty of memory and CPU grunt, and of course telephony hardware on board. It also has a package manager, with quite a few pieces of software available, and regular firmware updates. It's much more powerful than the various Linux-based consumer routers that are available - it's a great option if you're looking for a small Linux server to run Asterisk, a little web site, DNS server, SSH, etc...

      (I'm not affiliated with David or Rowetel in any way - just a happy customer, who is in awe of the amazing things this guy has achieved in such a wide variety of areas).

    9. Re:Original Rationale by Yaur · · Score: 4, Informative

      With UDP the typical loss scenario is dropped packets but with radio single bit errors are more likely. This difference means that FEC strategies for one scenario are not directly applicable to the other.

      For UDP, in-packet FEC data is useless; to be useful, your error correction scheme needs to be prepared to deal with losing a whole packet's worth of data. For voice this is going to introduce too much latency, so instead a typical codec might just try to interpolate the lost data. With radio, on the other hand, there is value in in-packet error correction bits within the stream, and in the event of an error you are going to have more data with which to guess what the audio should be like, especially if you know which bits are errored (or possibly errored).
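      To make the two loss scenarios concrete, here is a minimal sketch of both channel models. Everything in it is an assumption for illustration: the function names, the 7-byte frame size, and the loss and error probabilities are made up, and neither function models a real network or radio link.

```python
import random

def packet_loss_channel(frames, loss_prob, rng):
    """UDP-like model: each frame arrives intact or not at all (None)."""
    return [None if rng.random() < loss_prob else f for f in frames]

def bit_error_channel(frames, bit_error_rate, rng):
    """Radio-like model: every frame arrives, but individual bits may flip."""
    out = []
    for f in frames:
        damaged = bytearray(f)
        for i in range(len(damaged) * 8):
            if rng.random() < bit_error_rate:
                damaged[i // 8] ^= 1 << (i % 8)  # flip one bit in place
        out.append(bytes(damaged))
    return out

rng = random.Random(0)                         # seeded so runs are repeatable
frames = [bytes([n] * 7) for n in range(40)]   # 40 hypothetical 7-byte frames
lossy = packet_loss_channel(frames, 0.1, rng)
noisy = bit_error_channel(frames, 0.01, rng)
print(sum(f is None for f in lossy), "frames lost outright")
print(sum(a != b for a, b in zip(frames, noisy)), "frames arrived with bit errors")
```

      In the first model the receiver has nothing at all for a missing frame, so it can only conceal the gap; in the second it always has a mostly correct frame to work from, which is where in-frame error correction bits pay off.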

    10. Re:Original Rationale by jmv · · Score: 5, Informative

      The fundamental difference is not so much the lossless vs lossy transmission, but the actual bit-rate. I designed Speex with a "sweet spot" around 16 kb/s, whereas David designed Codec2 for a sweet spot around 2.4 kb/s. Speex does have a 2.4 kb/s mode, but the quality isn't even close to what David was able to achieve with Codec2.

    11. Re:Original Rationale by wowbagger · · Score: 3, Informative

      If you've ever heard AMBE in the presence of bit errors, it doesn't do so well either. It isn't the vocoder's job to deal with bit errors, it is the protocol's job. Over half the bits in an APCO-25 voice frame are forward error correction for the voice payload: Golay encoding, Reed-Solomon, bit order scrambling (interleaving), you name it.

      Putting resistance to bit errors in the codec is the wrong place to do it.

      Now, making the codec use fewer bits, so the protocol layer has more bits for FEC, makes sense.

    12. Re:Original Rationale by fuzzyfuzzyfungus · · Score: 2, Insightful

      Wow. Ordinarily I'm of the opinion that crying "Godwin's Law!" is a bit overused; but having someone describe the LGPL as "Nazi-like" is making me reconsider.

      Somebody goes to the trouble of designing a novel, patent-unencumbered (i.e., if you don't like the software licence, you are perfectly free to write your own implementation) codec that fits an otherwise rather underserved niche. They have the temerity to release it under a license requiring you to release your modifications to their code if you distribute in binary form and this is somehow analogous to a particularly virulent flavor of genocidal fascism?

      You are really messing up the BSD crowd's reputation for being ideologically mellow compared to team GPL...

    13. Re:Original Rationale by bill_mcgonigle · · Score: 2, Informative

      This works on 51-bit frames.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  3. Err Speex by Knee Socks · · Score: 2, Informative

    Speex: Speex is based on CELP and is designed to compress voice at bitrates ranging from 2 to 44 kbps. Some of Speex's features include: Narrowband (8 kHz), wideband (16 kHz), and ultra-wideband (32 kHz) compression in the same bitstream

    --
    BLACK KNIGHT SECURITY SYSTEMS
    We'll bite your legs off
    1. Re:Err Speex by Gordonjcp · · Score: 3, Informative

      Speex isn't great in this application, because at low bitrates there is a significant delay through the codec and the output stream requires far too much bandwidth to be useful. Consider that digital speech systems like MOTOTRBO, TETRA, P25 and Iridium typically have less than 6kbps throughput once you've taken FEC into account.

  4. what about LATENCY? by Kristopeit,MichaelDa · · Score: 4, Interesting
    why is seemingly the most important aspect of communication technology so often overlooked?

    i assume it's acceptable... but it angers me that someone thought it was relevant to give the exact number of bytes for a seemingly arbitrary 3.75 seconds of audio, but failed to say how long it takes to encode that 3.75 seconds of audio, or what average latency can be expected after buffer conditions are met.

    1. Re:what about LATENCY? by Bruce Perens · · Score: 4, Informative

      Right. Sorry. Real time on the x86 workstation I'm using. Not converted to fixed-point for weaker CPUs yet. Not tested on ARM, Blackfin, AVR, etc. Waiting for you to do that :-) Downloadable code. Reasonably portable. Type make and let fly.

    2. Re:what about LATENCY? by Garridan · · Score: 2, Insightful

      Well, the source is right there on the webpage. Why don't you download & compile it, and see for yourself? It's an alpha release so I'll guess that it's slower than it could be.

    3. Re:what about LATENCY? by Kristopeit,MichaelDa · · Score: 2, Interesting
      it could take 16MB/s and still function in real time over the internet for me... my problem isn't that the latency wasn't shown, it was that the bitrate WAS shown BUT the latency wasn't shown.

      also, considering the advantages of using lower bitrate voice codecs, the ability to implement the encoder and decoder algorithms directly in very low transistor count custom hardware would appeal to the same crowd... so not just latency in terms of x86 instructions per second, but the ability to implement those instructions in hardware.

      i am concerned about bruce's use of the term "real time"... either he is implying there is no noticeable latency to him, (which is irrelevant to me as numerous others claim skype video chat is "real time", and also impossible given the implicit time consuming process of encoding), or he's cleverly stating that the time it takes to encode is the real time it takes to encode. it's not the fake time. it's real time.

      again, i assume, and it seems i'm correct to do so, that the codec is "very usable"... i won't be trying it as i have no need for it.

    4. Re:what about LATENCY? by Kristopeit,MichaelDa · · Score: 2, Interesting
      yes, of course... but "refining" a codec for hardware implementation is doing the exact opposite to the quality of the signal.

      why not refine a DSP chip architecture until it works well with the original codec? i know masks are expensive... but why not do it all the way?

    5. Re:what about LATENCY? by KliX · · Score: 2, Informative

      I think he probably means it in a 'how many samples does the codec need before it can send a packet' type of latency.

    6. Re:what about LATENCY? by Bruce Perens · · Score: 4, Informative

      There are currently 51 bits in a frame. That is the minimum that you can send, and you'd send 40 of those per second as the codec is presently implemented. A real data radio would add bandwidth for its data encapsulation, but would have to meet the time and bandwidth requirements of the codec payload.
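      The arithmetic behind those figures can be checked in a few lines. The 51 bits and 40 frames per second come from the comment above, and the 1050 bytes for 3.75 seconds come from the story summary. The explanation for the gap between the two bit rates (each frame padded up to a whole number of bytes in the test files) is an assumption that happens to fit the numbers, not something the thread confirms.

```python
BITS_PER_FRAME = 51        # from the comment above
FRAMES_PER_SECOND = 40     # from the comment above

# Codec payload rate on the air, before any radio framing or FEC.
payload_bps = BITS_PER_FRAME * FRAMES_PER_SECOND

# Figures from the story summary: 1050 bytes for 3.75 seconds of speech.
file_bytes, seconds = 1050, 3.75
file_bps = file_bytes * 8 / seconds
frames = int(seconds * FRAMES_PER_SECOND)
bytes_per_frame = file_bytes / frames

# Assumption: each 51-bit frame is rounded up to whole bytes when stored.
padded_bits = -(-BITS_PER_FRAME // 8) * 8      # ceiling division, then back to bits
padded_bps = padded_bits * FRAMES_PER_SECOND

print(payload_bps)       # 2040
print(file_bps)          # 2240.0
print(frames)            # 150
print(bytes_per_frame)   # 7.0
print(padded_bps)        # 2240
```

      Under that assumption the 2240 bps quoted elsewhere in the thread is a file-size figure, and the codec payload itself is 2040 bps.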

    7. Re:what about LATENCY? by vlm · · Score: 2, Informative

      If, for some reason latency is an issue when it's first shoehorned into a DSP chip, Codec2 will be refined until it works well on a DSP chip, in real real time.

      I think you are not using the definition of latency that most in the field would use.

      Latency is how long it takes to process the data. It's a computer science type of thing. If you understand Knuth and his tape drive sorting examples, this is pretty obvious...

      For example, here's a nice, simple, hopelessly useless codec that has almost exactly 100 ms of latency:

      1) Get yerself a buffer that holds 1000 samples.
      2) Run a A/D converter at 10Ksamples/sec until the buffer is full.
      3) Run "gzip" on the 1000 sample buffer, squishing it down to maybe 500 bytes. Optimistically.
      4) send the 500 byte chunk to the other side (radio, internet, whatever)
      5) Run "gunzip" on the hopefully unerrored compressed 500 bytes, expanding it back to 1000 raw samples.
      6) Squirt yonder 1000 raw sample values out the D/A converter at 10Ksamples/sec
      7) Pray ye get another packet of compressed voice data before the well runs dry. Or maybe listen to a bit of silence. Or play interpolation games.

      Your argument is once steps 3 and 5 are quick enough, the codec latency will be zero. A fine bit of analysis, however, shows that the first sample to enter the buffer in step 2 cannot possibly be decompressed and played back, until the buffer fills, which takes... 100 ms aka 1/10 of a second. This is the latency we're talking about in voice codecs. Most are somewhat faster than 1/10 of a second.
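      The buffering argument can be sketched in code. This is only an illustration of the toy codec described above: it uses Python's zlib (the same DEFLATE compression as gzip, but not the gzip tool itself), assumes 16-bit little-endian samples, and all function names are made up. The 200-sample figure assumes 25 ms frames at 8 kHz, as quoted elsewhere in this thread.

```python
import struct
import zlib

def buffering_latency_ms(samples_per_frame, sample_rate_hz):
    """Time spent filling the buffer before anything can be encoded."""
    return 1000.0 * samples_per_frame / sample_rate_hz

def encode(samples):
    """Steps 3-4: pack the full buffer and compress it into one packet."""
    raw = struct.pack("<%dh" % len(samples), *samples)
    return zlib.compress(raw)

def decode(packet):
    """Step 5: expand a packet back into raw sample values."""
    raw = zlib.decompress(packet)
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))

buffer = [0] * 1000                          # step 1: a 1000-sample buffer (silence here)
assert decode(encode(buffer)) == buffer      # lossless round trip

print(buffering_latency_ms(1000, 10_000))    # 100.0, the toy codec above
print(buffering_latency_ms(200, 8000))       # 25.0, a 25 ms frame at 8 kHz
```

      However fast encode and decode become, the first sample still waits for the buffer to fill, so the buffering term is a floor on end-to-end delay.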

      It is quite possible to make a very efficient codec where a fireman would hit the PTT button, yack into the radio for fifteen minutes, and then the entire message would be compressed down to maybe 1000 bits/sec theoretical average, then sent and played back. This would, of course, be completely useless for tactical public safety comms.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
  5. Re:Interactive communication? by Bruce Perens · · Score: 3, Informative

    It is a real-time codec on my workstation and is intended to be a real-time codec on embedded DSP. It's currently all floating point and does things it should not, like malloc of multiple buffers per sample.

    Download the code and build it. It's "just type make" on Linux. The raw (uncompressed) sample format we've used for testing is 16-bit samples at 8 kHz and there are some tools to play those, and some pre-recorded samples. Not too much trouble to figure out.
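    For anyone poking at the raw test files, a minimal sketch of reading that sample format follows. The 16-bit and 8 kHz figures come from the comment above; signed, little-endian, mono samples are assumptions, and the function names are hypothetical rather than part of the Codec2 tools.

```python
import struct

SAMPLE_RATE_HZ = 8000      # from the comment above
BYTES_PER_SAMPLE = 2       # 16-bit samples

def parse_raw(data):
    """Unpack raw bytes as signed little-endian 16-bit samples (assumed layout)."""
    count = len(data) // BYTES_PER_SAMPLE
    return struct.unpack("<%dh" % count, data[:count * BYTES_PER_SAMPLE])

def duration_seconds(data):
    """Playing time of a raw buffer at 8 kHz mono."""
    return len(data) / (BYTES_PER_SAMPLE * SAMPLE_RATE_HZ)

raw_bps = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 8

clip = bytes(60000)                 # stand-in for a file's contents: 60000 zero bytes
print(len(parse_raw(clip)))         # 30000 samples
print(duration_seconds(clip))       # 3.75
print(raw_bps)                      # 128000
```

    At 128000 bits per second uncompressed, a 3.75 second clip is 60000 bytes, against the 1050 bytes quoted in the story for the encoded version.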

  6. Serindipidy. by firstnevyn · · Score: 3, Interesting

    As a newly licensed ham in an area where D-STAR repeaters are everywhere (VK) and a free software advocate, I have recently become aware of the issues with D-STAR and have been reading about this work, so it's quite surreal to have it pop up on /. in the week I get my licence. I haven't had a chance to read the D-STAR specifications, but I am wondering whether the voice codec is flagged in the D-STAR digital stream, and whether it would be possible to create translating repeaters: dual-output repeaters with differently coded data streams. It would take more spectrum but would also allow for a migration path (at least for repeater users).

    1. Re:Serindipidy. by Bruce Perens · · Score: 4, Informative

      Congratulations on the license, OM. We haven't yet explored how to wedge this into D-STAR, but sending it as data rather than voice would be one way. All of the D-STAR radios except the latest one, the IC-92AD, use a plug-in daughter board to hold the AMBE chip, and it might be that somebody could make a dual-chip version of this board sometime. Since AMBE is proprietary we are stuck using their chip if we want to be compatible, unless the repeater does the conversion for us using a DV-Dongle. They sell TI DSP chips with their program burned in, and don't give out the algorithm.

      It may be that on D-STAR the AMBE chip also does the modulation for a data transmission, just doesn't run the codec. But the modulation is known and there is a sound-card software implementation of D-STAR that interoperates with it. I don't have any D-STAR equipment to test. The folks on dstar_development@yahoogroups.com know a lot more about D-STAR.

      73
      K6BP

    2. Re:Serindipidy. by __aajfby9338 · · Score: 4, Insightful

      Congratulations on your new license!

      The proprietary AMBE codec bothers me, too. I think that a closed, license-encumbered, proprietary codec is entirely inappropriate for ham radio use.

    3. Re:Serindipidy. by Bruce Perens · · Score: 4, Informative

      The repeater can rebroadcast the data, but that data would be AMBE encoded, and AMBE is both trade-secret in its implementation and patented in some of its algorithms. There may be an AMBE chip in the repeater, I've not played with one. The usual way one converts to and from AMBE on a PC is with a device called the DV-Dongle, which contains the AMBE chip. This costs lots of money and is not nearly so powerful as the CPU of the computer it's plugged into, which is one reason to be fed up with proprietary codecs.

      So, if you had some newer, Codec2-based radios, and some older D-STAR radios, linking repeaters might be a good way to get them to talk to each other.

      This is hand-waving about a lot of issues, like the fact that we've not designed the next generation of data radio to put Codec2 into. One might guess that such a thing could use IPv6, and better modulation than just FM, and FEC, etc.

  7. Speex developers are involved by Bruce Perens · · Score: 3, Informative

    Jean-Marc Valin is on the project mailing list and David is another Speex developer and the person Jean-Marc recommended to me. We are trying for an improvement over Speex at low rates.

  8. Great news by Anonymous Coward · · Score: 2, Informative

    >3.75 seconds of clear speech in 1050 bytes

    That's 2240 bps, 2.19 kbps, quite impressive. Maybe one day they can beat MELP (down to 600bps) and remain open.

    Excellent work.

    1. Re:Great news by Bruce Perens · · Score: 4, Interesting

      I think you could cut the sample rate in half and get acceptable performance, but I've not tried. Currently I think it's 25 millisecond frames, and each frame has one set of LSPs and two sets of voicing information so it's interpolated into 12.5 millisecond frames. Those lower bandwidth codecs do 50 millisecond frames. Go forth and hack upon it if you'd like to see. Also, there are some optimizations that are obvious to David and Jean-Marc (and which I barely understand) that haven't been added yet. One is that the LSPs are monotonic and nothing has been done to remove that redundancy. Delta coding or vector quantization might be ways to do that. I understand delta coding but would not be the one to do VQ. Another is that there is a lot of correlation of the LSPs between adjacent frames, so you don't necessarily have to send the entire LSP set every frame. And there is probably lots of other opportunity for compression that I have no concept of.
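      A rough sketch of the delta-coding idea for monotonic LSPs follows. It is not Codec2's quantizer: the LSP values are invented integers in Hz, the function names are hypothetical, and a real scheme would quantize the gaps rather than store them exactly.

```python
def delta_encode(lsps):
    """Keep the first value, then only the gap to each following value."""
    deltas = [lsps[0]]
    for prev, cur in zip(lsps, lsps[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

# Hypothetical LSP frequencies in Hz for one frame (illustrative values only).
lsps = [320, 480, 910, 1250, 1600, 2050, 2400, 2780, 3100, 3500]
deltas = delta_encode(lsps)

print(deltas)                        # [320, 160, 430, 340, 350, 450, 350, 380, 320, 400]
print(max(lsps).bit_length())        # 12 bits needed for the largest absolute value
print(max(deltas).bit_length())      # 9 bits needed for the largest gap
assert delta_decode(deltas) == lsps  # lossless round trip
```

      Because the values only ever increase, every gap is positive and much smaller than the absolute frequency, so each one fits in fewer bits.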

  9. Re:Thankyou! by Bananatree3 · · Score: 3, Informative

    you'll be happy to know that it's a fellow Australian ham developing this Codec2 - David Rowe, VK5DGR. Here's a link to David's development page

  10. Packet loss? by Amarantine · · Score: 3, Interesting

    I didn't see it mentioned when quickly scanning TFA, but how does this codec handle packet loss?

    It is all nice and well to develop a codec to cram as much speech as possible in as few bits as possible, but in this case, one lost packet could mean a gap of several seconds. The success of a low-bandwidth codec, at least when it comes to IP telephony, also depends on how well it can handle lost packets. Low bandwidth codecs are usually used in low bandwidth networks, such as the internet, and there the packet loss is the highest.

    Same goes for delay and jitter, by the way. If a stream of packets is delayed, and more voice is crammed in fewer bits, then the delays in the voice stream will get longer too.

    1. Re:Packet loss? by Bruce Perens · · Score: 4, Informative

      We don't know yet, but I don't see how it could be worse than AMBE in D-STAR, which makes various eructions when faced with large packet loss. I did various sorts of bit-error injection inadvertently while debugging yesterday, and right now you still get comprehensible voice with significant corruption of the LSP data. This, IMO, indicates an opportunity for more compression. Handling the problems of the radio link is more a problem for forward error correction, etc.

  11. Really early latency figures by Bruce Perens · · Score: 4, Informative

    It encoded those 3.75 seconds in 0.06 seconds and decoded in 0.04 seconds on my AMD Phenom 9750 2.4 GHz, one core only, compiled with GCC and the -O3 switch. That's all of the overhead of the program starting and exiting, too. It's using floating, not fixed point.

    This, it seems, bodes well for low latency of the final implementation on a DSP chip.
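    Those timings can be turned into a rough real-time factor and a per-frame cost. The 3.75, 0.06 and 0.04 second figures are from the comment above; the 40 frames per second is taken from elsewhere in this thread, and the variable names are made up. Since the measured times include program start-up, the true per-frame cost would be lower still.

```python
AUDIO_SECONDS = 3.75
ENCODE_SECONDS = 0.06
DECODE_SECONDS = 0.04
FRAMES_PER_SECOND = 40      # assumption: figure quoted elsewhere in this thread

# How many seconds of audio are processed per second of CPU time.
encode_speedup = AUDIO_SECONDS / ENCODE_SECONDS
decode_speedup = AUDIO_SECONDS / DECODE_SECONDS

# Average processing cost per frame, against a 25 ms frame budget.
frames = AUDIO_SECONDS * FRAMES_PER_SECOND
encode_ms_per_frame = 1000.0 * ENCODE_SECONDS / frames
frame_budget_ms = 1000.0 / FRAMES_PER_SECOND

print(round(encode_speedup, 1), "x real time to encode")
print(round(decode_speedup, 1), "x real time to decode")
print(round(encode_ms_per_frame, 2), "ms per frame, budget", frame_budget_ms, "ms")
```

    This measures processing throughput only; the delay from buffering a whole frame before encoding it comes on top of it.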

    1. Re:Really early latency figures by Bruce Perens · · Score: 3, Informative

      No significant state between frames so far. 25 milliseconds per frame. That is the minimum delay before the other side starts to hear the audio.

  12. English only ? by Yvanhoe · · Score: 4, Interesting

    At such high compression rates, one could wonder if the optimizations to transmit clear speech make assumptions about the language used. Does it work well with French? Arabic? Chinese?

    --
    The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
    1. Re:English only ? by Bruce Perens · · Score: 4, Interesting

      The basic assumptions are based on the mechanics of the vocal tract, and I suspect not high-level enough to differ across languages, but obviously it would be nice to hear from speakers of other languages who test it. We could also use a larger corpus of spoken samples for testing.

  13. Mumble integration ? by Anonymous Coward · · Score: 4, Interesting

    One of the fastest ways to ensure its testing and distribution is to use it in Mumble - the low latency voice chat software (with an iPhone client as well).
    Mumble is typically used by gaming clans for their chat rooms and Codec2 would be tested in real-life conditions.

    1. Re:Mumble integration ? by Bruce Perens · · Score: 4, Interesting

      One of the fastest ways to ensure its testing and distribution is to use it in Mumble - the low latency voice chat software (with an iPhone client as well). Mumble is typically used by gaming clans for their chat rooms and Codec2 would be tested in real-life conditions.

      Is there an existing Mumble developer whom we could get interested in this? It might be that we should take some of the Alpha-isms out of the code first.

    2. Re:Mumble integration ? by Inda · · Score: 5, Insightful

      Is this really Slashdot? Do I have a DNS error?

      These are the stories I used to enjoy. I don't really understand them, but they make a good read.

      --
      This post contains benzene, nitrosamines, formaldehyde and hydrogen cyanide.
  14. Re:Impressive. by lobiusmoop · · Score: 3, Informative

    The DSP Innovations codec manages decent speech quality at 600bps, god knows how (proprietary closed source). I think this is the state of the art in low bitrate codecs just now.

    --
    "I bless every day that I continue to live, for every day is pure profit."
  15. How does it handle background noise? by wowbagger · · Score: 2, Interesting

    Bruce, have you guys done any testing of performance in the presence of background noise? I know that in the PMR area, there are a lot of firemen who are very unhappy with what happens to AMBE when background noise (e.g. saws, Personal Alert Safety System, fire) gets into the mike. While AMBE does OK at encoding just speech, throw the noise of a saw in the background and all you get is garbage.

    While the initial application of CODEC2 is hams in their shacks with their noise-canceling mikes, It Would Be Nice If the vocoder didn't curl up its toes and die in a noisy environment.

    See "Urgent Communications", September 10th edition, page 10, "Round 2 of digital radio fireground tests held", and the test plan.

  16. Jump on quick - this could be the next twitter! by sootman · · Score: 2, Funny

    Who wants to be the first to make a web service based on this codec and 3.75-second messages? :-)

    --
    Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
  17. It's a great start but not usable yet! by briankwest · · Score: 3, Informative

    I have been working on mod_codec2.c for FreeSWITCH, which is committed in a WIP module. The library for codec2 isn't a library at all just yet. I'm working with David and Bruce to make sure we can get a working libcodec2 in place ASAP so we have a real VoIP demo that people can compile, call and test against. /b

  18. Re:Wonderful Name by Hatta · · Score: 2, Interesting

    But there it's called AMBE, not Codec. Codec2 is a bad name for a codec for the same reasons that Variable2 is a bad name for a variable. If this is supposed to supplant AMBE, why not AMBE2 or S(uper)AMBE?

    --
    Give me Classic Slashdot or give me death!