Slashdot Mirror


Codec2 — an Open Source, Low-Bandwidth Voice Codec

Bruce Perens writes "Codec2 is an Open Source digital voice codec for low-bandwidth applications, in its first Alpha release. Currently it can encode 3.75 seconds of clear speech in 1050 bytes, and there are opportunities to code in additional compression that will further reduce its bandwidth. The main developer is David Rowe, who also worked on Speex. Originally designed for Amateur Radio, both via sound-card software modems on HF radio and as an alternative to the proprietary voice codec presently used in D-STAR, the codec is probably also useful for telephony at a fraction of current bandwidths. The algorithm is based on papers from the 1980s, and is intended to be unencumbered by valid unexpired patent claims. The license is LGPL2. The project is seeking developers for testing in applications, algorithmic improvement, conversion to fixed-point, and coding to be more suitable for embedded systems."

179 comments

  1. Presentation this week. by Bruce+Perens · · Score: 4, Informative

    I'll be presenting on Codec2 at the ARRL/TAPR Digital Communications Conference this weekend in Vancouver, Washington, near Portland. I'll try to get the video online.

    1. Re:Presentation this week. by shriphani · · Score: 1

      Please do. This looks very nice.

    2. Re:Presentation this week. by Anonymous Coward · · Score: 2, Funny

      But will you be presenting IN Codec2?
      That would be very impressive.

    3. Re:Presentation this week. by Bruce+Perens · · Score: 4, Informative

      I am bringing the materials for a demo table with two laptops and real-time encode-decode, so people can try it themselves.

    4. Re:Presentation this week. by commodore64_love · · Score: 1

      >>>But will you be presenting IN Codec2?

      But staticky. By my quick calculation it's only 2.4 kbps encoding. Like listening to voice over a 2400 baud modem.

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    5. Re:Presentation this week. by Anonymous Coward · · Score: 0

      I'll be presenting on Codec2 at the ARRL/TAPR Digital Communications Conference this weekend in Vancouver Washington, Near Portland. I'll try to get the video online.

      Why LGPL2 and not something like MIT/X11?

    6. Re:Presentation this week. by Anonymous Coward · · Score: 0

      Looking forward to the presentation -- the audio samples were promising.

    7. Re:Presentation this week. by Nate+B. · · Score: 1

      Why not LGPL2? You still get the ability of having any code under any license linking to the codec but contributions to the codec must be released under the LGPL2. Seems like a very smart (and best) way to do it to me.

      --

      "Insanity is doing the same thing over again expecting a different result."
    8. Re:Presentation this week. by jesset77 · · Score: 1

      >>>But will you be presenting IN Codec2?

      But staticy. By my quick calculation it's only 2.4 kbps encoding. Like listening to voice over a 2400 baud modem.

      By *my* calculation it is 2.24kbps encoding, and the entire point is being able to deliver higher quality speech over a bottleneck the size of a 2400 baud modem than has been available up to this point.

      But to be fair, 2.24 sounds like just the lowest setting available, not necessarily the most optimal setting. Jpeg is great, but Jpeg at 99% compression, not so much. :P

      --
      People willing to trade their freedom of expression for temporary entertainment deserve neither and will lose both.
  2. Original Rationale by Bruce+Perens · · Score: 5, Informative

    The original rationale for Codec2 is at Codec2.org. I've been promoting this issue for about four years, as I was bothered by the proprietary nature of the AMBE codec in D-STAR. But I didn't have the math, etc., to do the work myself. It was really fortunate that David became motivated to do the work without charge. He has a Ph.D. in voice coding. By the way, look over his web site rowetel.com for the other work he's done: two really nice Open Hardware projects - a PBX and a mesh telephony device, an Open Source echo canceler for digital telephony, used in Asterisk and elsewhere, and his own electric car conversion. He'd be my nomination for the MacArthur grant.

    1. Re:Original Rationale by Yaur · · Score: 4, Informative
      In a nutshell, it looks like the rationale for not just using Speex is:
      • better resilience to bit errors
      • better performance at ultra low bitrates
    2. Re:Original Rationale by Anonymous Coward · · Score: 0

      How does it compare to CELT? Or does CELT have similar problems to Speex in the over the air use case Codec2 is designed for?

    3. Re:Original Rationale by Bananatree3 · · Score: 4, Informative

      that is basically it. Speex is built (as I understand it) for lossless transmission methods with little/no error correction needed. Radio, by its very nature is a very lossy medium, so something with better error tolerance is needed. Hence, Codec2 provides a nice route.

    4. Re:Original Rationale by slimjim8094 · · Score: 2, Interesting

      Looks really cool. I haven't messed around with D-STAR since I don't like the idea of being tied into a specific system (seems to contravene the point of amateur radio). I'll definitely be keeping an eye on this to see where it heads.

      I had a really awesome idea just now for transmitting this at 1200bps using AFSK Bell 202 (like APRS) and hacking up live voice using entirely existing equipment (TNCs, etc). But the given example of 1050 bytes/3.75s works out by my math to 2240bps. I guess you could run it over 9600bps packet, with room to spare (text chat?)
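The arithmetic above can be checked with a minimal sketch. It uses only the figures quoted in the summary (1050 bytes for 3.75 seconds) and the raw link rates mentioned in this thread. It ignores framing and FEC overhead, which a real packet link would add, so the headroom figures are optimistic.

```python
# Figures quoted in the story summary.
ENCODED_BYTES = 1050
DURATION_S = 3.75

def bitrate_bps(num_bytes, seconds):
    """Average bit rate in bits per second."""
    return num_bytes * 8 / seconds

rate = bitrate_bps(ENCODED_BYTES, DURATION_S)

# Raw link rates mentioned in the thread (no framing or FEC overhead counted).
LINK_RATES_BPS = [1200, 2400, 9600]
fits = {link: rate <= link for link in LINK_RATES_BPS}

print(f"codec payload: {rate:.0f} bps")
for link, ok in fits.items():
    status = "fits" if ok else "too slow"
    print(f"{link} bps link: {status}, headroom {link - rate:.0f} bps")
```

On these numbers the payload does not fit a 1200 bps Bell 202 channel, barely fits 2400 bps, and leaves most of a 9600 bps channel free.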

      73,
      KC2YWE

      --
      I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
    5. Re:Original Rationale by Bruce+Perens · · Score: 4, Informative

      Sound-card modem implementations over SSB would be practical. See FDMDV. We're still a little wide for that, but we'll get there.

    6. Re:Original Rationale by adolf · · Score: 2, Informative

      (Stating the obvious for those with sufficiently low UIDs and/or those who remember VAXen, or similar, or at least those with a proper beard...)

      that is basically it. Speex is built (as I understand it) for lossless transmission methods with little/no error correction needed. Radio, by its very nature is a very lossy medium, so something with better error tolerance is needed. Hence, Codec2 provides a nice route.

      that is basically it. Speex is built (as I understand it) for lossless transmission methods with little/no error correction needed. UDP , by its very nature is a very lossy medium, so something with better error tolerance is needed. Hence, Codec2 provides a nice route.

      (There. Extrapolated that for you. Doubly-so, perhaps.)

    7. Re:Original Rationale by Bruce+Perens · · Score: 4, Informative

      How does it compare to CELT?

      So far, we've really only compared it to g.729, and it does OK against that. CELT starts at 32 kilobits per second and we're at 2 kilobits, so it's not really for the same application. But I noticed that the Alpha, all-floating-point implementation with some known low-performance code encoded the 3.75 seconds in 0.06 seconds, and decoded them in 0.04, on my 2.4 GHz processor. I would think that a polished implementation could achieve low delay on a DSP chip or some flavors of embedded CPU.
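As a rough sketch of what those timings imply, the snippet below turns them into a "times faster than real time" factor. The three input numbers come from the comment above; the function name is just illustrative, and the result says nothing about DSP or embedded performance.

```python
# Timings quoted above for the alpha, all-floating-point build on a 2.4 GHz CPU.
AUDIO_S = 3.75
ENCODE_S = 0.06
DECODE_S = 0.04

def realtime_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are processed per second of CPU time."""
    return audio_seconds / processing_seconds

encode_factor = realtime_factor(AUDIO_S, ENCODE_S)
decode_factor = realtime_factor(AUDIO_S, DECODE_S)

# Fraction of one CPU core needed to encode and decode a continuous stream.
cpu_share = (ENCODE_S + DECODE_S) / AUDIO_S

print(f"encode: about {encode_factor:.0f}x real time")
print(f"decode: about {decode_factor:.0f}x real time")
print(f"encode + decode: about {cpu_share:.1%} of one core")
```

Note that this measures throughput only. The delay a listener hears is set by how much audio the codec buffers per frame, not by how fast the CPU runs.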

    8. Re:Original Rationale by Gordonjcp · · Score: 3, Informative

      You could just about squeeze it into 2400bps. It would probably be possible to get that out of existing AFSK modems without needing to go down the route of discriminator taps and such. Using a hardware GMSK modem like the FX589 chip would give you 9600 baud with the option of interoperating with existing D-Star modems, and interfacing an FX589 is going to be easier to implement than a G3RUH modem.

    9. Re:Original Rationale by Anonymous Coward · · Score: 0

      There is a difference between a high bit error rate and occasionally lost packets. Does Codec2 handle both cases well, or did your extrapolation introduce incorrect information?

    10. Re:Original Rationale by Anonymous Coward · · Score: 0

      There's a fairly significant difference between 10% bit errors and 10% packet loss. Unless Codec2 is designed to split each frame of audio across multiple packets it's not a useful extrapolation. (I've written a basic VoIP app using Speex over UDP and, while not Skype, it was usable with 10% packet loss).

    11. Re:Original Rationale by the+way · · Score: 2, Interesting

      By the way, look over his web site rowetel.com for the other work he's done: two really nice Open Hardware projects - a PBX and a mesh telephony device, an Open Source echo canceler for digital telephony, used in Asterisk and elsewhere, and his own electric car conversion.

      I've got one of his little ip01 telephony boxes, and it is quite fantastic - a tiny, cheap, fanless, (embedded) Linux computer with plenty of memory and CPU grunt, and of course telephony hardware on board. It also has a package manager, with a quite a few pieces of software available, and regular firmware updates. It's much more powerful than the various Linux-based consumer routers that are available - it's a great option if you're looking for a small Linux server to run Asterisk, a little web site, DNS server, SSH, etc...

      (I'm not affiliated with David or Rowetel in any way - just a happy customer, who is in awe of the amazing things this guy has achieved in such a wide variety of areas).

    12. Re:Original Rationale by Yaur · · Score: 4, Informative

      With UDP the typical loss scenario is dropped packets but with radio single bit errors are more likely. This difference means that FEC strategies for one scenario are not directly applicable to the other.

      For UDP, FEC data inside the packet is useless: your error correction scheme needs to be prepared to deal with losing a whole packet's worth of data to be useful. For voice this is going to introduce too much latency, so instead a typical codec might just try to interpolate the lost data. With radio, on the other hand, there is value in error correction bits within the packet, and in the event of an error you are going to have more data with which to guess what the audio should be like, especially if you know which bits are errored (or possibly errored).
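To make the distinction concrete, here is a small sketch using a textbook Hamming(7,4) code. This is an illustration only: nothing in the thread says Codec2 or any radio mentioned here uses Hamming(7,4), and the function names are made up for the example. It shows that parity bits inside a packet repair a flipped bit, but are no help when the whole packet never arrives.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions p1 p2 d1 p3 d2 d3 d4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
sent = hamming74_encode(data)

# Radio-style damage: one bit flipped in transit, the rest arrives.
received = list(sent)
received[4] ^= 1
recovered = hamming74_decode(received)
print("bit error corrected:", recovered == data)

# UDP-style damage: the packet is dropped, parity bits and all.
lost_packet = None
print("anything to decode after packet loss:", lost_packet is not None)
```

Recovering from a dropped packet needs redundancy carried in other packets, which is where the extra latency mentioned above comes from.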

    13. Re:Original Rationale by Bruce+Perens · · Score: 1

      I have an ip04 on my business phone number, which is a SIP DID.

    14. Re:Original Rationale by Anonymous Coward · · Score: 0, Flamebait

      How about you start licensing it with non nazi-like licences. I would use this if it wasn't LGPL licensed (I use a non-standard language, so I would need to modify it to add bindings to my language). 2 clause BSD license would be good.

      But then, I suppose some people LIKE nazi-like licences. oh well.

    15. Re:Original Rationale by jmv · · Score: 5, Informative

      The fundamental difference is not that much the lossless vs lossy transmission, but the actual bit-rate. I designed Speex with a "sweet spot" around 16 kb/s, whereas David designed codec for a sweet spot around 2.4 kb/s. Speex does have a 2.4 kb/s mode, but the quality isn't even close to what David was able to achieve with codec2.

    16. Re:Original Rationale by TheRaven64 · · Score: 1

      Meh. I'm at least as critical of the GPL as the next guy, but it's hard to hate the LGPL2 (presumably he means v2.1). LGPLv3 has some significant issues, the most amusing one being that it is incompatible with GPLv2. The LGPL is non-viral, so you can keep the rest of your code under a more permissive license if you want to, but it has enough toothless legalese to keep the GPL crowd mostly happy.

      --
      I am TheRaven on Soylent News
    17. Re:Original Rationale by wowbagger · · Score: 3, Informative

      If you've ever heard AMBE in the presence of bit errors, it doesn't do so well either. It isn't the vocoder's job to deal with bit errors, it is the protocol's job. Over half the bits in a APCO-25 voice frame are forward error correction for the voice payload: Golay encoding, Reed-Solomon, bit order scrambling (interleaving), you name it.

      Putting resistance to bit errors in the codec is the wrong place to do it.

      Now, making the codec use less bits, so the protocol layer has more bits for FEC makes sense.
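The interleaving step mentioned above can be sketched with a simple block interleaver. The block size here (4 codewords of 7 bits) is an arbitrary assumption chosen for illustration, not the APCO-25 layout. The idea is that a burst of consecutive bit errors on the air gets spread across several codewords, so a per-codeword FEC that can fix one error each still copes.

```python
ROWS = 4  # codewords per block (assumed for illustration)
COLS = 7  # bits per codeword (assumed for illustration)

def interleave(bits, rows, cols):
    """Write the block row by row, transmit it column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    """Undo interleave(): put each received bit back in its row and column."""
    assert len(bits) == rows * cols
    out = [0] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[i]
            i += 1
    return out

def count_per_codeword(bits, rows, cols):
    """Number of flipped bits (ones) landing in each codeword."""
    return [sum(bits[r * cols:(r + 1) * cols]) for r in range(rows)]

def burst(bits, start, length):
    """Flip `length` consecutive bits starting at `start`."""
    damaged = list(bits)
    for i in range(start, start + length):
        damaged[i] ^= 1
    return damaged

clean = [0] * (ROWS * COLS)

# Same 4-bit burst on the air, with and without interleaving.
plain = count_per_codeword(burst(clean, 8, 4), ROWS, COLS)
spread = count_per_codeword(
    deinterleave(burst(interleave(clean, ROWS, COLS), 8, 4), ROWS, COLS),
    ROWS, COLS)

print("errors per codeword without interleaving:", plain)
print("errors per codeword with interleaving:   ", spread)
```

Without interleaving the burst lands in a single codeword and overwhelms a single-error-correcting code. With it, each codeword sees at most one error for any burst no longer than the number of rows.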

    18. Re:Original Rationale by fuzzyfuzzyfungus · · Score: 2, Insightful

      Wow. Ordinarily I'm of the opinion that crying "Godwin's Law!" is a bit overused; but having someone describe the LGPL as "Nazi-like" is making me reconsider.

      Somebody goes to the trouble of designing a novel, patent-unencumbered (i.e., if you don't like the software licence, you are perfectly free to write your own implementation) codec that fits an otherwise rather underserved niche. They have the temerity to release it under a license requiring you to release your modifications to their code if you distribute in binary form and this is somehow analogous to a particularly virulent flavor of genocidal fascism?

      You are really messing up the BSD crowd's reputation for being ideologically mellow compared to team GPL...

    19. Re:Original Rationale by fuzzyfuzzyfungus · · Score: 1

      In addition to the LGPL 2 being substantially less GPL-like than its name suggests (as well as being a fairly logical choice if you want to extend maximal freedom to the user, while making the creation of incompatible forks that have to be reverse-engineered less likely), arguing over the license of the reference implementation of a specifically-designed-to-be-patent-unencumbered codec spec seems especially pointless.

      Being able to use the reference implementation certainly is convenient and timesaving; but it is the codec itself that is the really important bit. Anybody is free to write a conformant implementation under their license of choice, or attempt to buy the right to use the code under some other license from its creator. Without patents, nobody can stop you from doing that, and there is nothing in the LGPL2 that prevents using the LGPL2 code as a reference when writing a new implementation, so long as you aren't just copying it.

    20. Re:Original Rationale by tepples · · Score: 1

      With radio on the other hand there is value to in packet error correction bits within the stream and in the event of an error you are going to have more data with which to guess what the audio should be like, especially if you know which bits are errored (or possibly errored)

      But wouldn't the underlying link just automatically FEC the packets at a lower layer, even if only to get the packet drop rate down?

    21. Re:Original Rationale by koiransuklaa · · Score: 1

      How about you start licensing it with non nazi-like licences. ... But then, I suppose some people LIKE nazi-like licences.

      Software licenses can often be negotiated: authors may be willing to relicense (or add another license) if given a well presented and compelling argument for the change. This has happened before -- picking the right license can be difficult and it's possible the original authors did not think of all scenarios.

      Speaking of "a well presented and compelling argument": you, sir, did not make one.

    22. Re:Original Rationale by spickus · · Score: 1

      This is terrific. I hope to see 'FreeStar' repeaters soon.
       

      --
      Indecision is the key to flexibility.
    23. Re:Original Rationale by Anonymous Coward · · Score: 1, Funny

      When jmv speex, we all listen.

    24. Re:Original Rationale by Anonymous Coward · · Score: 0

      Ironically, had you posted that with the Codec2 codec, your entire post would've boiled down to this:

      "Instead of Radio, you should have said UDP."

    25. Re:Original Rationale by bill_mcgonigle · · Score: 2, Informative

      This works on 51-bit frames.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    26. Re:Original Rationale by BrokenHalo · · Score: 1

      I've written a basic VoIP app using Speex over UDP and, while not Skype, it was usable with 10% packet loss

      That's not bad. Skype is good in lots of ways, but I'm all for a FOSS alternative. Although I have successfully managed to use Skype over a 56k dialup connection (obviously sans video), my worst experience to date was a time when my only connection was a (basic-rate) satellite link, where upstream latency made any kind of VOIP conversation entirely one-sided.

      In a few months I'm going to be moving from Perth (Western Australia) to an isolated spot in Tasmania where I'm not going to be able to get any kind of "hard" copper or fibre connection, so I'll have to look at this issue again. I know there's always wireless (at a price, with tight traffic allowances), but if anyone can suggest a satellite link with good upstream speeds, I would be interested to read any input.

    27. Re:Original Rationale by BrokenHalo · · Score: 1

      Well said. I get very tired of all this whining about the flavour of this or that GPL licence, but they are all pretty much fair by comparison with the proprietary alternative.

    28. Re:Original Rationale by sjames · · Score: 1

      Why don't you just use defines in a header as a language shim so you can link against the unmodified library? Or work with the rights holders to contribute that shim or symbol aliases to the project?

      Or you could just cough up the bux for a proprietary library if the license on this one offends you so.

    29. Re:Original Rationale by drowe67 · · Score: 1

      Actually license issues put me to sleep, so I just took the first random choice that came to mind which was GPL, then followed swiftly by LGPL when some one complained. But then again, I am the sort of guy who gets excited for frequency domain speech coding.

    30. Re:Original Rationale by LandGator · · Score: 1

      Kindly mod up

      --
      There is nothing wrong with yr Internet. Do not attempt to adjust the picture. We are controlling the transmission - NSA
  3. Interactive communication? by ard · · Score: 1

    What is the compression ratio for more interactive communication, e.g. 20 ms sampling time instead of 3-4 seconds?

    1. Re:Interactive communication? by Bruce+Perens · · Score: 3, Informative

      It is a real-time codec on my workstation and is intended to be a real-time codec on embedded DSP. It's currently all floating point and does things it should not, like malloc of multiple buffers per sample.

      Download the code and build it. It's "just type make" on Linux. The raw (uncompressed) sample format we've used for testing is 16-bit samples at 8 kHz and there are some tools to play those, and some pre-recorded samples. Not too much trouble to figure out.
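For the compression-ratio question, a back-of-the-envelope sketch follows. It assumes mono audio in the raw format just described (16-bit samples at 8 kHz) and takes the 1050-byte figure from the story summary. The per-frame numbers (7 bytes, 40 frames per second) are the ones given elsewhere in this thread.

```python
SAMPLE_RATE_HZ = 8000      # raw test format described above
BYTES_PER_SAMPLE = 2       # 16-bit samples, assumed mono
DURATION_S = 3.75          # clip length from the story summary
ENCODED_BYTES = 1050       # encoded size from the story summary

raw_bytes = int(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * DURATION_S)
compression_ratio = raw_bytes / ENCODED_BYTES

# Per-frame view, using the figures quoted elsewhere in the thread.
FRAMES_PER_SECOND = 40
ENCODED_BYTES_PER_FRAME = 7
raw_bytes_per_frame = SAMPLE_RATE_HZ // FRAMES_PER_SECOND * BYTES_PER_SAMPLE
frame_ratio = raw_bytes_per_frame / ENCODED_BYTES_PER_FRAME

print(f"raw clip: {raw_bytes} bytes, encoded: {ENCODED_BYTES} bytes")
print(f"whole-clip ratio: about {compression_ratio:.0f}:1")
print(f"single-frame ratio: about {frame_ratio:.0f}:1")
```

Because every frame is the same size, the ratio for one short frame comes out the same as for the whole clip, so interactive use should not change it.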

  4. Err Speex by Knee+Socks · · Score: 2, Informative

    Speex: Speex is based on CELP and is designed to compress voice at bitrates ranging from 2 to 44 kbps. Some of Speex's features include: Narrowband (8 kHz), wideband (16 kHz), and ultra-wideband (32 kHz) compression in the same bitstream

    --
    BLACK KNIGHT SECURITY SYSTEMS
    We'll bite your legs off
    1. Re:Err Speex by Gordonjcp · · Score: 3, Informative

      Speex isn't great in this application, because at low bitrates there is a significant delay through the codec and the output stream requires far too much bandwidth to be useful. Consider that digital speech systems like Mototrbo, TETRA, P25 and Iridium typically have less than 6kbps throughput once you've taken FEC into account.

    2. Re:Err Speex by jmv · · Score: 1

      Sure, Speex does 2 kbps, but if you compare that to codec2, there's a hell of a difference. The 2 kbps Speex mode is something I put together quickly -- mainly to encode comfort noise at low rate. On the other hand, David put a lot of effort into codec2 and it actually sounds decent for voice at that rate (IMO better than Speex sounds at 4 kb/s).

  5. what about LATENCY? by Kristopeit,MichaelDa · · Score: 4, Interesting
    why is seemingly the most important aspect of communication technology so often overlooked?

    i assume it's acceptable... but it angers me that someone thought it was relevant to give the exact number of bytes for a seemingly arbitrary 3.75 seconds of audio, but failed to say how long it takes to encode that 3.75 seconds of audio, or what average latency can be expected after buffer conditions are met.

    1. Re:what about LATENCY? by Bruce+Perens · · Score: 4, Informative

      Right. Sorry. Real time on the x86 workstation I'm using. Not converted to fixed-point for weaker CPUs yet. Not tested on ARM, Blackfin, AVR, etc. Waiting for you to do that :-) Downloadable code. Reasonably portable. Type make and let fly.

    2. Re:what about LATENCY? by Garridan · · Score: 2, Insightful

      Well, the source is right there on the webpage. Why don't you download & compile it, and see for yourself? It's an alpha release so I'll guess that it's slower than it could be.

    3. Re:what about LATENCY? by Kristopeit,MichaelDa · · Score: 2, Interesting
      it could take 16MB/s and still function in real time over the internet for me... my problem isn't that the latency wasn't shown, it was that the bitrate WAS shown BUT the latency wasn't shown.

      also, considering the advantages of using lower bitrate voice codecs, the ability to implement the encoder and decoder algorithms directly in very low transistor count custom hardware would appeal to the same crowd... so not just latency in terms of x86 instructions per second, but the ability to implement those instructions in hardware.

      i am concerned about bruce's use of the term "real time"... either he is implying there is no noticeable latency to him, (which is irrelevant to me as numerous others claim skype video chat is "real time", and also impossible given the implicit time consuming process of encoding), or he's cleverly stating that the time it takes to encode is the real time it takes to encode. it's not the fake time. it's real time.

      again, i assume, and it seems i'm correct to do so, that the codec is "very usable"... i won't be trying it as i have no need for it.

    4. Re:what about LATENCY? by Bananatree3 · · Score: 1

      The final destination for Codec2 *isn't* X86 processors, but DSP chips. If, for some reason latency is an issue when it's first shoehorned into a DSP chip, Codec2 will be refined until it works well on a DSP chip, in real real time.

    5. Re:what about LATENCY? by Kristopeit,MichaelDa · · Score: 2, Interesting
      yes, of course... but "refining" a codec for hardware implementation is doing the exact opposite to the quality of the signal.

      why not refine the DSP chip architecture until it works well with the original codec? i know masks are expensive... but why not do it all the way?

    6. Re:what about LATENCY? by KliX · · Score: 2, Informative

      I think he probably means it in a 'how many samples does the codec need before it can send a packet' type of latency.

    7. Re:what about LATENCY? by Bruce+Perens · · Score: 4, Informative

      There are currently 51 bits in a frame. That is the minimum that you can send, and you'd send 40 of those per second as the codec is presently implemented. A real data radio would add bandwidth for its data encapsulation, but would have to meet the time and bandwidth requirements of the codec payload.
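A quick sketch shows how these frame figures relate to the numbers in the story summary. It assumes each 51-bit frame is padded up to whole bytes when written to a file, which is a guess on my part that happens to reproduce the 1050-byte figure.

```python
import math

BITS_PER_FRAME = 51       # from the comment above
FRAMES_PER_SECOND = 40    # from the comment above
CLIP_SECONDS = 3.75       # from the story summary

payload_bps = BITS_PER_FRAME * FRAMES_PER_SECOND
frame_ms = 1000 / FRAMES_PER_SECOND

# Assumption: frames are stored padded to a whole number of bytes.
bytes_per_frame = math.ceil(BITS_PER_FRAME / 8)
padded_bps = bytes_per_frame * 8 * FRAMES_PER_SECOND
clip_bytes = int(CLIP_SECONDS * FRAMES_PER_SECOND) * bytes_per_frame

print(f"payload: {payload_bps} bps, one frame every {frame_ms:.0f} ms")
print(f"byte-padded: {bytes_per_frame} bytes/frame, {padded_bps} bps")
print(f"{CLIP_SECONDS} s clip: {clip_bytes} bytes")
```

Under that assumption the actual payload is 2040 bps, and the 2240 bps that others computed from the file size includes 5 padding bits per frame.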

    8. Re:what about LATENCY? by Anonymous Coward · · Score: 0

      I don't see the .msi? Are there VB sources for this?

    9. Re:what about LATENCY? by Anonymous Coward · · Score: 0

      Any reason not to try to get it included in the FFmpeg project?

    10. Re:what about LATENCY? by sahonen · · Score: 1

      What we're trying to ask is if you pipe a real time stream of samples from a microphone into one end, encapsulate the data in UDP packets, bounce the stream off 127.0.0.1, unencapsulate them, pipe it into a decoder and from there into a sound card and speaker... How much time is there between me saying "hi" into the mic and hearing "hi" out of the speaker? This is by far the most important consideration for modern voice protocols. Low bandwidth is nice. Low CPU is nice. Error tolerance is nice. Latency is crucial. If you don't think it's crucial, get 30 people in a Ventrilo channel and listen to them step all over each other.

      The developers of Mumble have gotten very good at reducing latency, and would be worth bouncing ideas off.

      --
      Make me a friend and I'll mod you up
    11. Re:what about LATENCY? by Dan+Dankleton · · Score: 1

      Latency is crucial for the application you are talking about. Low bandwidth and error tolerance are more important for a codec which will primarily be used for simplex radio applications, which is what Codec2 is designed for (ignoring for the moment the D-Star reflectors). Different applications, different requirements. This is why there are lots of codecs ;) Dan MD1CLV

    12. Re:what about LATENCY? by sahonen · · Score: 1

      If your transmission medium is half duplex and shared by several users, you run into exactly the phenomenon I described. A codec with 250ms of latency creates a 250ms window in which two people can start talking without realizing they're stepping on each other. People on aviation frequencies step on each other all the time and the only latency there is speed of light radio propagation.

      --
      Make me a friend and I'll mod you up
    13. Re:what about LATENCY? by Bruce+Perens · · Score: 1

      There is no reason that you can't send a packet for each frame. There isn't any important state, so far, that persists between frames. That's 7 bytes (really 51 bits) 40 times per second. CPU speed doesn't seem to be a problem for latency from what we have seen so far.
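One way to picture "7 bytes (really 51 bits)" is the packing sketch below. The bit order and the position of the padding are hypothetical choices for illustration; the actual Codec2 frame layout is not described in this thread. Since no state persists between frames, each 7-byte result could be sent as its own packet.

```python
FRAME_BITS = 51
FRAME_BYTES = (FRAME_BITS + 7) // 8          # 7 bytes
PAD_BITS = FRAME_BYTES * 8 - FRAME_BITS      # 5 unused bits per frame

def pack_frame(bits):
    """Pack 51 bits (first bit most significant) into 7 bytes, zero padded at the end."""
    if len(bits) != FRAME_BITS:
        raise ValueError(f"expected {FRAME_BITS} bits, got {len(bits)}")
    value = 0
    for bit in bits:
        value = (value << 1) | (bit & 1)
    return (value << PAD_BITS).to_bytes(FRAME_BYTES, "big")

def unpack_frame(packet):
    """Recover the 51 bits from a 7-byte packet, dropping the padding."""
    if len(packet) != FRAME_BYTES:
        raise ValueError(f"expected {FRAME_BYTES} bytes, got {len(packet)}")
    value = int.from_bytes(packet, "big") >> PAD_BITS
    return [(value >> (FRAME_BITS - 1 - i)) & 1 for i in range(FRAME_BITS)]

# Arbitrary test pattern standing in for one encoded frame.
frame = [(i * 7 + 3) % 2 if i % 3 else 1 for i in range(FRAME_BITS)]
packet = pack_frame(frame)

print(f"{len(frame)} bits -> {len(packet)} bytes, {PAD_BITS} bits of padding")
print("round trip ok:", unpack_frame(packet) == frame)
```

A real transport would add its own header on top of these 7 bytes, which at 40 packets per second can easily outweigh the voice payload.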

    14. Re:what about LATENCY? by Dan+Dankleton · · Score: 1

      It happens in analogue ham radio too. In FM, thanks to the capture effect, it means that one of the two gets heard and there is a 'protocol' between speakers to deal with that. I don't know what the effect is of multiple signals on the modulation used in D-Star though.

    15. Re:what about LATENCY? by jmv · · Score: 1

      Don't worry. The frame size is 20 ms and there's probably (haven't looked at that detail) around 10 ms of look-ahead, so latency shouldn't be an issue. I'd actually argue that it could be increased *if* there's a way to reduce the bit-rate by doing that.

    16. Re:what about LATENCY? by sahonen · · Score: 1

      So basically, 25ms of encoding latency, plus the latency of your audio hardware input and output buffers, plus network/medium propagation (5-10ms for satellites?), plus any network jitter buffering. That's pretty good. CELT claims 3-9ms but I'd like to hear a comparison of audio quality at 24 kbps, especially considering the differences between their designs.

      --
      Make me a friend and I'll mod you up
    17. Re:what about LATENCY? by Kristopeit,MichaelDa · · Score: 0, Flamebait

      i was never worried. you're an idiot.

    18. Re:what about LATENCY? by Kristopeit,MichaelDa · · Score: 1
      i agree latency SHOULD NOT be an issue. my issue was determining IF latency IS an issue.

      bruce has stated a .1 second total codec processing time on the 3.75 sec audio sample. i don't know what that means for response times, or how they change with longer or shorter or streaming audio samples. what happens if a stream is interrupted? how many frames are lost? is there a noticeable audible byproduct of lost or damaged data?

    19. Re:what about LATENCY? by Kristopeit,MichaelDa · · Score: 1
      i don't mean to be rude, but how about you just make an audio recording of you live streaming from one machine to another? a video maybe? do you have a digital camera that can take videos?

      1 picture... 1000 words, and such. i could have made a video of this comment and uploaded it to youtube faster than i could type and post it.

      i understand latency might not be an issue for the intended application, but developers choosing which codec is best for their own applications will certainly require initial response delay and continued latency numbers to make informed decisions.

    20. Re:what about LATENCY? by Kristopeit,MichaelDa · · Score: 1
      he later said it took .06 sec to encode and .04 sec to decode the 3.75 sec sample.

      do those numbers mesh?

    21. Re:what about LATENCY? by vlm · · Score: 2, Informative

      If, for some reason latency is an issue when it's first shoehorned into a DSP chip, Codec2 will be refined until it works well on a DSP chip, in real real time.

      I think you are not using the definition of latency that most in the field would use.

      Latency is how long it takes to process the data. It's a computer science type of thing. If you understand Knuth and his tape drive sorting examples, this is pretty obvious...

      For example, here's a nice, simple, hopelessly useless codec that has almost exactly 100 ms of latency:

      1) Get yerself a buffer that holds 1000 samples.
      2) Run a A/D converter at 10Ksamples/sec until the buffer is full.
      3) Run "gzip" on the 1000 sample buffer, squishing it down to maybe 500 bytes. Optimistically.
      4) send the 500 byte chunk to the other side (radio, internet, whatever)
      5) Run "gunzip" on the hopefully unerrored compressed 500 bytes, expanding it back to 1000 raw samples.
      6) Squirt yonder 1000 raw sample values out the D/A converter at 10Ksamples/sec
      7) Pray ye get another packet of compressed voice data before the well runs dry. Or maybe listen to a bit of silence. Or play interpolation games.

      Your argument is once steps 3 and 5 are quick enough, the codec latency will be zero. A fine bit of analysis, however, shows that the first sample to enter the buffer in step 2 cannot possibly be decompressed and played back, until the buffer fills, which takes... 100 ms aka 1/10 of a second. This is the latency we're talking about in voice codecs. Most are somewhat faster than 1/10 of a second.
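The toy codec above can be sketched in a few lines. This version assumes 16-bit samples and uses Python's zlib module (DEFLATE, the same compression that gzip uses) in place of the gzip command. The point is that the buffering delay is fixed by the buffer size and sample rate, no matter how fast the compression step runs.

```python
import struct
import zlib

SAMPLE_RATE_HZ = 10_000   # step 2: A/D converter rate
BUFFER_SAMPLES = 1000     # step 1: buffer size

def buffering_latency_ms(buffer_samples, sample_rate_hz):
    """Time to fill the buffer; the first sample waits at least this long."""
    return 1000 * buffer_samples / sample_rate_hz

def encode_block(samples):
    """Step 3: compress a full buffer of 16-bit samples (assumed sample width)."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return zlib.compress(raw)

def decode_block(packet):
    """Step 5: expand a packet back into raw samples."""
    raw = zlib.decompress(packet)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))

latency_ms = buffering_latency_ms(BUFFER_SAMPLES, SAMPLE_RATE_HZ)

# A made-up repetitive waveform standing in for one buffer of speech.
block = [(i % 50) * 100 for i in range(BUFFER_SAMPLES)]
packet = encode_block(block)

print(f"buffering latency: {latency_ms:.0f} ms")
print(f"{BUFFER_SAMPLES * 2} raw bytes -> {len(packet)} compressed bytes")
print("round trip ok:", decode_block(packet) == block)
```

Halving the buffer halves the latency, but gives the compressor less context to work with, which is the trade-off real speech codecs tune through their frame size.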

      It is quite possible to make a very efficient codec where a fireman would hit the PTT button, yack into the radio for fifteen minutes, and then the entire message would be compressed down to maybe 1000 bits/sec theoretical average, then sent and played back. This would, of course, be completely useless for tactical public safety comms.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    22. Re:what about LATENCY? by Anonymous Coward · · Score: 0

      Dude, learn to chill the f*** out.

    23. Re:what about LATENCY? by Mike+Da.+Kristopeit · · Score: 1
      ur mum's face can chill the fuck out.

      you are NOTHING

    24. Re:what about LATENCY? by sjames · · Score: 1

      Keep in mind, this is alpha code that hasn't yet been converted to fixed point. The final performance is just a guess at this point. The intrinsic latency will be 25 milliseconds due to the frame size.

      To put that 25 milliseconds into perspective, I've found that most people won't even perceive it if I drop 25 milliseconds out of an audio stream.

      People who do have a use for it would probably be much better at judging what level of performance is acceptable. People with no use for it have no feel for the trade-offs involved.

    25. Re:what about LATENCY? by Mike+Da.+Kristopeit · · Score: 1

      the tradeoffs are trivial... quality of signal vs everything else

    26. Re:what about LATENCY? by sjames · · Score: 1

      Yes, that is the main tradeoff, but the threshold moves a LOT depending on the conditions and requirements.

      There are, however, more tradeoffs than you have thought of, which depend on how you define "quality". For example, in emergency communication fidelity is practically unimportant but intelligibility is essential. For a lot of music where nobody can understand the singer anyway, intelligibility doesn't matter as much as fidelity. For basic communication, everything that is not the voice is "noise" and we'd rather it not get encoded at all. For music, those other sounds are called "the band" and we'd like to hear it.

    27. Re:what about LATENCY? by drowe67 · · Score: 1

      David Rowe, the author here. The latency is about 40ms. The encoder accepts buffers of 20ms (160 samples) and the decoder outputs buffers of 20ms (160 samples). So assuming zero transmission delay, you get your first output speech sample after about 40ms. It's comparable to cell phone codecs like GSM, and fine for real time communications.
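A quick sketch of the arithmetic behind that 40ms figure, in case it helps. The 8 kHz sample rate is inferred from "160 samples per 20ms"; the function names are made up for illustration and are not part of Codec2.

```python
# Sample rate implied by 160 samples per 20 ms buffer (an inference, not a quoted spec).
SAMPLE_RATE_HZ = 8000


def buffer_ms(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of a buffer of `samples` samples, in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


def algorithmic_latency_ms(enc_samples=160, dec_samples=160):
    """One encoder buffer plus one decoder buffer, with zero transmission delay."""
    return buffer_ms(enc_samples) + buffer_ms(dec_samples)


if __name__ == "__main__":
    print(buffer_ms(160))            # one buffer, in ms
    print(algorithmic_latency_ms())  # encoder buffer + decoder buffer, in ms
```

This only counts buffering. CPU time, radio link delay and any FEC or jitter buffering would come on top of it.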

    28. Re:what about LATENCY? by drowe67 · · Score: 1

      Latency won't change when it moves from an x86 to a DSP chip; it's defined by the algorithm. As in my comment above, it will remain at about 40ms, similar to other speech codecs like Speex, GSM, G.723, etc.

    29. Re:what about LATENCY? by fnord_uk · · Score: 1

      Er, a fireman who holds a PTT down for 15 minutes would make the tactical public safety system useless by himself.

      --
      In theory, theory and practice are the same. In practice, they're not.
    30. Re:what about LATENCY? by Mike+Da.+Kristopeit · · Score: 1
      is that including the transmission delay of the instruction pipeline? i agree 40ms is extremely usable for full duplex. right now i'm content with 250-300ms for one way live broadcasting with interactive chat.

      what i'd really like is the highest quality video recording you can muster of you in an environment with 2 computers, one with a mic and one with a speaker... film yourself transcoding. i fully understand the DSP chip implementation should be better in terms of latency, but the quality of signal will probably not be any better, and will likely be worse as compromises are made to fit the chip architecture... i understand the limits of advertising TVs in a newspaper... but even youtube pretty accurately portrays a recorded environment.

    31. Re:what about LATENCY? by vlm · · Score: 1

      Er, a fireman who holds a PTT down for 15 minutes would make the tactical public safety system useless by himself.

      Well, actually, no. I hung out with ham radio guys who reprogrammed old public safety radios for ham radio purposes. Was quite a popular activity a decade or two ago when trunked systems came into the area and made the "plain FM" radios obsolete and thus dirt cheap on the surplus market. The main problem is getting a programming cable and the programming software. Secondary problem is the expendable parts (rechargeable batteries, etc) are priced at "public safety prices", meaning if you have to ask what they cost, you certainly can't afford them.

      Anyway most of the manufacturers have a feature to prevent the PTT problem. Motorola called it "Time Out Timer" (probably a (TM) of Motorola). Transmit more than X seconds and it stops transmitting and blasts out an alert signal on the speaker, terrifyingly loudly on some models...

      Reason for this, is worst case scenario a fireman gets trapped in a structure with his radio jammed on and no one can communicate to rescue him or even figure out who's stuck.

      I would imagine this "feature" annoys the hell out of long winded public safety personnel, but it's overall a good idea.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    32. Re:what about LATENCY? by Anonymous Coward · · Score: 0

      D o n ' t f e e d t h e t r o l l

    33. Re:what about LATENCY? by fnord_uk · · Score: 1

      Hmm. Well I don't think we disagree here, but I admit my choice of words was poor. If a fireman could hold the key down and block the channel for a lengthy time, then the system would have a design flaw. My point was that the codec has nothing to do with it.

      BTW, I'm a radio ham :)

      --
      In theory, theory and practice are the same. In practice, they're not.
    34. Re:what about LATENCY? by QuantumBeep · · Score: 1

      In the immortal words of the Great Knights of Yore:

      YHBT. YHL. HAND.

  6. Serindipidy. by firstnevyn · · Score: 3, Interesting

    As a newly licenced ham in an area where D-STAR repeaters are everywhere (VK) and a free software advocate, I have recently become aware of the issues with D-STAR and have been reading about this work, so it's quite surreal to have it pop up on /. in the week where I get my licence. I haven't had a chance to read the D-STAR specifications, but I am wondering if the voice codec is flagged in the D-STAR digital stream, and if it would be possible to create translating repeaters: dual-output repeaters with differently coded data streams. It'd take more spectrum but would also allow for a migration path (at least for repeater users?).

    1. Re:Serindipidy. by Bruce+Perens · · Score: 4, Informative

      Congratulations on the license, OM. We haven't yet explored how to wedge this into D-STAR, but sending it as data rather than voice would be one way. All of the D-STAR radios except the latest one, the IC-92AD, use a plug-in daughter board to hold the AMBE chip, and it might be that somebody could make a dual-chip version of this board sometime. Since AMBE is proprietary we are stuck using their chip if we want to be compatible, unless the repeater does the conversion for us using a DV-Dongle. They sell TI DSP chips with their program burned in, and don't give out the algorithm.

      It may be that on D-STAR the AMBE chip also does the modulation for a data transmission, just doesn't run the codec. But the modulation is known and there is a sound-card software implementation of D-STAR that interoperates with it. I don't have any D-STAR equipment to test. The folks on dstar_development@yahoogroups.com know a lot more about D-STAR.

      73
      K6BP

    2. Re:Serindipidy. by MichaelSmith · · Score: 1

      Why does a repeater need to understand the encoding? Can't it just rebroadcast the data, or even the analogue signal?

    3. Re:Serindipidy. by __aajfby9338 · · Score: 4, Insightful

      Congratulations on your new license!

      The proprietary AMBE codec bothers me, too. I think that a closed, license-encumbered, proprietary codec is entirely inappropriate for ham radio use.

    4. Re:Serindipidy. by Bruce+Perens · · Score: 4, Informative

      The repeater can rebroadcast the data, but that data would be AMBE encoded, and AMBE is both trade-secret in its implementation and patented in some of its algorithms. There may be an AMBE chip in the repeater, I've not played with one. The usual way one converts to and from AMBE on a PC is with a device called the DV-Dongle, which contains the AMBE chip. This costs lots of money and is not nearly so powerful as the CPU of the computer it's plugged into, which is one reason to be fed up with proprietary codecs.

      So, if you had some newer, Codec2-based radios, and some older D-STAR radios, linking repeaters might be a good way to get them to talk to each other.

      This is hand-waving about a lot of issues, like we've not designed the next generation of data radio to put Codec2 into. One might guess that such a thing could use IPV6, and better modulation than just FM, and FEC, etc.

    5. Re:Serindipidy. by Anonymous Coward · · Score: 0

      This must be Google doing. Because on my slashdot, there's no news about voice codecs.
      On the other hand, i was just recently looking for visual information about "upskirts" when i finally ended up on slashdot. There it was: news about future transparent airbuses.

    6. Re:Serindipidy. by Dan+Dankleton · · Score: 1

      I had some thoughts about this when I first looked into D-Star.

      There are some reserved bits in the data section of the protocol (even when voice is being transmitted) which are defined as 0 in current implementations. It would be relatively easy for repeaters to be upgraded to understand that a 1 in one of those indicates a different codec, and use some more reserved bits to indicate which codec.
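To make the idea concrete, here is a toy sketch of "one reserved bit says a different codec is in use, a few more say which one". The bit positions and names below are entirely hypothetical and are NOT the real D-STAR header layout; check the spec before relying on any of it.

```python
# Hypothetical layout of one flag byte (NOT taken from the D-STAR spec):
ALT_CODEC_FLAG = 0x80  # hypothetical: 1 means "not the default codec"
CODEC_ID_MASK = 0x07   # hypothetical: three bits that say which codec


def codec_id(flags):
    """Return 0 for the default codec, else the ID stored in the ID bits."""
    if (flags & ALT_CODEC_FLAG) == 0:
        return 0  # existing radios send zeros here, so they keep working
    return flags & CODEC_ID_MASK


def set_codec(flags, ident):
    """Return `flags` with the alternate-codec flag set and the ID filled in."""
    if not 1 <= ident <= CODEC_ID_MASK:
        raise ValueError("codec ID out of range")
    return (flags & ~CODEC_ID_MASK & 0xFF) | ALT_CODEC_FLAG | ident


if __name__ == "__main__":
    legacy = 0x00
    print(codec_id(legacy))                # default codec
    print(codec_id(set_codec(legacy, 2)))  # hypothetical codec number 2
```

The point of the all-zero default is that radios already in the field would be read as "default codec" without any change on their side.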

      Repeaters (separate units) could be used to transcode if the end users are using different codecs - this would involve a software change on the repeaters but would not require that people who've already got D-Star radios upgrade.

      The basic D-Star design seems quite good to me, but I can't understand why some kind of futureproofing wasn't designed in from the start since the quality of codecs is improving all the time!

      Dan MD1CLV

    7. Re:Serindipidy. by Agripa · · Score: 1

      This is hand-waving about a lot of issues, like we've not designed the next generation of data radio to put Codec2 into. One might guess that such a thing could use IPV6, and better modulation than just FM, and FEC, etc.

      I suspect FM voice compatibility and economics may be too important to justify switching to any modulation which is not constant envelope. I notice that all of the D-Star modes use GMSK (Gaussian Minimum Shift Keying) which is completely compatible with inexpensive and simple FM circuit design. A general purpose I/Q fed signal chain for supporting other than constant envelope modulation modes would require linear instead of class C amplifiers and AGC (Automatic Gain Control) instead of limiting.

      I complain about D-Star's use of a proprietary codec whenever someone will listen but the choice of GMSK for simplicity, economics, and low adjacent channel interference is a good one.

  7. Speex developers are involved by Bruce+Perens · · Score: 3, Informative

    Jean-Marc Valin is on the project mailing list and David is another Speex developer and the person Jean-Marc recommended to me. We are trying for an improvement over Speex at low rates.

  8. Great news by Anonymous Coward · · Score: 2, Informative

    >3.75 seconds of clear speech in 1050 bytes

    That's 2240 bps, 2.19 kbps, quite impressive. Maybe one day they can beat MELP (up to 600bps) and remain open.
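For anyone checking the numbers, a small sketch of the conversion. The only inputs are the 1050 bytes and 3.75 seconds from the summary; the function name is just for illustration.

```python
def bitrate_bps(num_bytes, seconds):
    """Average bit rate in bits per second."""
    return num_bytes * 8 / seconds


bps = bitrate_bps(1050, 3.75)

if __name__ == "__main__":
    print(bps)                   # bits per second
    print(bps / 1000)            # kbps with k = 1000
    print(round(bps / 1024, 2))  # "kbps" with k = 1024, which is where 2.19 comes from
```

Note that the 2.19 figure only appears if you divide by 1024; with the usual k = 1000 it is 2.24 kbps.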

    Excellent work.

    1. Re:Great news by Bruce+Perens · · Score: 4, Interesting

      I think you could cut the sample rate in half and get acceptable performance, but I've not tried. Currently I think it's 25 microsecond frames, and each frame has one set of LSPs and two sets of voicing information so it's interpolated into 12.5 microsecond frames. Those lower bandwidth codecs do 50 microsecond frames. Go forth and hack upon it if you'd like to see. Also, there are some optimizations that are obvious to David and Jean-Marc (and which I barely understand) that haven't been added yet. One is that the LSPs are monotonic and nothing has been done to remove that redundancy. Delta coding or vector quantization might be ways to do that. I understand delta coding but would not be the one to do VQ. Another is that there is a lot of correlation of the LSPs between adjacent frames, so you don't necessarily have to send the entire LSP set every frame. And there is probably lots of other opportunity for compression that I have no concept of.
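A rough sketch of the delta coding idea for anyone who wants to hack on it: because the LSPs come out in increasing order, you can send the first one and then only the (smaller) steps between neighbours. The LSP values below are invented for illustration, and this is not how Codec2 currently quantizes them.

```python
def delta_encode(lsps):
    """Keep the first value, then the step from each value to the next."""
    if not lsps:
        return []
    deltas = [lsps[0]]
    for prev, cur in zip(lsps, lsps[1:]):
        if cur < prev:
            raise ValueError("LSPs must be in increasing order")
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


if __name__ == "__main__":
    # Hypothetical LSP frequencies in Hz, made up for this example.
    lsps = [300, 550, 900, 1400, 1800, 2300, 2700, 3100, 3400, 3700]
    deltas = delta_encode(lsps)
    assert delta_decode(deltas) == lsps
    # The steps span a much smaller range than the raw values,
    # so they could be quantized with fewer bits each.
    print(max(lsps), max(deltas))
```

In a real codec the steps would be quantized, and quantization error then accumulates through the running sum, so this is only the starting point.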

    2. Re:Great news by Yaur · · Score: 1

      You mean milliseconds... 25 microseconds is only about one sample at 44 kHz. Somewhere around 100 ms is the lower edge of where it's "noticeable" in the flow of the conversation.

    3. Re:Great news by jmv · · Score: 1

      Hi Bruce,

      Just a minor correction: the frame size is 20 milliseconds, not 20 microseconds :-). As for VQ, the concept is not that hard really. Of course, as for many things, the devil's in the details, many of which I got wrong in the Speex LSP VQ anyway.

    4. Re:Great news by Bruce+Perens · · Score: 1

      Oops. Thanks!

  9. Thankyou! by thephydes · · Score: 1

    I use digital almost exclusively and have wondered when a suitable open source voice project would emerge. I look forward to seeing it developed further. Tim VK4YEH

    1. Re:Thankyou! by Bananatree3 · · Score: 3, Informative

      You'll be happy to know that it's a fellow Australian ham developing this Codec2: David Rowe, VK5DGR. Here's a link to David's development page.

    2. Re:Thankyou! by thephydes · · Score: 1

      Yes, I heard about this a couple of days ago - on VKlogger forums maybe? Have made a donation to the cause by the way. Tim

    3. Re:Thankyou! by Bananatree3 · · Score: 1

      Great work on the donation- try to get some local hams to donate - David responds well to all kinds of donation sizes, small and big :)

  10. Awesome by sv_libertarian · · Score: 1

    I hope this takes off. It would be great to have a good OSS voice codec for amateur radio.

  11. Packet loss? by Amarantine · · Score: 3, Interesting

    I didn't see it mentioned when quickly scanning TFA, but how does this codec handle packet loss?

    It is all nice and well to develop a codec to cram as much speech as possible in as few bits as possible, but in this case, one lost packet could mean a gap of several seconds. The success of a low-bandwidth codec, at least when it comes to IP telephony, also depends on how well it can handle lost packets. Low bandwidth codecs are usually used in low bandwidth networks, such as the internet, and there the packet loss is the highest.

    Same goes for delay and jitter, by the way. If a stream of packets is delayed, and more voice is crammed in fewer bits, then the delays in the voice stream will get longer too.

    1. Re:Packet loss? by Bruce+Perens · · Score: 4, Informative

      We don't know yet, but I don't see how it could be worse than AMBE in D-STAR, which makes various eructions when faced with large packet loss. I did various sorts of bit-error injection inadvertently while debugging yesterday, and right now you still get comprehensible voice with significant corruption of the LSP data. This, IMO, indicates an opportunity for more compression. Handling the problems of the radio link is more a problem for forward error correction, etc.

    2. Re:Packet loss? by rrossman2 · · Score: 1

      It would be great to be able to get this on phones. I know most VoIP/SIP type applications work fairly well on 3g, but if you don't have 3g coverage (or are on a smaller cell company who only licenses EDGE from the other GSM carriers) then it kinda-sorta works with 3 second delays and the occasional garbled audio. For example, my Nokia N95 on Immix doesn't get 3g (Immix didn't opt for 3g coverage from T-Mobile or AT&T even if the phone supports it and you're on their networks) but does edge at around 350kbps or so. Fring voice calls work, with the flaws mentioned earlier. If I'm on WiFi with my phone, obviously it's much better and the delay + garble disappear even if the other side is on a 3g link (in that case Verizon).

      With more and more people with smart phones, it would be sweet to be able to bundle the codec in such a way that the phones would be able to use it in applications such as fring, google talk, etc so you could talk to any one of your smart phone friends without needing to use any minutes, and if they are the ones you talk to the most you could drop your minute plan down to next to nothing.

      (It would be even sweeter for me since the only people I really call all have smart phones, I could just get a Verizon or AT&T data only plan for $35 or whatever and make all my calls via SIP)

    3. Re:Packet loss? by Bruce+Perens · · Score: 1

      You still need lots of small packets if you don't want high latency. So this is better but still uses lots of networking.

    4. Re:Packet loss? by drowe67 · · Score: 1

      If we include an "erasure mode", this type of codec is pretty good at handling packet loss, as it is easy to interpolate between two adjacent frames. CELP-type codecs have a lot more memory, so they tend to be less robust. Also, conversational speech has only about a 30% activity factor, so 7/10 packet losses will be in silence or background-noise frames.
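A minimal sketch of what interpolating over a lost frame could look like, assuming each frame is just a list of numeric model parameters. That representation and the function name are assumptions for illustration; no such erasure mode exists in the code yet.

```python
def conceal_erasure(prev_frame, next_frame):
    """Estimate a lost frame as the midpoint of its two neighbours."""
    if len(prev_frame) != len(next_frame):
        raise ValueError("frames must have the same number of parameters")
    return [(a + b) / 2 for a, b in zip(prev_frame, next_frame)]


if __name__ == "__main__":
    # Hypothetical parameter vectors for the frames before and after the gap.
    before = [100.0, 200.0, 0.5]
    after = [200.0, 400.0, 1.0]
    print(conceal_erasure(before, after))
```

The catch is that the decoder has to wait for the frame after the gap before it can fill the gap, so this costs one extra frame of delay whenever it is used.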

  12. Really early latency figures by Bruce+Perens · · Score: 4, Informative

    It encoded those 3.75 seconds in 0.06 seconds and decoded in 0.04 seconds on my AMD Phenom 9750 2.4 GHz, one core only, compiled with GCC and the -O3 switch. That's all of the overhead of the program starting and exiting, too. It's using floating, not fixed point.

    This, it seems, bodes well for low latency of the final implementation on a DSP chip.
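Those timings are probably easier to read as a real-time factor, i.e. how many seconds of audio get processed per second of CPU time. A small sketch using only the figures quoted above (the function name is made up):

```python
def realtime_factor(audio_seconds, processing_seconds):
    """How many times faster than real time the processing ran."""
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    # 3.75 s of speech, 0.06 s to encode and 0.04 s to decode (figures from the post).
    print(round(realtime_factor(3.75, 0.06), 2))  # encoder
    print(round(realtime_factor(3.75, 0.04), 2))  # decoder
```

Strictly speaking this measures throughput (CPU headroom), not the buffering delay of the codec, which depends on the frame size rather than on how fast the machine is.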

    1. Re:Really early latency figures by Anonymous Coward · · Score: 0

      That isn't latency! If it requires 3.75 seconds of audio before a compressed stream is emitted, it is largely useless for comms. Fortunately, from a super-post of this one it appears that latency is pretty low: 1 frame is the minimum that can be sent (40 frames per second, 51 bits per frame).

    2. Re:Really early latency figures by Kristopeit,MichaelDa · · Score: 1
      yeah, this is what i'm trying to figure out... sally says "hi"... how long until bob hears her.

      .1 seconds does bode well for an eventual lower level implementation. 3.85 seconds and you might as well trash it, but i'm almost certain that isn't the case as the phrase "real-time" was thrown out a few times.

    3. Re:Really early latency figures by Bruce+Perens · · Score: 3, Informative

      No significant state between frames so far. 25 milliseconds per frame. That is the minimum delay before the other side starts to hear the audio.

    4. Re:Really early latency figures by Rogerborg · · Score: 1
      Yup, Core 2 Duo P8700 @ 2.53Ghz, compiled with O3:

      time ./c2enc ../raw/hts1a.raw hts1a_c2.bit

      real 0m0.062s
      user 0m0.060s
      sys 0m0.000s

      time ./c2dec hts1a_c2.bit hts1a_c2.raw

      real 0m0.048s
      user 0m0.044s
      sys 0m0.004s

      Thanks for promoting this, it's a fascinating project.

      --
      If you were blocking sigs, you wouldn't have to read this.
    5. Re:Really early latency figures by TheRaven64 · · Score: 1

      Bruce, you've replied to this question several times, but you are not understanding the question. Almost every encoder buffers some data then compresses it. Generally, the larger the buffer, the better the compression, but the greater the delay between starting to put audio into the encoder and starting to get audio out. The same thing happens at the decoder end. The question is how much (in terms of milliseconds of audio) does the encoder need to buffer before it starts compressing and how much does the decoder need to buffer before decompressing? Add these two numbers together, and you get the latency number that the original poster wants.

      The numbers in TFS are 280 bytes per second. If you have 51-byte frames, this is 5.5 frames per second. Assuming that frames are independent, this gives 180ms of encoding latency, presumably the same amount of decoding latency, which seems incredibly high. Presumably this goes down if you increase the bit rate a bit, so at 2KB/s it would be about 50ms (assuming that the codec is, as you imply, stateless between frames). That's a lot more reasonable.

      --
      I am TheRaven on Soylent News
    6. Re:Really early latency figures by Goaway · · Score: 1

      I don't know why you people keep badgering Bruce about this, when I could figure out the answers to all that within minutes of looking at the linked site. How about going and reading for yourself?

    7. Re:Really early latency figures by vlm · · Score: 1

      I don't know why you people keep badgering Bruce about this, when I could figure out the answers to all that within minutes of looking at the linked site. How about going and reading for yourself?

      Because almost all codecs have a certain inherent fixed latency. And it's by far the most important figure of merit in the real world. And no one wants to discuss it, therefore it must be horrifically bad.

      Number one priority for codec designer is always will it fit in the available B/W goal. This is a simple T/F Y/N 1/0 either it fits or it doesn't.

      Number two priority is minimum inherent codec latency. Humans don't talk so well above 100 ms or so (debatable). That doesn't mean you get 100 ms to blow in the codec, because you've got radio / buffering / FEC / ping time type latencies.

      As a codec designer you get to fight the network/RF guys constantly over who gets to budget how much of the available latency. Not to mention fighting marketing, who claim it'll use zero bits/sec while somehow having zero latency.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    8. Re:Really early latency figures by Bruce+Perens · · Score: 1

      I think it is 25ms. 51 bit frame 40 times per second. Please read the code.
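Here is the frame arithmetic spelled out, for those who would rather not read the code. The 51 bits and 40 frames per second are from the post above; the idea that each frame is padded to whole bytes when written to a file is my own assumption, not something stated in the thread, though it does happen to reproduce the 1050 bytes quoted in the summary.

```python
import math

FRAME_BITS = 51         # from the post above
FRAMES_PER_SECOND = 40  # from the post above


def frame_duration_ms(fps=FRAMES_PER_SECOND):
    """Length of one frame in milliseconds."""
    return 1000 / fps


def payload_bps(bits=FRAME_BITS, fps=FRAMES_PER_SECOND):
    """Bit rate of the frame payload alone."""
    return bits * fps


def padded_file_bytes(seconds, bits=FRAME_BITS, fps=FRAMES_PER_SECOND):
    """File size if every frame is rounded up to whole bytes (an assumption)."""
    bytes_per_frame = math.ceil(bits / 8)
    return bytes_per_frame * round(fps * seconds)


if __name__ == "__main__":
    print(frame_duration_ms())      # ms per frame
    print(payload_bps())            # bits per second of actual payload
    print(padded_file_bytes(3.75))  # bytes for 3.75 s with byte-padded frames
```

If the padding guess is right, the payload rate is a bit lower than the figure you get by dividing the file size by the duration.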

    9. Re:Really early latency figures by fbjon · · Score: 1

      Just about any codec is effectively real time. The problematic delays introduced in Skype conversations and such are almost all down to the network; no codec will help with that by itself.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    10. Re:Really early latency figures by fbjon · · Score: 1

      51 bit frames. Now visit the site, spoonfeeding demands are frowned upon.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    11. Re:Really early latency figures by Mike+Da.+Kristopeit · · Score: 1
      you're wrong... codecs help with latency using throttling. you might argue, as many do, that the side effects and increased state data requirements aren't worth it... others would argue, as many do, that the decreased latency is vital.

      i don't care either way, as i stated before i have no need for this, and if a need ever did arrive, like bruce said, i can just type "make on linux".

      right now i broadcast 24/7 from my home full 720p with rich audio and there is about .3 seconds delay on the signal. all free codecs.

    12. Re:Really early latency figures by Goaway · · Score: 1

      And no one wants to discuss it, therefore it must be horrifically bad.

      It is discussed on the damn site which you could just click on, is what I am saying.

    13. Re:Really early latency figures by LandGator · · Score: 1

      Ah, but the target market are hams, not humans.... de K7AAY

      --
      There is nothing wrong with yr Internet. Do not attempt to adjust the picture. We are controlling the transmission - NSA
  13. English only ? by Yvanhoe · · Score: 4, Interesting

    At such high compression rates, one could wonder if the optimizations to transmit clear speech make assumptions about the language used. Does it work well with French ? Arabic ? Chinese ?

    --
    The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
    1. Re:English only ? by Bruce+Perens · · Score: 4, Interesting

      The basic assumptions are based on the mechanics of the vocal tract, and I suspect not high-level enough to differ across languages, but obviously it would be nice to hear from speakers of other languages who test it. We could also use a larger corpus of spoken samples for testing.

    2. Re:English only ? by Anonymous Coward · · Score: 1, Interesting

      One of the earlier observations in this field was that a low-bandwidth filter specifically hurt languages with hissing sounds, and I presume you'd have similar problems with click sounds.

      As the GP indicated, Chinese (Mandarin) would be an important addition as it's a tonal language, and compression should not blur that distinction. Languages such as Arabic have emphatic consonants, which aren't that common in Western languages either. But French? That's for all practical purposes identical to English. Finnish would make more sense. As for the click sounds, try Zulu. It apparently has 15 distinct click sounds, and there should be enough speakers online.

    3. Re:English only ? by Ecuador · · Score: 1

      The languages you mentioned don't really use much different sounds. If you want a real test try the clicking sounds in Zulu, Xhosa etc.

      --
      Violence is the last refuge of the incompetent. Polar Scope Align for iOS
    4. Re:English only ? by jmv · · Score: 1

      Actually, this is not low enough for language to really have an effect other than tonal vs non-tonal languages. As long as you "train" quantizers with multiple languages you're fine. I would not expect language-dependencies to actually kick in until you hit something like 100 bps or below (i.e. when you need to do speech-to-text in the "encoder" and text-to-speech in the decoder).

    5. Re:English only ? by Anonymous Coward · · Score: 0

      Basic sounds are the same everywhere; it is not like different symbols for different languages, because the vocal tract is more or less the same for every human, scaled (universal), as Bruce points out.

      There were differences in the range of sounds used by every language, but today that is not the case, thanks to communications advances. E.g Japanese people had incorporated external language sounds and it is not alien anymore.

    6. Re:English only ? by sootman · · Score: 1

      Spanish is spoken so quickly, compressing it is like trying to make an MP3 smaller by zipping it--it just won't work. French, though, with all its mushy pronunciation, compresses very well, like how a blurry image responds well to JPG encoding.

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    7. Re:English only ? by Code+Master · · Score: 1

      Having worked with G.X series vocoders, I can tell you that the language can matter. Certain languages (Mandarin) are much more tonal than others, and sometimes high compression vocoders don't do as well with them as with English. Also, vocal tract based vocoders tend to perform poorly when compressing music or other non-verbal audio.

      --
      The Code Master
    8. Re:English only ? by Bruce+Perens · · Score: 1

      I hope we get a Mandarin tester. Comprehensible speech with buzzsaw noise etc. is important for emergency services radio and will be tested.

    9. Re:English only ? by LeadSongDog · · Score: 1

      Consider http://en.wikipedia.org/wiki/Wikipedia:SPOKEN for a liberally licensed corpus of spoken texts in many languages, with text equivalents linked for comparison.

      --
      Oh, I'm sorry sir, I thought you were referring to me, Mr. Wensleydale.
    10. Re:English only ? by drowe67 · · Score: 1

      The tones in Chinese are short term variations in pitch. It's not really that different to the way we use pitch to convey emotion and questions in English, although perhaps the variations are faster. Codec2 explicitly analyses and encodes pitch. So it should be fine. I'm learning Mandarin myself so will do some tests with "Wo de LaoShi" (my teacher) soon :-)

    11. Re:English only ? by illtud · · Score: 1

      This is the question I was going to ask. I'm sure that a lot of sounds in other languages are just noise to an English-centric codec. Bruce has replied to you that "The basic assumptions are based on the mechanics of the vocal tract" but I'm wondering if that means vocal cord frequency range + some sibilants.

      Sorry Bruce if I'm jumping to conclusions. I'll test Codec2 for Welsh compatibility!

    12. Re:English only ? by QuantumBeep · · Score: 1

      I've never heard a Spanish speaker operate from 10hz to 20khz at full dynamic range, but that doesn't mean I wouldn't love to hear what that sounds like!

  14. METAL GEAR?! by Anonymous Coward · · Score: 0

    "This is Snake, I am in front of the disposal site..."

  15. Mumble integration ? by Anonymous Coward · · Score: 4, Interesting

    One of the fastest ways to ensure its testing and distribution is to use it in Mumble - the low latency voice chat software (with an iPhone client as well).
    Mumble is typically used by gaming clans for their chat rooms, and Codec2 would be tested in real-life conditions.

    1. Re:Mumble integration ? by Bananatree3 · · Score: 1

      awesome idea as a potential beta testing group?

    2. Re:Mumble integration ? by Bruce+Perens · · Score: 4, Interesting

      One of the fastest ways to ensure its testing and distribution is to use it in Mumble - the low latency voice chat software (with an iPhone client as well). Mumble is typically used by gaming clans for their chat rooms, and Codec2 would be tested in real-life conditions.

      Is there an existing Mumble developer whom we could get interested in this? It might be that we should take some of the Alpha-isms out of the code first.

    3. Re:Mumble integration ? by Inda · · Score: 5, Insightful

      Is this really Slashdot? Do I have a DNS error?

      These are the stories I used to enjoy. I don't really understand them, but they make a good read.

      --
      This post contains benzene, nitrosamines, formaldehyde and hydrogen cyanide.
    4. Re:Mumble integration ? by Anonymous Coward · · Score: 1, Insightful

      I know - dense on tech (to the point I don't understand 90% of it, but still enjoy it!), and low on FUD/hate/blah.

    5. Re:Mumble integration ? by rdnetto · · Score: 1

      That's because they used a lossy compression algorithm on it, so for the most part only relevant posts were stored.

      --
      Most human behaviour can be explained in terms of identity.
  16. Impressive. by mrjb · · Score: 1

    1050 bytes for 3.75 seconds of speech is the equivalent of 2240 bits per second, good enough that an old-school 2400 baud modem would be able to transfer speech in realtime. Impressive. But I seem to recall that the speech synthesizer of the TI-99 stored voice audio in as little as 1200 bits per second. It was well-documented enough that TI emulators emulate the speech synthesizer as well. But the sound quality left something to be desired, which is probably one area where codec2 shines. I've listened to the example files and the sound quality seems fine; I can't tell the difference in audio quality between source and target files. Partially this may be because the source material already seems to be bandwidth-limited, probably sampled at 8 kHz as is common for telephony applications.

    --
    Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
    1. Re:Impressive. by Rogerborg · · Score: 1

      I can't tell the difference in audio quality between source and target files.

      Really? It sounds quite pronounced to me. It's still very impressive, but it's not magic.

      --
      If you were blocking sigs, you wouldn't have to read this.
    2. Re:Impressive. by lobiusmoop · · Score: 3, Informative

      The DSP Innovations codec manages decent speech quality at 600bps, god knows how (proprietary closed source). I think this is the state-of-the-art in low bitrate codecs just now.

      --
      "I bless every day that I continue to live, for every day is pure profit."
    3. Re:Impressive. by Xamusk · · Score: 1

      I'm not so sure about that. MELPe is what I hear about the most at 600bps rates, especially since it's already used by military communications gear.

  17. Speech Rec on compressed stream? by hotdiggity · · Score: 1
    Hi Bruce - great work.

    I didn't dig completely into your site, but was just wondering if groups are doing work on speech recognition algorithms on your compressed bitstream? Is this an active area of research?

    1. Re:Speech Rec on compressed stream? by Bruce+Perens · · Score: 1

      The codec work is by David Rowe. I don't know of anyone doing speech recognition.

  18. Wonderful Name by schn · · Score: 1

    codec2, the... second codec.

    1. Re:Wonderful Name by Bruce+Perens · · Score: 1

      Right. The first is the one used in D-STAR.

    2. Re:Wonderful Name by Hatta · · Score: 2, Interesting

      But there it's called AMBE, not Codec. Codec2 is a bad name for a codec for the same reasons that Variable2 is a bad name for a variable. If this is supposed to supplant AMBE, why not AMBE2 or S(uper)AMBE?

      --
      Give me Classic Slashdot or give me death!
    3. Re:Wonderful Name by wowbagger · · Score: 1

      "If this is supposed to supplant AMBE, why not AMBE2 or S(uper)AMBE?"

      Because:
      1) AMBE is trademarked by DVSI, Inc. You would be infringing upon that and be sued into oblivion.
      2) This codec is not based on Multi-Band Excitation (IMBE - Improved Multi-Band Excitation. AMBE - Advanced Multi-Band Excitation). Naming it thus would be an error.

  19. SVN Repository by Anonymous Coward · · Score: 0

    What's up with the source code repository? I keep getting a variety of "repository temporarily relocated" messages when I try checking out the source.

  20. Merry "Kurisumasu" by tepples · · Score: 1

    the vocal tract is more or less the same for every human

    Different languages use different parts of the vocal tract. If a codec distorts clicks, it won't pass Zulu or the Bushmen languages. Languages also make different distinctions on the parts of the vocal tract they do use. If a codec distorts pitches, it won't pass intelligible Cantonese, Yoruba, or Mandarin.

    There were differences in the range of sounds used by every language, but today that is not the case, thanks to communications advances. E.g Japanese people had incorporated external language sounds and it is not alien anymore.

    The only foreign sounds that have been fully assimilated into the phonology of Japanese are the 'y' compounds (e.g. "kyo", "hya", "chu" (phonemically "tyu")), borrowed a long time ago from a Chinese language. Otherwise, there's still a lot of rounding-to-the-nearest-phoneme that goes on: Merry "Kurisumasu". (If we killed Santa, would it be "Kurisumashita"?)

  21. Implementation? by Anonymous Coward · · Score: 0

    This post, and most of the comments, flew right over my head. I'm just wondering, is there any way someone could use this to make a ST:TNG-style commbadge? Or just some very long-range walkie-talkies?

    1. Re:Implementation? by drowe67 · · Score: 1

      Long range is possible. Low bit rate means more energy per bit, so less chance of a bit error over a given channel. So with a similar power output a 2400 bps codec has twice as much energy per bit as a 4800 bps codec. This can translate to longer range.
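
      To make the energy-per-bit argument concrete, here is a rough Python sketch. It assumes a fixed transmit power and ignores modulation, coding and channel details. The 5 W figure and the function names are hypothetical.

```python
import math


def energy_per_bit_joules(tx_power_watts: float, bit_rate_bps: float) -> float:
    """Energy spent on each transmitted bit at a constant power output."""
    return tx_power_watts / bit_rate_bps


def advantage_db(slow_bps: float, fast_bps: float) -> float:
    """Energy-per-bit gain of the slower rate over the faster one, in dB."""
    return 10 * math.log10(fast_bps / slow_bps)


POWER_W = 5.0  # hypothetical transmitter power, same for both codecs
ratio = energy_per_bit_joules(POWER_W, 2400) / energy_per_bit_joules(POWER_W, 4800)
print(f"energy-per-bit ratio: {ratio:.1f}x, {advantage_db(2400, 4800):.2f} dB")
```

      Halving the bit rate is worth about 3 dB, which is where the "twice as much energy per bit" figure comes from.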

  22. How does it handle background noise? by wowbagger · · Score: 2, Interesting

    Bruce, have you guys done any testing of performance in the presence of background noise? I know that in the PMR area, there are a lot of firemen who are very unhappy with what happens to AMBE when background noise (e.g. saws, Personal Alert Safety System, fire) gets into the mike - while AMBE does ok at encoding just speech, throw the noise of a saw in the background and all you get is garbage.

    While the initial application of CODEC2 is hams in their shacks with their noise-canceling mikes, It Would Be Nice If the vocoder didn't curl up its toes and die in a noisy environment.

    See "Urgent Communications", September 10th edition, page 10, "Round 2 of digital radio fireground tests held", and the test plan.

    1. Re:How does it handle background noise? by jmv · · Score: 1

      I don't know how codec2 actually does, but noise is a fundamental problem for all low-bitrate codecs. One thing that can sometimes help is applying some (conservative) noise reduction on the input to reduce the effect of noise on the codec.

    2. Re:How does it handle background noise? by drowe67 · · Score: 1

      This is an area of codec2 I would like to work on. Can you please send me some sample files of fireman speech corrupted by background saw noise? This would be a good start. The good news is with an open source codec this problem can be addressed - with a closed source codec you are stuck.

    3. Re:How does it handle background noise? by wowbagger · · Score: 1

      I don't have the sample files, but the TIA test document I linked to in my previous post gives a lot of data on how they modeled the impairments - that might give you a start. If I happen to run across any of the samples (and can redistribute them) I'll let you know. You might also contact the TIA and see if you can get the WAV files from them.

      As an interesting experiment, you might try reproducing a test I did on the IMBE vocoder used in APCO-25 Phase 1: take the first 10 seconds of Kansas's "Carry On Wayward Son" - the a cappella vocal harmony - and run that through. IMBE went nuts on the signal - it sounded like you were listening through a fan. I think the closely related frequency components of the various voices were too much for it to handle.

    4. Re:How does it handle background noise? by drowe67 · · Score: 1

      Yeah that makes sense - to get the high compression rates the assumption is made of a single voice. We fit a model around that. Break that assumption and you break the codec.

  23. Jump on quick - this could be the next twitter! by sootman · · Score: 2, Funny

    Who wants to be the first to make a web service based on this codec and 3.75-second messages? :-)

    --
    Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    1. Re:Jump on quick - this could be the next twitter! by c++0xFF · · Score: 1

      At 4.5 letters per word, a text can hold about 29 non-abbreviated words. You'd have to speak at 464 words per minute to do that in 3.75 seconds. The world record is 595 wpm. Normal reading comprehension is in the range of 200-300 wpm.

      However, let's look at this from a different perspective.

      A non-abbreviated text message is about 4.8 bytes per word (Bpw?). At, say, 200 wpm speaking, this codec comes out to about 84 Bpw.

      Honestly, a 17x difference to go to audio is remarkable. Text is probably one of the most concise ways of communicating.
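
      The numbers in this comparison can be reproduced with a short Python sketch. It assumes a classic 160-character SMS packed into 140 bytes (the comment does not say which text format it has in mind) and one space after each word. All names are illustrative.

```python
SMS_CHARS = 160                      # assumption: one classic SMS, 7-bit characters
SMS_BYTES = 140                      # assumption: 160 * 7 bits packed into 140 bytes
LETTERS_PER_WORD = 4.5               # figure used in the comment
CODEC_BYTES_PER_SEC = 1050 / 3.75    # 280 bytes/s, from the summary


def words_per_text() -> float:
    # Each word costs its letters plus one separating space.
    return SMS_CHARS / (LETTERS_PER_WORD + 1)


def text_bytes_per_word() -> float:
    return SMS_BYTES / words_per_text()


def voice_bytes_per_word(words_per_minute: float) -> float:
    return CODEC_BYTES_PER_SEC * 60 / words_per_minute


def wpm_to_fit(seconds: float) -> float:
    """Speaking rate needed to say one full text in the given time."""
    return words_per_text() / seconds * 60


print(f"words per text:       {words_per_text():.1f}")
print(f"wpm to fit in 3.75 s: {wpm_to_fit(3.75):.0f}")
print(f"text bytes per word:  {text_bytes_per_word():.2f}")
print(f"voice bytes per word: {voice_bytes_per_word(200):.0f}")
print(f"voice/text ratio:     {voice_bytes_per_word(200) / text_bytes_per_word():.1f}x")
```

      Without rounding the word count to 29, the required speaking rate comes out nearer 465 wpm than 464. The ratio stays at roughly 17x.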

  24. Re:SuperPhone Baud 14400 for sale now! by bradgoodman · · Score: 1

    Actually - it runs at 2520 baud! (1050 bytes / 3.75 sec) * (9bits) = 2520 bps [One stop - no parity]

  25. Voice recognition by Anonymous Coward · · Score: 0

    Does anybody know if this has any application in voice recognition?

  26. Its a great start but not usable yet! by briankwest · · Score: 3, Informative

    I have been working on mod_codec2.c for FreeSWITCH, which is committed in a WIP module. The library for codec2 isn't a library at all just yet. I'm working with David and Bruce to make sure we can get a working libcodec2 in place ASAP so we have a real VoIP demo that people can compile, call and test against. /b

  27. Whats the point of ultra low bitrate codecs? by Anonymous Coward · · Score: 0

    I fail to see the point of this for realtime telephony applications.

    With the best generally available Speex codecs today, the per-packet payload of actual information is not much bigger than the protocol header (on the order of 60-70 bytes total size per packet). It can't get any bigger without introducing unacceptable latency from RT delays. (Computers have to wait for enough voice information to put it in a PACKET and send it out over the wire.)

    Throw in IPv6 with current codecs and the IP protocol header actually becomes larger than the payload itself. It sounds great on paper, and for some applications (streaming, circuit-switched / compatible carriers) it can be quite useful. For voice applications it's kind of pointless without aggressive header compression. It seems to be just too far over the edge in terms of diminishing returns, where the IP header rather than usable signal ALREADY dominates.

    1. Re:Whats the point of ultra low bitrate codecs? by firstnevyn · · Score: 1

      Assuming just for a second that this isn't a horrible troll...

      Low bitrate codecs have many useful applications. Ethernet-style bandwidth isn't available everywhere, and where it is available, low bitrate codecs allow you to have more conversations in the same connection.

      So for example, in a community telco you could use the wifi links to trunk 100's of calls instead of a couple of dozen. More efficient use of bandwidth for meaningful communications is a worthy goal.

    2. Re:Whats the point of ultra low bitrate codecs? by drowe67 · · Score: 1

      Correct for VOIP, but the main target for codec2 is digital radio where the same overheads don't apply. For VOIP there are some tricks - if you concatenate many channels in a single IP packet (say for trunking between sites) you could send 32 calls in the same bandwidth as a single 64 kbit/s channel. For Voice over 2.4GHz Wifi you could consider breaking 802.11. For example the minimum bit rate is currently 1 Mbit/s. That could carry 500 x 2000bps calls if the protocol was modified. As it's unlicensed spectrum this is possible and legal, like running cordless phones and toys on the same spectrum. Alternatively we could come up with a 2000 bps Wifi waveform and get a roughly 27dB power advantage for longer range, non line of sight etc.
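
      A back-of-the-envelope Python sketch of those figures follows. It takes the 2000 bps per-call rate from the comment at face value and ignores all framing overhead. The function names are made up for illustration.

```python
import math


def calls_per_link(link_bps: int, codec_bps: int) -> int:
    """How many codec streams fit in a link, ignoring framing overhead."""
    return link_bps // codec_bps


def rate_reduction_gain_db(high_bps: float, low_bps: float) -> float:
    """Energy-per-bit gain from slowing a link down at constant power."""
    return 10 * math.log10(high_bps / low_bps)


print(calls_per_link(64_000, 2000))       # calls in one 64 kbit/s channel
print(calls_per_link(1_000_000, 2000))    # calls in a 1 Mbit/s Wifi link
print(f"{rate_reduction_gain_db(1_000_000, 2000):.1f} dB")
```

      The gain figure is simply 10*log10(500), a little under 27 dB.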

  28. Slashdot Summary Problem, not Codec Problem by billstewart · · Score: 1

    I had the same reaction, given the slashdot summary, but if you read the actual web page it's 20ms samples. You still have the problem of how to wrap it in IP packets, if you're going to do that, which gets much more annoying on low bit rate codecs. Take your 51-bit sample, pad it to 7 bytes, add 20 bytes of UDP and RTP headers, 20 bytes of IP headers, maybe some IPSEC for fun, etc., maybe some Ethernet headers.... Obviously if you're actually trying to run over a slow transmission system, you're more likely to just run it as raw bits, or at least raw bytes, and maybe use CSLIP or something.
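
    The header overhead described here can be sketched in Python. It assumes 51-bit frames every 20 ms, an 8-byte UDP header, a 12-byte RTP header and a 20-byte IPv4 header, with no IPSEC or Ethernet framing. The `frames_per_packet` knob is a hypothetical addition to show the latency trade-off of bundling frames.

```python
import math

FRAME_BITS = 51      # codec payload per frame, from the project page
FRAME_MS = 20        # frame duration
UDP_BYTES = 8
RTP_BYTES = 12
IPV4_BYTES = 20


def packet_stats(frames_per_packet: int = 1, ip_header_bytes: int = IPV4_BYTES) -> dict:
    """Wire bit rate and efficiency when codec frames are wrapped in RTP/UDP/IP."""
    payload = math.ceil(FRAME_BITS * frames_per_packet / 8)   # pad to whole bytes
    total = payload + UDP_BYTES + RTP_BYTES + ip_header_bytes
    packets_per_sec = 1000 / (FRAME_MS * frames_per_packet)
    wire_bps = total * 8 * packets_per_sec
    codec_bps = FRAME_BITS * 1000 / FRAME_MS
    return {
        "payload_bytes": payload,
        "packet_bytes": total,
        "wire_bps": wire_bps,
        "efficiency": codec_bps / wire_bps,
        "buffering_ms": FRAME_MS * frames_per_packet,
    }


print(packet_stats(1))   # one frame per packet
print(packet_stats(5))   # five frames per packet: less overhead, more delay
```

    With one frame per packet, roughly 2.5 kbit/s of speech turns into almost 19 kbit/s on the wire. Bundling five frames cuts that to under 6 kbit/s at the cost of 100 ms of buffering.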

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  29. Codec2 Web Page Says 20ms samples by billstewart · · Score: 1

    20ms samples, 51 bits, 2550 bits/sec. Are you sure about the 40 frames/sec vs. 50? Maybe he's doing that to get it under 2400 bps?

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  30. FreeSWITCH now supports CODEC2 by briankwest · · Score: 1

    FreeSWITCH now has CODEC2 support just checked in. Please check it out! http://www.freeswitch.org/ /b

  31. Here's a slower machine... by phorm · · Score: 1

    Perhaps we need a "timing" section on the website? I tend to be more interested in low-power machines.

    ---

    1.5GHz single-core C7 CPU (while some other stuff is running)

    time ./c2enc ../raw/hts1a.raw hts1a_c2.bit

    real 0m0.566s
    user 0m0.560s
    sys 0m0.004s

    time ./c2dec hts1a_c2.bit hts1a_c2.raw

    real 0m0.495s
    user 0m0.328s
    sys 0m0.004s

    ---

    And here's an Acer Aspire with an ATOM N450 (1.66GHz single-core CPU, 512KB cache, some other stuff is running)

    time ./c2enc ../raw/hts1a.raw hts1a_c2.bit

    real 0m0.451s
    user 0m0.420s
    sys 0m0.004s

    time ./c2dec hts1a_c2.bit hts1a_c2.raw

    real 0m0.251s
    user 0m0.244s
    sys 0m0.004s

    1. Re:Here's a slower machine... by drowe67 · · Score: 1

      I haven't even considered CPU load yet - remember this is just the alpha 0.1 release. Much can be done.

  32. I tried to help this project 3 months ago :-| by blach · · Score: 1

    I wrote to Bruce Perens and David Rowe the following email early July 2010. David Rowe responded, understandably, that his primary area of interest and knowledge is really just in the voice codec itself, and he had no specific comment regarding my proposals for modulation and FEC. Bruce Perens never replied to me, which was a disappointment. Perhaps my note never made it past his spam filter.

    So, since there is active discussion going on here (with many folks who know much more about signal engineering than me), I am posting the email in its entirety, and I sincerely invite comment about my proposals regarding modulation and FEC (particularly to point out anything which is factually incorrect).

    Perhaps Bruce would be willing to comment as well.

    ====== email below ======

    Gentlemen,

    I am a newly licensed amateur radio operator, and have read with interest in the past weeks about different modes of digital radio. Having read about D-Star, I recognized the need for an open alternative to AMBE, and then was pleasantly surprised to have run across both of your codec2 project sites.

    I wanted to share some of my thoughts regarding this project, especially with respect to the longer-term design goals.

    -----
    VOICE CODEC

    I read an old post by David, either on his blog or in a listserv archive, where he asked whether it would be better to (1) release an alpha version of the codec NOW and risk turning people off with poor audio quality or (2) wait for better audio quality before the first release but risk people losing interest and drifting away. I favor early release for the reason that there would be more time for (parallel) development of other software around the codec, time to flesh out bugs in the protocol, on-air tests could be conducted, etc. Hardware could even be started in the meantime; the final codec version could later be loaded onto an FPGA.

    -----

    PROTOCOL DESIGN

    Although I certainly see the allure of keeping the existing modem and plugging in a daughterboard to swap AMBE to codec2 (i.e., modding an existing D-Star HT or mobile), I suggest that a redesign of the protocol altogether may be far more beneficial in the long run.

    Presently, as I understand it, D-Star DV allocates its 4800b/s as follows:
    + 2400 AMBE voice
    + 1200 of 2/3 FEC ( actually the voice + FEC are included together in a single frame)
    + 1200 data (no FEC!)
    (obviously header/overhead/sync is a part of the 1200 data)
    = GMSK modulated it occupies ca 6.25kHz.

    I would like to see it done this way instead:
    + header with id, routing?, and _specified division of voice and data_
    + (3200 voice) OR (3200 data) OR ( 2400 voice + 1200 data)
    + 1600 FEC for all of above
    = QPSK modulated it could occupy as little as 3kHz bandwidth (at eff 1.6 ; theoretical max for QPSK is 2 = 2.4kHz bw).
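
    The bandwidth and FEC figures in the two layouts above follow from simple ratios, sketched here in Python. This assumes occupied bandwidth is just bit rate divided by spectral efficiency, a simplification that ignores filter roll-off and guard bands. The function names are illustrative.

```python
def occupied_bandwidth_hz(bit_rate_bps: float, bits_per_sec_per_hz: float) -> float:
    """Approximate occupied bandwidth for a given spectral efficiency."""
    return bit_rate_bps / bits_per_sec_per_hz


def fec_code_rate(payload_bps: float, fec_bps: float) -> float:
    """Fraction of the protected stream that is payload."""
    return payload_bps / (payload_bps + fec_bps)


# Proposed layout: 3200 bps payload + 1600 bps FEC = 4800 bps total.
print(occupied_bandwidth_hz(4800, 1.6))   # practical QPSK efficiency from the email
print(occupied_bandwidth_hz(4800, 2.0))   # theoretical QPSK maximum
print(fec_code_rate(3200, 1600))          # proposed layout
print(fec_code_rate(2400, 1200))          # D-Star voice + FEC as described above
```

    Both layouts end up with the same 2/3 code rate. The difference in the proposal is that the FEC covers data as well as voice.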

    -----

    OTHER THOUGHTS

    1. If we used QPSK modulation with spectral efficiency of 1.6, we could increase the total data rate to 9600bps while maintaining the same occupied bandwidth of ca. 6kHz.

    2. My proposed header specifies that the voice rate is [3200] or [2400] (or, if we bump the total rate to 9600, we could have a maximum of 6400 bps voice data rate with 2/3 FEC) ; the receiver would implement the correct decoding. I don't know how tolerant the current algorithms are of changing the codec bitrate midstream (Speex has VBR though, so I suppose it could work).

    3. No. 2, above, is important as data could be streamed intermittently, irrespective of whether or not a voice transmission is taking place.

    4. With good filters, 9600 bps could still be GMSK modulated and occupy a standard FM channel width of 12.5kHz but I suppose this is dodgy.

    5. I am not as familiar with the tradeoffs of GMSK v. QPSK (power efficiency, SNR, complexity?) and I'm sure there is a good reason GMSK was chosen for D-Star.

    6. FEC.
    a. D-Star includes no FEC on the header or data frames! The FEC is an integral part of AMBE.

  33. Awesome. I'm looking forward to seeing this in * by GNUALMAFUERTE · · Score: 1

    It looks like an Awesome addition to Asterisk. Of course, no other platforms will support it (ATAs, ip phones or Carriers), but for asterisk-to-asterisk traffic, it would fucking rock. IAX2 trunking + this codec = WIN.

    --
    WTF am I doing replying to an AC at 5 A.M on a Friday night?
  34. Nice to see someone else who actually gets this by pslam · · Score: 1

    I think you are not using the definition of latency that most in the field would use.

    Latency is how long it takes to process the data. It's a computer science type of thing. If you understand Knuth and his tape drive sorting examples, this is pretty obvious...

    For example, here's a nice, simple, hopelessly useless codec that has almost exactly 100 ms of latency:

    ...

    I've had great difficulty explaining this even to folks who write and maintain codecs, which is a fairly odd thing to encounter. There are fundamental design decisions about how you buffer, error correct and frame the data which can have a huge impact on latency. For example, a framing format which likes to bundle packets into pages will multiply latency by that amount. Using CRC/ECC creates a sequence point where all data up to there must be transmitted/received before it can be trusted to be decoded. (That last point is one of my biggest issues with Ogg's design, for example, which should really have separate CRC per packet, not page)

    The most fundamental of all design decisions is trading off latency vs quality. The most obvious case is buffering an entire stream before deciding on the coefficients to most efficiently compress it. At the other end, you encounter issues with short bursts that need higher bitrate: if you don't buffer at all, you're stuck with fixed frame size regardless of content.

    In fact, this is one of the reasons why broadcast video tends to be worse quality than stored/web content. They can't buffer up enough data to get better prediction.

    I do like the example of the entire-message PTT and I will use that to illustrate the concept in future :)

  35. some DSPs like floating-point by r00t · · Score: 1

    To pick an older one that I'm familiar with, consider the SHARC. Floating point is plenty fast. Assuming you don't take advantage of weird features available only to an assembly language programmer, floating point will be faster than fixed point. (there is special fixed-point hardware that the C compiler will not take advantage of)

    The PowerPC "G4" is nearly a DSP, especially if you ignore the MMU. There again, floating point is fast. You get a throughput of 4 floating-point operations (even fused-multiply-add) per cycle.

    The danger with floating point is that lots of DSP chips do badly with denormalized floating point numbers. These numbers map to zero or are slower. You can find that code suddenly runs slow when the audio is nearly silent.
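
    To illustrate how audio code ends up in the denormal range, here is a small Python sketch. It uses Python's 64-bit floats rather than the 32-bit floats typical on a DSP, and it only shows where denormals appear, not how slow they are on any given chip. The one-pole decay and the flush-to-zero workaround are generic examples, not taken from codec2.

```python
import sys

# Smallest positive *normal* double; anything smaller (but nonzero) is denormal.
SMALLEST_NORMAL = sys.float_info.min


def is_denormal(x: float) -> bool:
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


def decay(state: float, coeff: float, steps: int, flush: bool):
    """Run a one-pole filter on silence; return (final state, denormal steps)."""
    denormal_steps = 0
    for _ in range(steps):
        state *= coeff                      # input sample is 0.0 (silence)
        if flush and is_denormal(state):
            state = 0.0                     # flush-to-zero workaround
        if is_denormal(state):
            denormal_steps += 1
    return state, denormal_steps


print(decay(1.0, 0.5, 1200, flush=False))   # passes through the denormal range
print(decay(1.0, 0.5, 1200, flush=True))    # never holds a denormal value
```

    A filter fed silence decays toward zero and lingers in the denormal range on the way, which is why the slowdown shows up when the audio is nearly silent.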

  36. GStreamer plug-in? by wowbagger · · Score: 1

    Is there a GStreamer plug-in available? I had to use GStreamer to stream radio from my house to a museum where we were doing an Amateur Radio demonstration (noise floor at the museum was -70dBm, with interfering carriers near the HF bands of over -30dBm), and was using Speex to do it, but I had to really fight with it to get a low enough latency. If Codec2 were to be a GStreamer plug-in by January I'd be a happy guy (yes, I'd offer to do the work, but I am drowning at work right now...)