Open Source Codec Encodes Voice Into Only 700 Bits Per Second (rowetel.com)

Posted by BeauHD on Friday January 13, 2017 @10:40AM from the sound-of-efficiency dept.

Longtime Slashdot reader Bruce Perens writes: David Rowe VK5DGR has been working on ultra-low-bandwidth digital voice codecs for years, and his latest quest has been to come up with a digital codec that would compete well with single-sideband modulation used by ham contesters to score the longest-distance communications using HF radio. A new codec records clear, but not hi-fi, voice in 700 bits per second -- that's 88 bytes per second. Connected to an already-existing Open Source digital modem, it might beat SSB. Obviously there are other uses for recording voice at ultra-low-bandwidth. Many smartphones could record your voice for your entire life using their existing storage. A single IP packet could carry 15 seconds of speech. Ultra-low-bandwidth codecs don't help conventional VoIP, though. The payload size for low-latency voice is only a few bytes, and the packet overhead will be at least 10 times that size.
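The byte and packet figures in the summary can be checked with a few lines of arithmetic. This is a minimal Python sketch. The 40-byte IPv4/UDP/RTP header total, the 1500-byte Ethernet MTU, and the 40 ms frame length are assumptions chosen for illustration, not figures taken from the codec documentation.

```python
# Back-of-the-envelope check of the 700 bit/s figures quoted in the summary.
# Assumptions (not from the article): IPv4 (20 B) + UDP (8 B) + RTP (12 B)
# headers, a 1500-byte Ethernet MTU, and a hypothetical 40 ms codec frame.

BITRATE_BPS = 700
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP, no options or extensions
MTU_BYTES = 1500


def bytes_per_second(bitrate_bps: float) -> float:
    """Convert a bit rate to bytes per second."""
    return bitrate_bps / 8


def payload_bytes(bitrate_bps: float, seconds: float) -> float:
    """Codec payload produced in the given number of seconds."""
    return bytes_per_second(bitrate_bps) * seconds


def overhead_ratio(bitrate_bps: float, frame_seconds: float) -> float:
    """Header bytes divided by payload bytes when each frame is sent alone."""
    return HEADER_BYTES / payload_bytes(bitrate_bps, frame_seconds)


if __name__ == "__main__":
    print(f"{bytes_per_second(BITRATE_BPS)} bytes/s")
    fifteen_s = payload_bytes(BITRATE_BPS, 15)
    print(f"15 s of speech: {fifteen_s:.1f} B payload, "
          f"fits in one packet: {fifteen_s + HEADER_BYTES <= MTU_BYTES}")
    print(f"one 40 ms frame: {payload_bytes(BITRATE_BPS, 0.040):.1f} B payload, "
          f"headers are {overhead_ratio(BITRATE_BPS, 0.040):.1f}x larger")
```

Under these assumptions a single frame carries about 3.5 bytes while the headers are roughly 11 times that, in line with the "at least 10 times" estimate; bundling more frames per packet lowers the ratio but adds latency.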

26 of 128 comments

Specific to English? by MichaelSmith · 2017-01-13 10:51 · Score: 5, Interesting

I wonder how it performs on tonal languages like Cantonese.

--
http://michaelsmith.id.au
1. Re:Specific to English? by jfdavis668 · 2017-01-13 11:22 · Score: 2
  
  It includes poorly translated Engrish subtitles.
2. Re:Specific to English? by Bruce+Perens · 2017-01-13 12:46 · Score: 4, Informative
  
  I recruited David to work on this because I felt that Amateur Radio operators should not be bound to any locked-down technology but should be able to tinker with all of their technology. At the same time, there is a similar controversy regarding closed codecs on the Internet.
  
  --
  Bruce Perens.
This issy awe so nudes by JoeyRox · 2017-01-13 10:51 · Score: 2

I've been way thing for a new cold deck for joyce recordings.
1. Re:This issy awe so nudes by Bruce+Perens · 2017-01-13 19:44 · Score: 2
  
  See my previous comment on this topic.
  
  --
  Bruce Perens.
How does it sound? by jandrese · 2017-01-13 10:56 · Score: 3, Interesting

That's starting to approach feeding the sentence into a speech-to-text system at one end, sending the text over the air, and feeding it into a text-to-speech converter at the other.

--

I read the internet for the articles.
1. Re:How does it sound? by ezdiy · 2017-01-13 15:17 · Score: 2
  
  Look at the codec diagram: if you ignore the entropy coder, it largely resembles the input filters of voice recognition systems. Before being fed to the NN input terminals, the signal is decimated to extremely low-bandwidth vectors containing only the psychoacoustic essentials of the human voice, quantized to a very few dominant tones and their attack/release values. The NN model does the final step of "compressing" the result into text, by a factor of only around 100. It is popularly conjectured that compression is, in fact, an ML problem.
  
  The same is done in computer vision: before matching features, the frequency space is filtered into a narrow band where the interesting stuff can still be observed.
Re:do what now by frovingslosh · 2017-01-13 10:59 · Score: 4, Informative

The samples don't sound great, and I really wonder how well it does trying to record a conversation rather than one person talking directly into a mic. Still, I would welcome the chance to try an app based on this to see if it could really record your day, although until I can test it I'm a disbeliever.

--
I'm an American. I love this country and the freedoms that we used to have.
Close by fahrbot-bot · 2017-01-13 11:11 · Score: 3, Funny

A new codec records clear, but not hi-fi, voice in 700 bits per second -- that's 88 bytes per second.
It's 87.5 bytes/s and it's that odd 1/2 byte that keeps it from being too fuzzy sounding for hi-fi.

--
It must have been something you assimilated. . . .
1. Re:Close by Bruce+Perens · 2017-01-13 19:42 · Score: 4, Informative
  
  Lots of people ask about this. If we did pure speech-to-text and text-to-speech, it would take about half the bandwidth but everybody would have the same synthesized voice. Once you start trying to add parameters to the synthesized voice such as pitch, speed, and tonality, those take as much bandwidth as we are using for the entire codec, because they are essentially the same parameters.
  
  --
  Bruce Perens.
More than just low storage by Excelcia · 2017-01-13 11:13 · Score: 4, Interesting

Encoding voice more efficiently has implications far beyond the amount of storage space required to save it. There's a reason the article compares the new codec to single sideband. When transmitting digital data over radio, it pretty much invariably (nowadays) means some sort of spread spectrum transmission. The fewer bits per second required, the less spectrum you have to spread your signal over, and thus the more concentrated your signal is. A radio transmitter has a fixed power output, so if you are smearing that power over less bandwidth, you have a stronger signal.
It is a testament to the amateur radio pioneers of the past that an analog radio transmission mode invented over a hundred years ago is only now possibly being rivaled in efficiency.
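The point about concentrating a fixed transmitter power can be illustrated with thermal noise, whose power grows linearly with receiver bandwidth (N = kTB). This is a simplified sketch, not a link budget: the 290 K noise temperature, the received power, and the 2400 Hz and 700 Hz bandwidths are illustrative assumptions, and real HF noise is dominated by atmospheric and man-made sources rather than thermal noise.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def thermal_noise_watts(bandwidth_hz: float, temperature_k: float = 290.0) -> float:
    """Thermal noise power N = k * T * B in a receiver of the given bandwidth."""
    return BOLTZMANN * temperature_k * bandwidth_hz


def snr_db(received_watts: float, bandwidth_hz: float) -> float:
    """Signal-to-noise ratio in dB for a fixed received power."""
    return 10 * math.log10(received_watts / thermal_noise_watts(bandwidth_hz))


if __name__ == "__main__":
    p_rx = 1e-15                # hypothetical received power, watts
    wide = snr_db(p_rx, 2400)   # assumed SSB-like voice bandwidth
    narrow = snr_db(p_rx, 700)  # assumed narrow digital-voice bandwidth
    print(f"2400 Hz: {wide:.1f} dB, 700 Hz: {narrow:.1f} dB, "
          f"gain {narrow - wide:.2f} dB")
```

In this model the SNR gain depends only on the bandwidth ratio, 10*log10(2400/700), or about 5.35 dB; each halving of bandwidth is worth roughly 3 dB.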
"clear" is an exaggeration by Bryan+Ischo · 2017-01-13 11:15 · Score: 3, Informative

They're skirting the bottom edge of comprehensibility; the voice in the samples is by no means "clear". You have to focus very closely to understand what is being said much of the time, and even then, repeated listenings are sometimes necessary.
1. Re:"clear" is an exaggeration by msauve · 2017-01-13 11:27 · Score: 2
  
  "You have to focus very closely to understand what is being said much of the time, and even then, repeated listenings are sometimes necessary."
  
  You're describing all of the tech support calls I've had to make in the past few years.
  
  --
  "National Security is the chief cause of national insecurity." - Celine's First Law
2. Re:"clear" is an exaggeration by tlhIngan · 2017-01-13 11:46 · Score: 3, Interesting
  
  "They're skirting the bottom edge of comprehensibility, the voice in the samples is by no means 'clear'. You have to focus very closely to understand what is being said much of the time, and even then, repeated listenings are sometimes necessary."
  In other words, it's being efficient.
  The brain has a very powerful voice and audio decoder. (In fact, the brain's wetware is powerful enough to compensate for relatively poor sensors; coupled with the brain, those sensors become much more capable detection devices. The downside of pairing economical hardware with powerful software is artifacting, though we usually call those artifacts illusions.)
  So the codec basically saves transmission bytes by making the brain do a lot of the signal recovery work.
  Of course, in Amateur Radio, SSB can be really bad and you have to do a lot of deciphering anyhow.
Re:Bandwidth? by dlleigh · 2017-01-13 11:22 · Score: 3, Interesting

To compute the channel capacity, you need to know the channel's signal-to-noise ratio as well as its bandwidth.
The Shannon channel capacity formula is: C = B * log_2(1 + SNR) where C is the channel's capacity in bits/second, B is its bandwidth in hertz, log_2 is the base-2 logarithm and SNR is the channel's signal-to-noise ratio.
If we assume an SNR of 48 dB (a linear ratio of 10^(48/10) ~= 63,096) for a reasonable POTS line, its capacity would be C = 3000 Hz * log_2(1 + 63096) ~= 3000 * log_2(63097), which is almost 48,000 bits per second.
This is a theoretical limit that realizable systems can only approach, but never equal or exceed. A practical system would also use extra bits for forward error correction purposes; I doubt that this codec deals gracefully with bit errors.
For back-of-the-envelope purposes, assume you could use this codec to send a single voice signal in 700 Hz of bandwidth on a channel with low SNR, or you could send 60 voice signals over a regular POTS line.
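The Shannon calculation above can be reproduced directly. This minimal sketch converts the SNR from decibels to a linear ratio before applying C = B * log_2(1 + SNR); the 0 dB case corresponds to the "low SNR" 700 Hz channel, where capacity equals bandwidth.

```python
import math


def db_to_linear(db: float) -> float:
    """Convert a power ratio from decibels to a linear ratio."""
    return 10 ** (db / 10)


def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon limit C = B * log2(1 + SNR), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)


if __name__ == "__main__":
    pots = shannon_capacity(3000, db_to_linear(48))
    print(f"POTS line, 3 kHz at 48 dB: {pots:,.0f} bit/s")
    print(f"700 Hz at 0 dB SNR: {shannon_capacity(700, db_to_linear(0)):.0f} bit/s")
    print(f"700 bit/s streams per POTS line (upper bound): {int(pots // 700)}")
```

The theoretical bound works out to about 68 streams of 700 bit/s, so the figure of 60 in the comment is a conservative round number below the limit.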
Re:Yes, it can! by Bruce+Perens · 2017-01-13 11:34 · Score: 4, Interesting

Actually, our modems degrade gracefully. The least-protected bits go wrong at low bit-error rates, while the more-protected bits survive. It takes a high bit-error rate to kill it. So bit errors result in the speech being "off" rather than dropping out.

--
Bruce Perens.
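The graceful degradation described above comes from protecting some bits more heavily than others (unequal error protection). As a simplified stand-in, and not the error correction the modem actually uses, this sketch computes the residual error rate of a bit sent with an n-fold repetition code and majority-vote decoding, next to an unprotected bit (n = 1), on a channel that flips each bit independently with probability p.

```python
from math import comb


def residual_error_rate(p: float, repeats: int) -> float:
    """Probability that majority-vote decoding of an odd-length repetition
    code returns the wrong bit when each copy flips with probability p."""
    if repeats < 1 or repeats % 2 == 0:
        raise ValueError("repeats must be a positive odd number")
    needed = repeats // 2 + 1  # flipped copies required to outvote the rest
    return sum(
        comb(repeats, k) * p**k * (1 - p) ** (repeats - k)
        for k in range(needed, repeats + 1)
    )


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.2):
        print(f"channel BER {p:>4}: unprotected {residual_error_rate(p, 1):.4f}, "
              f"3x protected {residual_error_rate(p, 3):.4f}")
```

At a 1% channel error rate the repeated bits fail roughly 30 times less often than the unprotected ones, and the advantage shrinks as the channel gets worse.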
Re:The math seems off by Bruce+Perens · 2017-01-13 11:49 · Score: 2

I've been programming all day and haven't said many words at all. There are people who talk for their entire work day, but they generally spend half their time listening and more time processing something, so they may actually do 4 hours of speech or less in a work day. Most people don't really speak for more than a few hours per day.

--
Bruce Perens.
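The number of hours actually spent speaking matters for the "record your voice for your entire life" estimate in the summary. This sketch assumes 4 hours of speech per day (the upper figure above), an 80-year span, and a 365.25-day year; these numbers are illustrative, not measurements.

```python
BYTES_PER_SECOND = 700 / 8  # 87.5 bytes/s at 700 bit/s


def storage_bytes(hours_per_day: float, years: float) -> float:
    """Bytes needed to store the given amount of daily speech for `years`."""
    seconds = hours_per_day * 3600 * 365.25 * years
    return seconds * BYTES_PER_SECOND


if __name__ == "__main__":
    for hours in (4, 24):
        gb = storage_bytes(hours, 80) / 1e9
        print(f"{hours:>2} h/day for 80 years: {gb:.1f} GB")
```

Even continuous 24-hour recording comes to roughly 220 GB over 80 years under these assumptions, and realistic speaking time cuts that by a factor of six or more.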
Codec source code by TypoNAM · 2017-01-13 12:19 · Score: 3, Informative

Here's a link to the current source code, as it wasn't straightforward to find: https://svn.code.sf.net/p/free...
Licensed under GNU LGPL v2.1.

--
This space is not for rent.
1. Re:Codec source code by jensend · 2017-01-13 16:31 · Score: 3, Informative
  
  The GitHub mirror has a nicer interface.
17 U.S. Intelligence Agencies by Rick+Schumann · 2017-01-13 12:39 · Score: 3, Interesting

That's who'll be interested in technology like this. They could compress and store the conversations of every person in the U.S., 24/7/365, for decades, without having to upgrade their data storage capacity.

Just to show I'm not all gloom-and-doom: I'd think NASA, and private spaceflight companies like SpaceX, would be interested, since a low data rate for voice communications would be great for interplanetary distances. With higher data rates available, you could have multiple conversations happening simultaneously.
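The storage needed for the surveillance scenario can be sized with the same arithmetic. This sketch assumes roughly 325 million people (an approximate 2017 U.S. population figure), continuous 24-hour recording with no silence removal, and the 87.5 bytes/s rate. It ignores metadata, indexing, and redundancy, so it describes raw codec output only, not any real system.

```python
BYTES_PER_SECOND = 700 / 8           # codec output at 700 bit/s
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def yearly_bytes(people: float, hours_per_day: float = 24.0) -> float:
    """Raw codec output for `people` recorded `hours_per_day` for one year."""
    fraction_of_day = hours_per_day / 24
    return people * BYTES_PER_SECOND * SECONDS_PER_YEAR * fraction_of_day


if __name__ == "__main__":
    exabytes = yearly_bytes(325e6) / 1e18
    print(f"about {exabytes:.2f} EB per year")
```

Under these assumptions the total is roughly 0.9 exabytes per year; recording only the hours people actually speak would reduce it considerably.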
Re: Yes, it can! by Bruce+Perens · 2017-01-13 12:50 · Score: 4, Informative

It's free software, not for sale.

--
Bruce Perens.
Clear? No by wonkey_monkey · 2017-01-13 12:58 · Score: 2

Those samples are anything but "clear." It's still impressive, given the compression ratio, but there's no need to go overboard. You wouldn't want to have to rely on your understanding of one of these samples.

--
systemd is Roko's Basilisk.
Re: do what now by bugs2squash · 2017-01-13 13:41 · Score: 2

Or Trump could yell at someone in a tweet.

--
Nullius in verba
Pushing ever further into unintelligibility by jensend · 2017-01-13 16:01 · Score: 2

I guess it's impressive to get anything other than straight noise out of less than 1kbps. But I've wondered why Rowe hasn't focused more on quality at more moderate (e.g. 2-3kbps) bitrates rather than continuing to seek ways to trade away some quality for an ever lower bitrate. It's been a couple years since I tried it out and came to that conclusion; this looks like that trend has continued.
I couldn't get my encoded samples to sound nearly as good as the samples posted on the codec2 site. And it seemed like the second-lowest bitrate at the time (1400?) sounded essentially just as good as the highest (3200), which meant it wasn't making effective use of the additional bits. The quality jump between its highest mode and the lowest Opus mode (at 6kbps) was huge. (EVS would be a big jump over that.)
From what I understand, codec2's most prominent competition operates at 2.4kbps and up and sounds noticeably better at those rates than codec2 does.
Re:do what now by Lumpy · 2017-01-13 16:56 · Score: 4, Informative

It's not for recording.
It's for giving us voice communication to Mars and back. If you can transmit voice over long distances using less bandwidth, you can add in luxuries like checksums and redundant data, so that when you send it a very long distance it still arrives intact at the far end, where your 10,000-watt transmission is weaker than a dollar-store walkie-talkie.
Ham radio is where most of the breakthroughs in communication happen. I can see this mode being used to allow voice communication with Mars astronauts. We already have PSK31 allowing a ham with 2.5 watts of power to transmit text messages around the globe easily.

--
Do not look at laser with remaining good eye.
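The walkie-talkie comparison can be checked roughly with the inverse-square law, S = P / (4 * pi * d^2). This sketch assumes isotropic antennas (real deep-space links rely on large dish gains), a distance of about 225 million km (roughly the average Earth-Mars distance, which varies from about 55 to 400 million km), and a hypothetical 0.5 W walkie-talkie heard 1 km away.

```python
import math


def power_flux_density(transmit_watts: float, distance_m: float) -> float:
    """Power per square metre at distance_m from an isotropic transmitter."""
    return transmit_watts / (4 * math.pi * distance_m**2)


if __name__ == "__main__":
    mars = power_flux_density(10_000, 225e9)  # 10 kW, assumed average distance
    walkie = power_flux_density(0.5, 1_000)   # hypothetical 0.5 W at 1 km
    print(f"10 kW at Mars distance: {mars:.2e} W/m^2")
    print(f"0.5 W at 1 km:          {walkie:.2e} W/m^2")
    print(f"walkie-talkie is {walkie / mars:.1e} times stronger")
```

Under these assumptions the nearby walkie-talkie delivers on the order of a trillion times more power per square metre, which is why every bit saved by the codec can be spent on redundancy instead.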
Re: Darn typos making my post unreadable by Bruce+Perens · 2017-01-13 19:54 · Score: 2

I am very glad that fight is over. And as far as I can tell, we saved Amateur Radio entirely. It would have died in our lifetimes.

--
Bruce Perens.