Open Source Codec Encodes Voice Into Only 700 Bits Per Second (rowetel.com)
Longtime Slashdot reader Bruce Perens writes: David Rowe VK5DGR has been working on ultra-low-bandwidth digital voice codecs for years, and his latest quest has been to come up with a digital codec that would compete well with single-sideband modulation used by ham contesters to score the longest-distance communications using HF radio. A new codec records clear, but not hi-fi, voice in 700 bits per second -- that's 88 bytes per second. Connected to an already-existing Open Source digital modem, it might beat SSB. Obviously there are other uses for recording voice at ultra-low-bandwidth. Many smartphones could record your voice for your entire life using their existing storage. A single IP packet could carry 15 seconds of speech. Ultra-low-bandwidth codecs don't help conventional VoIP, though. The payload size for low-latency voice is only a few bytes, and the packet overhead will be at least 10 times that size.
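For anyone who wants to check the packet arithmetic in the summary, here is a rough sketch. The 40 ms frame length, the IPv4/UDP/RTP header sizes and the 1500-byte MTU are assumptions for illustration, not figures from the summary.

```python
import math

BITRATE_BPS = 700             # from the summary
FRAME_MS = 40                 # assumption: one 40 ms codec frame per packet
HEADER_BYTES = 20 + 8 + 12    # assumption: IPv4 + UDP + RTP headers, no options
MTU_BYTES = 1500              # assumption: typical Ethernet MTU


def payload_bytes(duration_ms: int, bitrate_bps: int = BITRATE_BPS) -> float:
    """Bytes of encoded speech produced in duration_ms milliseconds."""
    return bitrate_bps * duration_ms / 8000


frame_payload = payload_bytes(FRAME_MS)                    # payload of one frame
overhead_ratio = HEADER_BYTES / math.ceil(frame_payload)   # headers vs. padded payload
long_payload = payload_bytes(15_000)                       # 15 seconds of speech
fits_in_one_packet = long_payload + HEADER_BYTES <= MTU_BYTES

print(frame_payload, overhead_ratio, long_payload, fits_in_one_packet)
```

Under these assumptions a single frame is 3.5 bytes, the headers are ten times the padded payload, and 15 seconds of speech is about 1.3 kB, which fits under the assumed MTU.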
"Many smartphones could record your voice for your entire life using their existing storage."
lol if u think anyone wants to listen to open source hippies' entire life conversations about trains, autism, and why women dont like them
It would be better to use those bytes to send text and have a synthesizer speak the words. I wouldn't use a recording of that quality for any purpose whatsoever.
Can this be used for two-way comms? Counting conversion from analog to the bitstream, transit across the net, and conversion back to voice, what's the delay?
Slashdot's name? When my compiler sees
I wonder how it performs on tonal languages like Cantonese.
I've been way thing for a new cold deck for joyce recordings.
70 years * 365 days (roughly) * 24 * 60 * 60 * 88 bytes/sec / 1024 / 1024 / 1024 = 181GB
Is my math off or are they assuming such people will only have a 15 year life span?
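The lifetime-storage estimate above can be reproduced with a few lines. This is only a sketch of the same arithmetic; it uses the rounded 88 bytes/sec figure and 365-day years, as the comment does, and the 15-year case is included just for comparison.

```python
BYTES_PER_SECOND = 88                  # rounded figure from the comment (700 bits/s is 87.5)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years, as the comment does


def storage_gib(years: float, bytes_per_second: float = BYTES_PER_SECOND) -> float:
    """GiB needed to record continuously for the given number of years."""
    return years * SECONDS_PER_YEAR * bytes_per_second / 1024**3


print(round(storage_gib(70)))  # 181
print(round(storage_gib(15)))  # 39
```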
15s/IP packet - this should lower operational cost for our government.
That's starting to approach feeding the sentence into a speech-to-text system at one end and then sending the text over the air to be fed back into a text-to-speech converter.
I read the internet for the articles.
Good old POTS had 3 kHz of audio bandwidth. What is the bandwidth of this codec? It's hard to be impressed without knowing the details.
A new codec records clear, but not hi-fi, voice in 700 bits per second -- that's 88 bytes per second.
It's 87.5 bytes/s and it's that odd 1/2 byte that keeps it from being too fuzzy sounding for hi-fi.
It must have been something you assimilated. . . .
Encoding voice more efficiently has implications far exceeding the amount of storage space required to save it. There's a reason why the article is comparing the new codec to single sideband. When transmitting digital data over radio, it pretty much invariably (nowadays) means some sort of spread spectrum transmission. The fewer bits required per second, the less spectrum you have to spread your signal over, and thus the more concentrated your signal is. A radio transmitter has a fixed power output, so if you are smearing that power over less bandwidth, then you have a stronger signal.
It is a testament to the amateur radio pioneers of the past that an analog radio transmission mode invented over a hundred years ago is, just now, being possibly rivaled in its efficiency.
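The bandwidth argument above can be put into numbers with a simple model. This sketch assumes fixed transmit power, a flat noise floor, and a receiver filter matched to the signal bandwidth; the function name and the example bandwidths are made up for illustration.

```python
import math


def snr_gain_db(wide_hz: float, narrow_hz: float) -> float:
    """SNR improvement from putting the same power into narrow_hz instead of wide_hz.

    Assumes a flat noise floor, so received noise power scales with bandwidth.
    """
    return 10 * math.log10(wide_hz / narrow_hz)


# Hypothetical example: halving the occupied bandwidth
print(round(snr_gain_db(2000, 1000), 2))
```

In this model halving the bandwidth is worth about 3 dB, and a tenfold reduction is worth 10 dB.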
They're skirting the bottom edge of comprehensibility, the voice in the samples is by no means "clear". You have to focus very closely to understand what is being said much of the time, and even then, repeated listenings are sometimes necessary.
What is the Mean Opinion Score (MOS) for the quality of the sound at 700 bps?
It needs to be better than Edison's first recording.
It m___ cer___ly c_n!
T__s is just th_ thing Telco_ and oth_r _____ prov___rs need to _ed__e usag_ and all__ more users __ lim_ted bandw__th circ__ts.
He__. C_n y__ call m_ bac_ on my house__one?
1984 eat your heart out!
" A single IP packet could carry 15 seconds of speec"
great
You can listen to samples on the site linked in the article.
I couldn't give it more than a MOS of 2, and some of the recordings were more like MOS of 1. I'd give Edison's recording a 2 to 3.
The British rail service has been doing that for about a hundred years.
I think this is great, but I also know this will bite us all in the butt. There's no reason for cell companies to not record conversations anymore if this kind of tech exists.
I remember using software called Roger Wilco (Windows-only I believe) around 2000 on my 56K dialup to chat in video games. The quality was good, at least better than a phone call. Nowadays it's annoying that Skype isn't as good as that one was, and now I have 3G.
This is a great development and it will help millions of people communicate. At least in the Australian outback but also in 3rd world areas.
Thumbs up!
A stream of sounds is difficult to parse. Converting it via various codecs won't change that or make it more useful. Converting the analog wave sounds into meaningful digital data (in the form of words as text, musical notation, specific fart parameters, a database of whale or bird calls, etc) is more helpful and efficient. Meaning can be extracted and/or analyzed. As someone else suggests, those can be converted back to a semblance of the original sequential stream of sounds (but why?).
If you are communicating with a person who has a particularly melodious voice, you may want to preserve the analog, but not the 88Bps version.
...omphaloskepsis often...
Do we finally have a 2400b mode? Would love to do digital, but with existing FM transceivers. Due to my HOA I can't (and yes, I have tried) do HF reliably.
. Define sqrt(x) as something really evil like (x / rand()), and bury it deep. Watch your coworkers go nuts.
Or 3TB for one year and 1000 people.
> [I] haven't said many words at all.
And this is from a guy who is famous largely for saying stuff!* Well known for talking about Morse code, talking about free software and open source, talking about Debian's principles, talking at conferences, probably talking to Congress ... and even you don't talk more than a few hours per week.
* and also of course for DOING a lot of things, including doing things like founding organizations - which requires a lot of talking.
Actually, that got me curious, what do you first / most really got your name out there, why do you start getting so much press attention? Busybox is important, of course, but you never hear the person who created grub mentioned in press, or the original author of glibc.
A couple of typos made that hard to read. Let me try again:
What do you think first / most really got your name out there?
Why did you start getting so much press attention, etc, compared to other people who also did important work?
Not that you aren't worth listening to. I'm not saying you don't "deserve" the attention or whatever. I'd just like to know your thoughts on how and why someone like yourself becomes a bit of a celebrity in the field.
Here's a link to the current source code, as it wasn't straightforward to find: https://svn.code.sf.net/p/free...
Licensed under GNU LGPL v2.1.
This space is not for rent.
would be a better description, instead of "clear".
That's who'll be interested in technology like this. They could compress and store the conversations of every person in the U.S., 24/7/365, for decades, without having to upgrade their data storage capacity.
Just to show I'm not all gloom-and-doom: I'd think NASA, and private spaceflight companies like SpaceX, would be interested, since a low datarate for voice communications would be great, I'd think, for interplanetary distances. With higher datarates available you could have multiple conversations happening simultaneously.
Those samples are anything but "clear." It's still impressive, given the compression ratio, but there's no need to go overboard. You wouldn't want to have to rely on your understanding of one of these samples
systemd is Roko's Basilisk.
Thanks for that. Sounds like I have a lot of work to do to become nerd famous. ;)
I just checked out your blog and found the bit about switching power supplies interesting. I knew about switching *regulators*, but didn't realize common power supplies could actually run on DC. I'll have to check your blog more often.
I wonder if Google could pair Codec 2 700c and RAISR (Rapid and Accurate Image Super-Resolution) for YouTube videos that use even lower bandwidth than the 144p that exist already. Or, they could use the same technology to reduce the bandwidth necessary to stream 1080p/4k/8k videos and further embarrass the data capping ISPs.
With some people, it never starts to, either. Just look at Trump's word salad. He doesn't even make sense when he's awake. Can't imagine he makes sense when sleeptalking.
I guess it's impressive to get anything other than straight noise out of less than 1kbps. But I've wondered why Rowe hasn't focused more on quality at more moderate (e.g. 2-3kbps) bitrates rather than continuing to seek ways to trade away some quality for an ever lower bitrate. It's been a couple years since I tried it out and came to that conclusion; this looks like that trend has continued.
I couldn't get my encoded samples to sound nearly as good as the samples posted on the codec2 site. And it seemed like the second-lowest bitrate at the time (1400?) sounded essentially just as good as the highest (3200), which meant it wasn't making effective use of the additional bits. The quality jump between its highest mode and the lowest Opus mode (at 6kbps) was huge. (EVS would be a big jump over that.)
From what I understand, codec2's most prominent competition operates at 2.4kbps and up and sounds noticeably better at those rates than codec2 does.
The jump in intelligibility and voice quality going from 4kHz narrowband to 6kHz mediumband is big, probably bigger than going from mediumband to 20kHz fullband. The distinguishing features of many consonants are between 3.5 and 6 kHz.
Finding some way to take advantage of information beyond narrowband - even if not trying to encode much of it - could be a distinct advantage for a low bitrate codec over existing competition.
Homer Dudley had a working vocoder pre-WW2, which was used in the encrypted voice system SIGSALY.
From Wiki, this encoded voice into 12 signals, each with 6 levels (call it 2.5 bits) at 25 Hz. That's about 750 bits/s.
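A quick check of that estimate, using only the channel count, level count and rate quoted in the comment. The helper name is made up; the exact figure uses log2(6) bits per sample instead of the rounded 2.5.

```python
import math


def vocoder_bitrate(channels: int, levels: int, samples_per_second: float) -> float:
    """Raw bit rate of a channel vocoder with uniform quantisation per channel."""
    bits_per_sample = math.log2(levels)
    return channels * bits_per_sample * samples_per_second


exact = vocoder_bitrate(12, 6, 25)   # log2(6) is roughly 2.585 bits
rounded = 12 * 2.5 * 25              # the comment's "call it 2.5 bits" shortcut

print(round(exact, 1), rounded)
```

The shortcut gives 750 bits/s; without rounding the figure comes out a little higher, around 775 bits/s.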
You can only usefully judge each sample the first time you hear it. After that, you're tainted. You can more easily understand the sample because you know what to expect; you've heard it before.
Narrow SSB itself is actually pretty awful. The goal should be to do much better. This includes F-S distinction, high female voices (perhaps in short supply for amateur radio), distinguishing different speakers, being kind to people with old failing ears, and handling unusual languages. (unusual: Cantonese, Vietnamese, the African click languages, and the Pacific Northwest consonant-loaded languages)
"bits for pitch" is what you need for Cantonese and Vietnamese.
"small enough interval" is more a concern for Spanish. Languages with relatively few distinct sounds tend to have longer words that are spoken at a faster pace.
What a weird summary:
The new codec isn't "competing with single-sideband modulation".
Normal SSB is unprocessed speech. So the codec is simply competing with natural speech.
The claim that SSB "is used by ham contesters to score the longest-distance communications using HF radio" is just plain wacky. So they use natural speech to talk to each other???
Even if this codec isn't perfect, the progress shows that by end of 2017, nobody in the world should be paying for voice calls anymore.
Jio, a new telecom in India, has already made voice calls free for life on its VoLTE network.
Now a 5G mobile network will give you 100 Mb/s, so if a single telephony session requires 700 b/s (in one direction), then you could hook up 140,000 telephones to a single mobile and let them make simultaneous phone calls. So when a 'personal computer' is wasteful (it could serve 1000 users), a personal phone is even more so.
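The 140,000 figure is a straight division of link rate by codec rate. A minimal sketch, ignoring packet headers, signalling and any other overhead (which would lower the real number considerably):

```python
LINK_BPS = 100_000_000   # 100 Mb/s, from the comment
VOICE_BPS = 700          # one direction of one call, from the comment


def max_streams(link_bps: int, stream_bps: int) -> int:
    """Whole number of one-way streams that fit in the link, with zero overhead."""
    return link_bps // stream_bps


print(max_streams(LINK_BPS, VOICE_BPS))  # 142857
```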
This calls for an implementation on the ESP8266 and similar modules: an ADC and DAC (or PWM if absent) to interface with a headset, and that codec to send voice over IP, sparing as much bandwidth as possible for other data and/or degraded link conditions.
Also an Arduino or other cheap platform and a couple of serial RF modules could be an interesting way to tinker with the protocol and explore applications.
Being open source, Codec2 was an important piece in research projects like the one with secure voice over HDVoice (amrwb+). Yes, it has been proven possible to make a modem over HDVoice and transmit data. The target was to have encrypted voice over this channel. As transfer rates were low, the voice was encoded with codec2, afterwards encrypted and sent over the HDVoice modem.
Thanks Rowe, great work with Codec2!
Will this run on a 6502, or more importantly, is this what Bender uses?
The HOA may be in conflict with the law. They can't legally stop you from having an effective antenna. They may require that it not be neon green of course.