Open Source Codec Encodes Voice Into Only 700 Bits Per Second (rowetel.com)

do what now by Anonymous Coward · 2017-01-13 10:45 · Score: 0, Insightful

"Many smartphones could record your voice for your entire life using their existing storage."

lol if u think anyone wants to listen to open source hippies' entire life conversations about trains, autism, and why women dont like them

Re:do what now by Anonymous Coward · 2017-01-13 10:52 · Score: 0

forget recording on local storage - you could send about a second and a half of recording in an sms. that's pretty impressive.
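
A quick sanity check of that figure, as a rough sketch only. It assumes a single SMS carries 140 bytes of payload and that the codec adds no framing of its own; both numbers are assumptions, not something stated in the article.

```python
# Rough check: how much 700 bit/s speech fits in one SMS?
# Assumptions (not from the article): 140-byte SMS payload,
# no codec framing or header overhead.
SMS_PAYLOAD_BYTES = 140
CODEC_BITS_PER_SECOND = 700


def seconds_per_message(payload_bytes: int,
                        bitrate: int = CODEC_BITS_PER_SECOND) -> float:
    """Seconds of encoded speech that fit in a payload of the given size."""
    return payload_bytes * 8 / bitrate


if __name__ == "__main__":
    print(f"{seconds_per_message(SMS_PAYLOAD_BYTES):.2f} s of speech per SMS")
```

Under those assumptions the result lands close to the "second and a half" quoted above.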
Re:do what now by frovingslosh · 2017-01-13 10:59 · Score: 4, Informative

The samples don't sound great, and I really wonder how well it does trying to record a conversation rather than one person talking directly into a mic. Still, I would welcome the chance to try an app based on this to see if it could really record your day, although until I can test it I'm a disbeliever.

--
I'm an American. I love this country and the freedoms that we used to have.
Re:do what now by Anonymous Coward · 2017-01-13 11:05 · Score: 1

May be boring, but it's great for espionage!
Re:do what now by anarcobra · 2017-01-13 12:31 · Score: 1

Also a bit optimistic about battery life.
More likely the NSA can use this to store everything you say forever.
Re: do what now by bugs2squash · 2017-01-13 13:41 · Score: 2

Or trump could yell at someone in a tweet

--
Nullius in verba
Re: do what now by Anonymous Coward · 2017-01-13 14:28 · Score: 0

Exactly! I'm glad you realized that no one wants to listen to you either.
Re:do what now by arglebargle_xiv · 2017-01-13 16:43 · Score: 1

They're not that bad. It's a human voice codec, and for more than half the samples I could tell that I was probably listening to a human voice. All you'd need to add is subtitles so you can tell what's being said and it'd be pretty good.
Re:do what now by arglebargle_xiv · 2017-01-13 16:53 · Score: 1

And now for a less facetious reply: I've seen something like this before when working on audio processing algorithms that are evaluated subjectively. After hearing the same type of audio sample for the millionth time in a row, and having heard 900,000 less-good versions, you start to think that it's sounding pretty decent. It isn't until you either measure it objectively or find a fresh test subject who has never heard any of the previous attempts to provide a subjective rating that you realise it's actually not so good. There's a technical name for this problem which escapes me at the moment... anyone?
Re:do what now by Lumpy · 2017-01-13 16:56 · Score: 4, Informative

It's not for recording.
It's for giving us voice communication to Mars and back. If you have the ability to transmit voice over long distances using lower bandwidth, you can add in luxuries like checksums and redundant data so that when you send it a very long distance it still arrives usable at the far end, where your 10,000 watt transmission is weaker than a dollar store walkie talkie.
Ham radio is where most of the breakthroughs in communication happen. I can see this mode used to allow voice communication with Mars astronauts. We already have PSK31 allowing a ham with 2.5 watts of power to transmit text messages around the globe easily.

--
Do not look at laser with remaining good eye.
Re:do what now by Anonymous Coward · 2017-01-13 17:35 · Score: 0

"smartphones could record your voice for your entire life using their existing storage."
My life isn't that short.
Re:do what now by Anonymous Coward · 2017-01-13 20:55 · Score: 1

if you need subtitles then it is by definition not fucking pretty good.
Re:do what now by Anonymous Coward · 2017-01-13 21:02 · Score: 0

Or relay your conversations to big brother in real time.
Re:do what now by smallfries · 2017-01-13 22:52 · Score: 1

Is it a form of adaption? I couldn't understand the 700C samples on the first few play throughs, but after 5 repetitions they made sense.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php

not great quality by Anonymous Coward · 2017-01-13 10:48 · Score: 0

It would be better to use those bytes to send text and have a synthesizer speak the words. I wouldn't use a recording of that quality for any purpose whatsoever.

Re: not great quality by Anonymous Coward · 2017-01-13 10:53 · Score: 1

VTT is a trivially solved problem, is it? Especially at low latency on embedded devices?
The words you're looking for are "I'm sorry"

Latency? by LordByronStyrofoam · 2017-01-13 10:49 · Score: 1

Can this be used for two-way comms? Counting conversion from analog to the bitstream, transit across the net, and conversion back to voice, what's the delay?

--
Slashdot's name? When my compiler sees /. it generates a warning about a badly formed comment.

Specific to English? by MichaelSmith · 2017-01-13 10:51 · Score: 5, Interesting

I wonder how it performs on tonal languages like Cantonese.

--
http://michaelsmith.id.au

Re:Specific to English? by Stele · 2017-01-13 10:53 · Score: 1

Or more importantly, atonal languages like Klingon!
Re:Specific to English? by jfdavis668 · 2017-01-13 11:22 · Score: 2

It includes poorly translated Engrish subtitles.
Re:Specific to English? by Bruce+Perens · 2017-01-13 11:30 · Score: 1

You can try it pretty easily, if you speak such a language. There are test programs that work on sound files.

--
Bruce Perens.
Re:Specific to English? by Bruce+Perens · 2017-01-13 11:31 · Score: 1

The only question would probably be whether we had allocated enough bits for pitch and collected it over a small enough interval. Pitch is definitely encoded.

--
Bruce Perens.
Re:Specific to English? by MichaelSmith · 2017-01-13 11:44 · Score: 1

Is this project a response to the earlier controversy about proprietary codecs?

--
http://michaelsmith.id.au
Re:Specific to English? by R.Mo_Robert · 2017-01-13 12:41 · Score: 1

I wonder how it performs on tonal languages like Cantonese.
I don't see any reason it shouldn't work. It encodes pitch (you really can't avoid that if you're encoding speech, which will include "voiced" sounds that have a fundamental frequency), and some casual reading about how it encodes suggests that it captures more specific information in the lower frequencies than in the higher ones, which also matches how our (logarithmic) perception of frequency works. That being said, the English sample I heard doesn't sound fantastic: think of a phone conversation in which /f/ is difficult to distinguish from /s/, which I suspect has to do with the high frequencies being either cut off or difficult to distinguish in terms of amplitude (/f/ is a bit weaker in general, and I think most of its noise is concentrated above the frequencies that are heard over the phone--don't quote me on this). So, I suspect the listener will have to do some work regardless of language, but there is nothing English-specific here.

--
R.Mo
Re:Specific to English? by Bruce+Perens · 2017-01-13 12:46 · Score: 4, Informative

I recruited David to work on this because I felt that Amateur Radio operators should not be bound to any locked-down technology but should be able to tinker with all of their technology. At the same time, there is a similar controversy regarding closed codecs on the Internet.

--
Bruce Perens.

This issy awe so nudes by JoeyRox · 2017-01-13 10:51 · Score: 2

I've been way thing for a new cold deck for joyce recordings.

Re:This issy awe so nudes by Anonymous Coward · 2017-01-13 18:50 · Score: 0

It's not voice recognition software...
That gives me an idea. Do voice recognition on a hi-fi source and send the text along with the 700 bits. Then do text-to-speech sync'd to the audio, and use it to generate a speech signal with the quality of the encoded sound.
You could run voice recognition on the encoded sound, and send text only when the voice recognition fails, and only modify the output in those cases. I have no idea how hard it is to modify a voice to correct for word sounds. Or possibly just flip bits of the original 700 until voice recognition passes, thereby distorting the original sound toward recognisable words.
Re:This issy awe so nudes by Bruce+Perens · 2017-01-13 19:44 · Score: 2

See my previous comment on this topic.

--
Bruce Perens.

The math seems off by sobachatina · 2017-01-13 10:53 · Score: 1

70 years * 365 days (roughly) * 24 * 60 * 60 * 88 bytes/sec / 1024 / 1024 / 1024 = 181GB

Is my math off or are they assuming such people will only have a 15 year life span?
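
The arithmetic above can be reproduced with a short sketch. It assumes 365-day years and binary gigabytes (1024**3 bytes), and it treats the fraction of each day actually spent talking as a free parameter; `speech_fraction` is a name made up here for illustration.

```python
# Lifetime storage for speech encoded at 700 bit/s (87.5 bytes/s).
# Assumptions: 365-day years, GiB = 1024**3 bytes.
# speech_fraction is a hypothetical parameter: the share of each day
# that is actually recorded (1.0 means talking nonstop).
BYTES_PER_SECOND = 700 / 8
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def storage_gib(years: float,
                speech_fraction: float = 1.0,
                bytes_per_second: float = BYTES_PER_SECOND) -> float:
    """Storage in GiB for `years` of speech at the given duty cycle."""
    total_bytes = years * SECONDS_PER_YEAR * speech_fraction * bytes_per_second
    return total_bytes / 1024**3


if __name__ == "__main__":
    print(f"70 years nonstop at 88 B/s:   {storage_gib(70, bytes_per_second=88):.1f} GiB")
    print(f"70 years nonstop at 87.5 B/s: {storage_gib(70):.1f} GiB")
    print(f"70 years, one third of day:   {storage_gib(70, 1 / 3):.1f} GiB")
```

The nonstop case comes out at roughly 180 GiB, so the 181GB figure holds; the total scales linearly with how much of the day is recorded.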

Re:The math seems off by darkain · 2017-01-13 11:12 · Score: 1

There are 256GiB MicroSD cards on the market right now. So yes, this is entirely possible.
Re:The math seems off by Bryan+Ischo · 2017-01-13 11:13 · Score: 1

Nobody is assuming a 15 year life span.
The question is, why do you assume that people talk nonstop 24 hours per day?
Re:The math seems off by networkBoy · 2017-01-13 11:13 · Score: 1

I got the same as you. 2.59GB/year
Still damn impressive as 250GB m2 SSDs would hold ~ a century of voice.
Now, assuming that you are not talking continuously (say you talk 1/3 of the day; 8 hours of continuous talking; that's a lot) then you're at 60 GB/70Yr and that *is* valid for a high(ish) end smartphone.

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:The math seems off by Anonymous Coward · 2017-01-13 11:14 · Score: 0

Looks good, other than that the average person does not talk their entire life without breaks.
Re: The math seems off by Anonymous Coward · 2017-01-13 11:17 · Score: 0

Considering the demographics of the people likely to use this, yes. Do any fat anime watching computer janitors make it 15 years of their adult life without dying of obesity related diseases or getting arrested for child porn?
Re:The math seems off by MichaelSmith · 2017-01-13 11:17 · Score: 1

MicroSD capacity should increase faster than the rate data is added to the device.

--
http://michaelsmith.id.au
Re:The math seems off by Bruce+Perens · 2017-01-13 11:37 · Score: 1

You don't record the pauses. You do sleep, you know :-)

--
Bruce Perens.
Re:The math seems off by Bruce+Perens · 2017-01-13 11:49 · Score: 2

I've been programming all day, and haven't said many words at all. There are people who talk for their entire work day, but they generally spend half their time listening and more processing something, so they may actually do 4 hours of speech or less in the work day. Most people don't really speak for more than a few hours per day.

--
Bruce Perens.
Re:The math seems off by Cramer · 2017-01-13 12:03 · Score: 1

Only if that SD card were used EXCLUSIVELY for recording your voice, and it's ACTUALLY 256GB of usable space (capacity is always a lie, filesystems take up space too, etc.), and it doesn't fail over the decades, AND you don't live more than ~98 years, sure.
Re:The math seems off by Cramer · 2017-01-13 12:04 · Score: 1

People talk in their sleep, you know.
Re:The math seems off by Bruce+Perens · 2017-01-13 13:06 · Score: 1

Yes, but it's hardly continuous and, going by my kid when he was younger, rarely makes any sense.

--
Bruce Perens.
Re: The math seems off by Anonymous Coward · 2017-01-14 16:02 · Score: 0

You haven't met my ex mother in law

budget cuts for NSA? by kiviQr · 2017-01-13 10:54 · Score: 1

15s/IP packet - this should lower operational cost for our government.
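
For what it's worth, the 15-second figure is easy to check with a small sketch. The MTU and header sizes below are assumptions (a 1500-byte Ethernet MTU, IPv4 plus UDP headers without options, no codec framing), not details taken from the article.

```python
import math

# Does 15 s of 700 bit/s speech fit in a single IP packet?
# Assumptions: 1500-byte Ethernet MTU, 20-byte IPv4 header and
# 8-byte UDP header (no options), no additional codec framing.
MTU_BYTES = 1500
HEADER_BYTES = 20 + 8


def payload_bytes(seconds: float, bitrate: int = 700) -> int:
    """Whole bytes needed to carry `seconds` of speech at `bitrate` bit/s."""
    return math.ceil(seconds * bitrate / 8)


def max_seconds(bitrate: int = 700) -> float:
    """Longest stretch of speech that fits in one packet's payload."""
    return (MTU_BYTES - HEADER_BYTES) * 8 / bitrate


if __name__ == "__main__":
    needed = payload_bytes(15)
    print(f"15 s needs {needed} bytes of {MTU_BYTES - HEADER_BYTES} available")
    print(f"One packet could hold up to {max_seconds():.1f} s")
```

With those assumed sizes, 15 seconds fits with a little room to spare.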

How does it sound? by jandrese · 2017-01-13 10:56 · Score: 3, Interesting

That's starting to approach feeding the sentence into a speech to text system at one end and then sending the text over the air to be fed back into a text to speech converter.

--

I read the internet for the articles.

Re:How does it sound? by Anonymous Coward · 2017-01-13 11:14 · Score: 1

It's right there in TFA (samples that is). The answer appears to vary from muffled but understandable if you listen closely to bad-phone-connection, breaking up level of unintelligibility. It's impressive but not really something you'd want to listen to if there was an alternative.
Re:How does it sound? by networkBoy · 2017-01-13 11:17 · Score: 1

good point. I suppose the low limit would be doing that while compressing the text stream via a pre-shared library and assuming optimum (no ECC required) communication channel?

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:How does it sound? by ezdiy · 2017-01-13 15:17 · Score: 2

Look at the codec diagram - if you ignore the entropy coder, it largely resembles the input filters of voice recognition systems - before feeding the NN input terminals, the signal is decimated to extremely low bandwidth vectors with only the psychoacoustic essentials of human voice - quantized to very few dominating tones and their attack/release values. The NN model does the final step of "compressing" the result only by a factor of around 100 into text. It is popularly conjectured that compression is, in fact, a ML problem.

Same is done with computer vision, before matching for features, the frequency space is filtered into a narrow band where the interesting stuff can be still observed.
Re:How does it sound? by Anonymous Coward · 2017-01-14 01:48 · Score: 0

Like shite. I'm not impressed. If his goal was intelligibility he failed. 60% or less on that score is a fail...epic fail...

Bandwidth? by Anonymous Coward · 2017-01-13 10:56 · Score: 1

Good old POTS had 3k of audio bandwidth. What is the bandwidth of this CODEC? It's hard to be impressed without knowing the details.

Re:Bandwidth? by dlleigh · 2017-01-13 11:22 · Score: 3, Interesting

To compute the channel capacity, you need to know the channel's signal-to-noise ratio as well as its bandwidth.
The Shannon channel capacity formula is: C = B * log_2(1 + SNR) where C is the channel's capacity in bits/second, B is its bandwidth in hertz, log_2 is the base-2 logarithm and SNR is the channel's signal-to-noise ratio.
If we assume an SNR of 48 dB for a reasonable POTS line, its capacity would be C = 3 kHz * log_2(1 + 48 dB) ~= 3000 * log_2(63097) which is almost 48,000 bits per second.
This is a theoretical limit that realizable systems can only approach, but never equal or exceed. A practical system would also use extra bits for forward error correction purposes; I doubt that this codec deals gracefully with bit errors.
For back-of-the-envelope purposes, assume you could use this codec to send a single voice signal in 700 Hz of bandwidth on a channel with low SNR, or you could send 60 voice signals over a regular POTS line.
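
The numbers above can be reproduced with a few lines. This is only a sketch of the Shannon formula as quoted, with the SNR converted from dB to a linear ratio; the function name is made up for illustration.

```python
import math


def shannon_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon limit C = B * log2(1 + SNR) in bit/s, with SNR given in dB."""
    snr_linear = 10 ** (snr_db / 10)  # 48 dB is a power ratio of about 63096
    return bandwidth_hz * math.log2(1 + snr_linear)


if __name__ == "__main__":
    pots = shannon_capacity(3000, 48)
    print(f"POTS line (3 kHz, 48 dB): {pots:,.0f} bit/s")
    print(f"700 bit/s voice streams at that limit: {int(pots // 700)}")
    # At 0 dB SNR (signal power equal to noise power) capacity equals bandwidth.
    print(f"700 Hz channel at 0 dB: {shannon_capacity(700, 0):.0f} bit/s")
```

The theoretical limit allows a few more than 60 streams; 60 is a reasonable round figure once any real-world overhead is taken off.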
Re:Bandwidth? by Bruce+Perens · 2017-01-13 13:13 · Score: 1

Actually, the modem does deal gracefully with bit errors. It protects the most important bits and lets the less important ones get clobbered. In a high bit error situation you get speech that sounds wrong but can still be understood. FEC actually falls down sooner than this scheme.

--
Bruce Perens.
Re:Bandwidth? by dlleigh · 2017-01-13 13:23 · Score: 1

If that's true, there's more room for compression!
Re:Bandwidth? by Anonymous Coward · 2017-01-13 13:42 · Score: 0

If the signal doesn't get through, it doesn't get through. Importance of bits is irrelevant
Re:Bandwidth? by Anonymous Coward · 2017-01-13 17:28 · Score: 0

For voice audio, bit errors are easily forgiven by the brain.
There's a reason RT traffic is UDP: getting as much as possible to the other end matters more than losing a little.
Re:Bandwidth? by gravewax · 2017-01-13 20:58 · Score: 1

just couple the codec with gold plated monster cables that will eliminate those bit errors.
Re:Bandwidth? by hackertourist · 2017-01-13 21:33 · Score: 1

POTS is traditionally converted to a 64 kbit/s digital signal, e.g. in ISDN, but also in the digital back-end used for the POTS network these days.

Close by fahrbot-bot · 2017-01-13 11:11 · Score: 3, Funny

A new codec records clear, but not hi-fi, voice in 700 bits per second -- that's 88 bytes per second.

It's 87.5 bytes/s and it's that odd 1/2 byte that keeps it from being too fuzzy sounding for hi-fi.

--
It must have been something you assimilated. . . .

Re:Close by Citizen+of+Earth · 2017-01-13 15:42 · Score: 1

How low could they make the bit rate if they made their system parse phonemes, transmit only them, and reproduce them on the other side?
Re:Close by Bruce+Perens · 2017-01-13 19:42 · Score: 4, Informative

Lots of people ask about this. If we did pure speech-to-text and text-to-speech, it would take about half the bandwidth but everybody would have the same synthesized voice. Once you start trying to add parameters to the synthesized voice such as pitch, speed, and tonality, those take as much bandwidth as we are using for the entire codec, because they are essentially the same parameters.

--
Bruce Perens.
Re:Close by smallfries · 2017-01-13 22:56 · Score: 1

It sounds strange in our digital world based on whole bytes, but those odd half-bytes encode naturally onto vinyl and add warmth and feeling to the intonation.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Close by Anonymous Coward · 2017-01-13 23:26 · Score: 0

this, and the fact they did not use Monster cables.
Re:Close by Ol+Olsoc · 2017-01-14 03:28 · Score: 1

Lots of people ask about this. If we did pure speech-to-text and text-to-speech, it would take about half the bandwidth but everybody would have the same synthesized voice. Once you start trying to add parameters to the synthesized voice such as pitch, speed, and tonality, those take as much bandwidth as we are using for the entire codec, because they are essentially the same parameters.
Doesn't Motorola have a low bandwidth FM mode using phonemes? I've listened to a few radios using something like that, and they are pretty unpleasant to use.

--
The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
Re:Close by yes-but-no · 2017-01-14 03:42 · Score: 1

When the input is voice, we assume it's a human-generated voice. A specific human's sound-generating apparatus (vocal cords etc.) has a specific signature (in common parlance, an accent). If the software can capture this and send it along, you can reasonably reconstruct in the text-to-speech part something resembling/unique to the original voice. And this info is independent of the size of the sample - whether he/she talks 10 words or a thousand, the accent part info stays the same.
Re:Close by religionofpeas · 2017-01-14 04:11 · Score: 1

Or you could leave out the text-to-speech part, and just let the other person read it. Much faster, and you can grep it.

More than just low storage by Excelcia · 2017-01-13 11:13 · Score: 4, Interesting

Encoding voice more efficiently has implications far exceeding the amount of storage space required to save it. There's a reason why the article is comparing the new codec to single sideband. When transmitting digital data over radio, it pretty much invariably (nowadays) means some sort of spread spectrum transmission. The fewer bits required per second, the less spectrum you have to spread your signal over, and thus the more concentrated your signal is. A radio transmitter has a fixed power output, so if you are smearing that power over less band, then you have a stronger signal.

It is a testament to the amateur radio pioneers of the past that an analog radio transmission mode invented over a hundred years ago is, just now, being possibly rivaled in its efficiency.

Re:More than just low storage by Ol+Olsoc · 2017-01-14 03:18 · Score: 1

It is a testament to the amateur radio pioneers of the past that an analog radio transmission mode invented over a hundred years ago is, just now, being possibly rivaled in its efficiency.
And there is a reason why Single Sideband will still be used for a long time to come.
A weak or noisy SSB signal can still be copied and understood. The digital encodings have a fatal flaw. It is known as the "digital cliff". If it doesn't decode properly, or if you have a weak or noisy signal, you get silence.
So the net effect is a quiet signal of significantly less range. In addition, most digital encoding schemes don't really save any bandwidth.
This encoding appears to try to work around that issue by guessing at missing bits and placing the guesses in the stream. So I suspect that some extra range will be gained before it drops off. A big question will be if this gain is achieved at the expense of a now noisy signal, where the noise is purposely injected by the codec. In the real world, will this injection end up making for unintelligible signals? Dunno - I'll probably try some of this using FreeDV before being too tough on it.
This isn't to say people shouldn't try. But as you note, the gold standard SSB isn't in any danger yet. On my radio I can knock the Sideband Bandwidth to around 2 KHz and even less, so the 3 KHz standard they are aiming for is kind of a moving target.

--
The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.

"clear" is an exaggeration by Bryan+Ischo · 2017-01-13 11:15 · Score: 3, Informative

They're skirting the bottom edge of comprehensibility; the voice in the samples is by no means "clear". You have to focus very closely to understand what is being said much of the time, and even then, repeated listenings are sometimes necessary.

Re:"clear" is an exaggeration by MichaelSmith · 2017-01-13 11:23 · Score: 1

Though that's often true of amateur radio generally.

--
http://michaelsmith.id.au
Re:"clear" is an exaggeration by Anonymous Coward · 2017-01-13 11:23 · Score: 0

You'll have to type louder as I can't hear you over the sound of moving bits.
Re:"clear" is an exaggeration by msauve · 2017-01-13 11:27 · Score: 2

"You have to focus very closely to understand what is being said much of the time, and even then, repeated listenings are sometimes necessary."

You're describing all of the tech support calls I've had to make in the past few years.

--
"National Security is the chief cause of national insecurity." - Celine's First Law
Re:"clear" is an exaggeration by Anonymous Coward · 2017-01-13 11:34 · Score: 0

Yes. I had trouble understanding all of 700bps samples until I listened to the 1300bps equivalents, which made them clear. This new algorithm might produce sound quality better than 7/13th the old one in 7/13th of the bits, but to my ear it's not good enough.
Re:"clear" is an exaggeration by tlhIngan · 2017-01-13 11:46 · Score: 3, Interesting

They're skirting the bottom edge of comprehensibility; the voice in the samples is by no means "clear". You have to focus very closely to understand what is being said much of the time, and even then, repeated listenings are sometimes necessary.
In other words, it's being efficient.
The brain has a very powerful voice and audio decoder. (In fact, the brain's wetware is so powerful to compensate for relatively poor sensors - but coupled with the power of the brain, they become much more powerful detection devices. The downside to the economy in hardware with powerful software combination is artifacting - though we usually call those things illusions).
So the codec basically saves transmission bytes by making the brain do a lot of the signal recovery work.
Of course, in Amateur Radio, SSB can be really bad and you have to do a lot of deciphering anyhow.
Re:"clear" is an exaggeration by Anonymous Coward · 2017-01-13 12:18 · Score: 0

Of the provided samples, for me the sound was only clear enough to understand what was being said in samples ‘forig’, ‘mmt1’ and partially ‘vk5qi’ (both files), so that gives us a score of about 2½ out of 7. Some of the other ones I could partially make out on repeated listening, but still, I think this would have been a more accurate headline:
Open Source Codec Doesn't Quite Encode Voice Into 700 Bits Per Second
Re:"clear" is an exaggeration by Bruce+Perens · 2017-01-13 12:55 · Score: 1

That's the theory. The modem also degrades gracefully in a way that lets you use your "ears" to recover information when there are bit errors. No on-off behavior like most digital codecs, in fact one of the samples is rendered with 1% bit errors, which might kill a normal codec or at least require a packet repeat. We have higher bit rate versions of the codec that don't make you work so hard.

--
Bruce Perens.
Re:"clear" is an exaggeration by Anonymous Coward · 2017-01-13 18:18 · Score: 0

I always figure that was by design to discourage use of tech support except when absolutely necessary.
Re:"clear" is an exaggeration by Bryan+Ischo · 2017-01-13 20:09 · Score: 1

I am sure the tech is very useful, and being able to transmit understandable voice (even if it takes some concentration to understand it) in a very low number of bits is cool. I just thought the slashdot summary exaggerated a little bit.
Re: "clear" is an exaggeration by Bruce+Perens · 2017-01-13 21:19 · Score: 1

All the hams I spoke with this evening are wondering why you find it difficult to copy. No kidding. We seem to have trained our ears on the analog radios over marginal paths.

--
Bruce Perens.
Re:"clear" is an exaggeration by Anonymous Coward · 2017-01-14 02:15 · Score: 0

The guy was using phonetics to give his callsign, which is a type of negative compression in the voice domain. So when you do your analysis of how many "bytes per character" be sure to compare the eight-byte "V" with the 0.67-second requirement of "V[ictor]" which is more like 59 bytes. So this is what, 7x worse than Twitter? psssh.
I was going to say LOL but I compressed it down by 300% and will just say L
BYKWIM
Re: "clear" is an exaggeration by Ol+Olsoc · 2017-01-14 03:39 · Score: 1

All the hams I spoke with this evening are wondering why you find it difficult to copy. No kidding. We seem to have trained our ears on the analog radios over marginal paths.
It is a training thing. I am pretty deaf, with two separate tinnitus tones, what does get to my brain sounds like a cracked speaker, and tremendous loss above 2 KHz, yet I am able to hear a lot of transmissions that inexperienced people with good hearing cannot. This is proven time and again when contesting with a noob helper.
The issue I find with low bandwidth signals is that they cause fatigue over time. It's like when I wear a hearing aid. After 20 minutes, I'm ready to scream - This is likely because people with my issue have trouble separating the intelligence from the noise.

--
The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.

What's The MOS by Anonymous Coward · 2017-01-13 11:20 · Score: 0

What is the Mean Opinion Score (MOS) for the quality of the sound at 700 bps?

It needs to be better than Edison's first recording.

Yes, it can! by Okian+Warrior · 2017-01-13 11:24 · Score: 1

It m___ cer___ly c_n!

T__s is just th_ thing Telco_ and oth_r _____ prov___rs need to _ed__e usag_ and all__ more users __ lim_ted bandw__th circ__ts.

He__. C_n y__ call m_ bac_ on my house__one?

Re:Yes, it can! by Bruce+Perens · 2017-01-13 11:34 · Score: 4, Interesting

Actually, our modems degrade gracefully. The least-protected bits go wrong with low bit-error rates, and the more protected bits survive. It takes a high bit error rate to kill it. So bit errors result in the speech being "off" but not dropping out.

--
Bruce Perens.
Re: Yes, it can! by Bruce+Perens · 2017-01-13 12:50 · Score: 4, Informative

It's free software, not for sale.

--
Bruce Perens.

Intelligence agencies everywhere rejoice! by Anonymous Coward · 2017-01-13 11:29 · Score: 0

1984 eat your heart out!

drip to amazon/apple/google by zlives · 2017-01-13 11:29 · Score: 1

"A single IP packet could carry 15 seconds of speech"

great

I Give It A 2 by Anonymous Coward · 2017-01-13 11:34 · Score: 0

You can listen to samples on the site linked in the article.

I couldn't give it more than a MOS of 2, and some of the recordings were more like MOS of 1. I'd give Edison's recording a 2 to 3.

Big Deal by Anonymous Coward · 2017-01-13 11:40 · Score: 0

The British rail service has been doing that for about a hundred years.

I love this, but... by Anonymous Coward · 2017-01-13 11:44 · Score: 0

I think this is great, but I also know this will bite us all in the butt. There's no reason for cell companies to not record conversations anymore if this kind of tech exists.

Roger Wilco by Anonymous Coward · 2017-01-13 11:45 · Score: 0

I remember using software called Roger Wilco (Windows-only I believe) around 2000 on my 56K dialup to chat in video games. The quality was good, at least better than a phone call. Nowadays it's annoying that Skype isn't as good as that one was, and now I have 3G.

This is a great development and it will help millions of people communicate. At least in the Australian outback but also in 3rd world areas.

Thumbs up!

sequential access vs random access by swell · 2017-01-13 11:52 · Score: 1

A stream of sounds is difficult to parse. Converting it via various codecs won't change that or make it more useful. Converting the analog wave sounds into meaningful digital data (in the form of words as text, musical notation, specific fart parameters, a database of whale or bird calls, etc) is more helpful and efficient. Meaning can be extracted and/or analyzed. As someone else suggests, those can be converted back to a semblance of the original sequential stream of sounds (but why?).

If you are communicating with a person who has a particularly melodious voice, you may want to preserve the analog, but not the 88Bps version.

--
...omphaloskepsis often...

Re:sequential access vs random access by Bruce+Perens · 2017-01-13 19:47 · Score: 1

This is not, however, a waveform codec. It models the human voice tract, and encodes the parameters of that, rather than any waveform.

--
Bruce Perens.

FM mode? by cdwiegand · 2017-01-13 12:03 · Score: 1

Do we finally have a 2400b mode? Would love to do digital but when existing FM transceivers. Due to HOA I can't (and yes have tried) do HF reliably.

--
. Define sqrt(x) as something really evil like (x / rand()), and bury it deep. Watch your coworkers go nuts.

Re:FM mode? by Anonymous Coward · 2017-01-13 15:11 · Score: 0

Would love to do digital but when existing FM transceivers.
This is not a sentence.

HOA
I've no idea what this means and am sure I'm not the only one.
fuckin' amateurs.
Re: FM mode? by Anonymous Coward · 2017-01-13 16:52 · Score: 0

ARPA may pass next year, so HOA may not be a problem. FCC still limits HF to 300baud, but many modes provide a decent bitrate.
Re:FM mode? by pe1rxq · 2017-01-14 14:09 · Score: 1

I have been experimenting with 2400b on UHF for almost a year now, especially since it allows mixed voice and data.

--
Secure messaging: http://quickmsg.vreeken.net/

3TB for a thousand years by Anonymous Coward · 2017-01-13 12:07 · Score: 0

Or 3TB for one year and 1000 people.

This from a guy famous for saying stuff! by raymorris · 2017-01-13 12:14 · Score: 1

> [I] haven't said many words at all.

And this is from a guy who is famous largely for saying stuff!* Well known for talking about Morse code, talking about free software and open source, talking about Debian's principles, talking at conferences, probably talking to Congress ... and even you don't talk more than a few hours per week.

* and also of course for DOING a lot of things, including doing things like founding organizations - which requires a lot of talking.

Actually, that got me curious, what do you first / most really got your name out there, why do you start getting so much press attention? Busybox is important, of course, but you never hear the person who created grub mentioned in press, or the original author of glibc.

Darn typos making my post unreadable by raymorris · 2017-01-13 12:19 · Score: 1

A couple of typos made that hard to read. Let me try again:

What do you think first / most really got your name out there?
Why did you start getting so much press attention, etc, compared to other people who also did important work?

Not that you aren't worth listening to. I'm not saying you don't "deserve" the attention or whatever. I'd just like to know your thoughts on how and why someone like yourself becomes a bit of a celebrity in the field.

Re:Darn typos making my post unreadable by Bruce+Perens · 2017-01-13 13:04 · Score: 1

Being at Pixar, being Debian project leader, my technical work on Debian, and announcing Open Source. Those things interested a lot of people. And founding No-Code International stirred up a lot of controversy in the radio amateur world.

--
Bruce Perens.
Re: Darn typos making my post unreadable by Anonymous Coward · 2017-01-13 14:40 · Score: 0

Thanks for the no-code thing. Seriously. More of us "No Code Era" guys use CW than there are old Elmers. Now it's a fun challenge and not a maddening barrier.
Re: Darn typos making my post unreadable by Bruce+Perens · 2017-01-13 19:54 · Score: 2

I am very glad that fight is over. And as far as I can tell, we saved Amateur Radio entirely. It would have died in our lifetimes.

--
Bruce Perens.
Re: Darn typos making my post unreadable by AndroSyn · 2017-01-14 12:39 · Score: 1

As a no-code general, thank you again for all of your hard work on getting that pushed through. I briefly ran into you in Dayton back in 2012 when you were handing out codec2 flyers. I sure wish there was further uptake of open codecs in the amateur radio world :(

Codec source code by TypoNAM · 2017-01-13 12:19 · Score: 3, Informative

Here's a link to the current source code, as it wasn't straightforward to find: https://svn.code.sf.net/p/free...

Licensed under GNU LGPL v2.1.

--
This space is not for rent.

Re:Codec source code by jensend · 2017-01-13 16:31 · Score: 3, Informative

The github mirror has a nicer interface.

"Barely understandable" by Anonymous Coward · 2017-01-13 12:31 · Score: 0

would be a better description, instead of "clear".

17 U.S. Intelligence Agencies by Rick+Schumann · 2017-01-13 12:39 · Score: 3, Interesting

That's who'll be interested in technology like this. They could compress and store the conversations of every person in the U.S., 24/7/365, for decades, without having to upgrade their data storage capacity.

Just to show I'm not all gloom-and-doom: I'd think NASA, and private spaceflight companies like SpaceX, would be interested, since a low datarate for voice communications would be great, I'd think, for interplanetary distances. With higher datarates available you could have multiple conversations happening simultaneously.

Re:17 U.S. Intelligence Agencies by wonkey_monkey · 2017-01-13 12:55 · Score: 1

since a low datarate for voice communications would be great, I'd think, for interplanetary distances
If you're looking at waiting minutes for any reply, you might as well just use text. If you're on another planet, and incapacitated in such a way that you can't type, and you need help from home, you're probably pretty much boned already.
I certainly wouldn't want to rely on this codec to get any emergency information across clearly.

--
systemd is Roko's Basilisk.
Re:17 U.S. Intelligence Agencies by Bruce+Perens · 2017-01-13 13:01 · Score: 1

There are commercial codecs that get to slightly lower data rates, which the government presently uses.
I once had to ask the Pakistani military to not use the mailing list to ask questions, as I didn't want our ham radio project to get in ITAR trouble. Of course they can still use the code, it's Open Source. But they have to get help elsewhere.

--
Bruce Perens.
Re:17 U.S. Intelligence Agencies by ajb44 · 2017-01-14 05:46 · Score: 1

Codecs designed for conversation are limited in how much they can compress because they can't use as much correlation over a long period - to avoid long latencies. The Intelligence agencies have probably designed their own compression algorithm focussed on offline storage. My guess at the reasons that low-bitrate codecs are export controlled are 1) submarines and 2) covert channels.

Clear? No by wonkey_monkey · 2017-01-13 12:58 · Score: 2

Those samples are anything but "clear." It's still impressive, given the compression ratio, but there's no need to go overboard. You wouldn't want to have to rely on your understanding of one of these samples.

--
systemd is Roko's Basilisk.

Thanks by raymorris · 2017-01-13 13:23 · Score: 1

Thanks for that. Sounds like I have a lot of work to do to become nerd famous. ;)

I just checked out your blog and found the bit about switching power supplies interesting. I knew about switching *regulators*, but didn't realize common power supplies could actually run on DC. I'll have to check your blog more often.

Codec 2 700C and Google's RAISR by Anonymous Coward · 2017-01-13 14:51 · Score: 1

I wonder if Google could pair Codec 2 700C and RAISR (Rapid and Accurate Super Image Resolution) for YouTube videos that use even lower bandwidth than the 144p that exists already. Or, they could use the same technology to reduce the bandwidth necessary to stream 1080p/4k/8k videos and further embarrass the data capping ISPs.

The meth seems on by Anonymous Coward · 2017-01-13 15:41 · Score: 0

With some people, it never starts to, either. Just look at Trump's word salad. He doesn't even make sense when he's awake. Can't imagine he makes sense when sleeptalking.

Pushing ever further into unintelligibility by jensend · 2017-01-13 16:01 · Score: 2

I guess it's impressive to get anything other than straight noise out of less than 1kbps. But I've wondered why Rowe hasn't focused more on quality at more moderate (e.g. 2-3kbps) bitrates rather than continuing to seek ways to trade away some quality for an ever lower bitrate. It's been a couple years since I tried it out and came to that conclusion; this looks like that trend has continued.

I couldn't get my encoded samples to sound nearly as good as the samples posted on the codec2 site. And it seemed like the second-lowest bitrate at the time (1400?) sounded essentially just as good as the highest (3200), which meant it wasn't making effective use of the additional bits. The quality jump between its highest mode and the lowest Opus mode (at 6kbps) was huge. (EVS would be a big jump over that.)

From what I understand, codec2's most prominent competition operates at 2.4kbps and up and sounds noticeably better at those rates than codec2 does.

Another thought by jensend · 2017-01-13 16:35 · Score: 1

The jump in intelligibility and voice quality going from 4kHz narrowband to 6kHz mediumband is big, probably bigger than going from mediumband to 20kHz fullband. The distinguishing features of many consonants are between 3.5 and 6 kHz.

Finding some way to take advantage of information beyond narrowband - even if not trying to encode much of it - could be a distinct advantage for a low bitrate codec over existing competition.

SIGSALY by eis2718bob · 2017-01-13 18:08 · Score: 1

Homer Dudley had a working vocoder pre-WW2, which was used in the encrypted voice system SIGSALY.

From Wiki, this encoded voice into 12 signals, each with 6 levels (call it 2.5 bits) at 25 Hz. That's about 750 bits/s.
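The arithmetic behind that estimate can be sketched as follows. This only reproduces the numbers quoted in the comment (12 signals, 6 levels, 25 Hz) and is not a description of how SIGSALY actually framed its data. The function name is hypothetical.

```python
import math

# Figures as quoted in the comment above; treat them as the commenter's
# numbers, not as a verified SIGSALY specification.
CHANNELS = 12        # number of signals
LEVELS = 6           # quantization levels per signal
FRAME_RATE_HZ = 25   # samples per second per signal


def bitrate(channels, levels, rate_hz, bits_per_symbol=None):
    """Bits per second for `channels` symbols of `levels` levels at `rate_hz`.

    If bits_per_symbol is None, use the information-theoretic log2(levels).
    """
    if bits_per_symbol is None:
        bits_per_symbol = math.log2(levels)
    return channels * bits_per_symbol * rate_hz


rounded = bitrate(CHANNELS, LEVELS, FRAME_RATE_HZ, bits_per_symbol=2.5)
exact = bitrate(CHANNELS, LEVELS, FRAME_RATE_HZ)

print(f"with 2.5 bits per symbol:  {rounded:.0f} bit/s")
print(f"with log2(6) bits/symbol:  {exact:.1f} bit/s")
```

Rounding 6 levels down to 2.5 bits gives the 750 bit/s in the comment; using log2(6), about 2.585 bits, gives a slightly higher figure of roughly 775 bit/s.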

Re:SIGSALY by Bruce+Perens · 2017-01-13 19:38 · Score: 1

You are leaving out that it encoded the pitch separately, and a voiced/unvoiced bit.

--
Bruce Perens.

familiarity by Anonymous Coward · 2017-01-13 20:47 · Score: 0

You can only usefully judge each sample the first time you hear it. After that, you're tainted. You can more easily understand the sample because you know what to expect; you've heard it before.

Narrow SSB itself is actually pretty awful. The goal should be to do much better. This includes F-S distinction, high female voices (perhaps in short supply for amateur radio), distinguishing different speakers, being kind to people with old failing ears, and handling unusual languages. (unusual: Cantonese, Vietnamese, the African click languages, and the Pacific Northwest consonant-loaded languages)

languages by Anonymous Coward · 2017-01-13 20:52 · Score: 0

"bits for pitch" is what you need for Cantonese and Vietnamese.

"small enough interval" is more a concern for Spanish. Languages with relatively few distinct sounds tend to have longer words that are spoken at a faster pace.

Weird summary by Ozoner · 2017-01-13 22:37 · Score: 1

What a weird summary:

The new codec isn't "competing with single-sideband modulation".

Normal SSB is unprocessed speech. So the codec is simply competing with natural speech.

The claim that SSB "is used by ham contesters to score the longest-distance communications using HF radio" is just plain wacky. So they use natural speech to talk to each other???

Re:Weird summary by Anonymous Coward · 2017-01-14 08:50 · Score: 0

In case this makes the original post a little clearer: most[*] amateur radio contesters use SSB with a fairly narrow bandwidth (about 2.7kHz), often with some analog compression as well. The sound of the voice is heavily distorted, but being able to understand what the other person is saying is what matters to those guys, not high fidelity. If you're trying to persuade them to switch from (analog) SSB to some kind of digital system, the digital system doesn't need to have great quality: it only needs to be as good as the SSB you're trying to replace.
[*] Many contesters use Morse code too --- but Bruce Perens' views on Morse code being obsolete are well known.

Soon, with VoLTE, calls will be free like email. by Anonymous Coward · 2017-01-13 23:39 · Score: 0

Even if this codec isn't perfect, the progress shows that by end of 2017, nobody in the world should be paying for voice calls anymore.

Jio, a new telecom in India, has already made voice calls free for life on its VoLTE network.

It's the bandwidth, stupid by Anonymous Coward · 2017-01-14 06:55 · Score: 0

Now a 5G mobile network will give you 100 Mb/s, so if a single telephony session requires 700 b/s (in one direction), then you could hook up roughly 140,000 telephones to a single mobile and let them make simultaneous phone calls. So when a 'personal computer' is wasteful (it could serve 1000 users), a personal phone is even more so.
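A quick sketch of that division, for the curious. It assumes the full 100 Mb/s is usable payload and ignores protocol headers, signalling, and the return direction, all of which would cut the real number down considerably. The `overhead_fraction` parameter is a made-up knob for illustration.

```python
# How many one-way 700 bit/s voice streams fit in a given link rate?
# Assumption: raw payload bits only; no packet headers or signalling.

LINK_BPS = 100_000_000   # 100 Mb/s, the figure used in the comment
CODEC_BPS = 700          # one-way codec bitrate


def max_streams(link_bps, codec_bps, overhead_fraction=0.0):
    """Whole number of streams that fit after reserving a fraction for overhead."""
    usable_bps = link_bps * (1 - overhead_fraction)
    return int(usable_bps // codec_bps)


print(max_streams(LINK_BPS, CODEC_BPS))                          # ideal case
print(max_streams(LINK_BPS, CODEC_BPS, overhead_fraction=0.5))   # half lost to overhead
```

The ideal case lands a little above the 140,000 quoted, and reserving even half the link for overhead still leaves tens of thousands of streams.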

implementations on known platforms? by Anonymous Coward · 2017-01-14 12:37 · Score: 1

This calls for an implementation on those ESP8266 and similar modules: ADC and DAC (or PWM if absent) to interface with a headset, and that codec to send voice over IP, sparing as much bandwidth as possible for other data and/or degraded link conditions.
Also an Arduino or other cheap platform and a couple serial rf modules could be an interesting way to tinker with the protocol and explore applications.

Great for unusual data channels (HDVoice modem for by Anonymous Coward · 2017-01-15 02:30 · Score: 0

Being open source, Codec2 was an important piece in research projects like the one with secure voice over HDVoice (amrwb+). Yes, it has been proven possible to make a modem over HDVoice and transmit data. The target was to have encrypted voice over this channel. As transfer rates were low, the voice was encoded with codec2, afterwards encrypted and sent over the HDVoice modem.

Thanks Rowe, great work with Codec2!

6502 by vektros · 2017-01-15 09:33 · Score: 1

Will this run on a 6502 or more importantly is this what bender uses?

HOA by Anonymous Coward · 2017-01-16 14:38 · Score: 0

The HOA may be in conflict with the law. They can't legally stop you from having an effective antenna. They may require that it not be neon green of course.

128 comments