Open Standard For Recording Compressed Voice?
john napiorkowski asks: "I do a lot of voice recording and have been using realaudio, which distributes 'free beer' tools for the job. However, I am greatly concerned by this reliance on such a proprietary tool; is there an open standard, free software replacement available? I have tried MP3, but it doesn't sound remotely as good as realaudio for voice recording at very high compression levels. Fifteen minutes of voice compressed with realaudio is under a meg and sounds almost exactly like the original, while MP3 sounds very poor to get the file that size."
it was expressly designed for just this task, in fact; some kind geeks have been extending the format so it's seekable under WinAmp and XMMS. Its primary use these days is lossless audio compression, but it _does_ support a large variety of lossy modes, including some expressly designed for speech.
Go here for source/rpms/debs...
Yeah. For one thing, nearly all of the information in speech fits within a 4 kHz bandwidth, while music needs roughly 22 kHz. In addition, there are many differences between the nature of voice signals and music signals.
If you're ONLY dealing with speech, it's much easier to get a good compression ratio than if you have to deal with music, and I'm not just talking about the bandwidth differences.
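To put the bandwidth part of that in numbers, here is a rough sketch of the uncompressed rates involved. The helper names are made up for illustration, and the 8-bit mono speech and 16-bit stereo music formats are assumptions (telephone-style and CD-style PCM), not something stated above.

```python
def nyquist_rate_hz(bandwidth_hz):
    """Minimum sample rate needed to capture a given bandwidth."""
    return 2 * bandwidth_hz

def pcm_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) PCM bit rate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000

# Assumption: 4 kHz speech, 8 bits per sample, mono (telephone-style).
speech_rate = nyquist_rate_hz(4000)
speech_kbps = pcm_kbps(speech_rate, 8)

# Assumption: ~22 kHz music, 16 bits per sample, stereo (CD-style).
music_rate = nyquist_rate_hz(22050)
music_kbps = pcm_kbps(music_rate, 16, channels=2)

print(f"speech: {speech_rate} Hz, {speech_kbps:.1f} kbit/s raw")
print(f"music:  {music_rate} Hz, {music_kbps:.1f} kbit/s raw")
print(f"ratio:  {music_kbps / speech_kbps:.1f}x")
```

So a speech codec starts with roughly a twentyfold head start before any actual compression happens.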
I wouldn't bother with Vorbis - it was designed for music, so it won't work for voice signals as well as codecs designed for voice.
I would look at codecs like the aforementioned GSM or G.whatever (G.711 is one speech codec; I can't remember the others). I'd go to http://www.openh323.org/ for some more information on speech codecs, among other things.
Note that G.whatever (and, I think, GSM too) are at least somewhat encumbered by patents, but the licensing terms are relatively friendly from what I gather. And they are most definitely standardized. (The only speech codec in wide use that I can think of off the top of my head is Qualcomm's PureVoice codec, used quite heavily in CDMA cell phones.)
retrorocket.o not found, launch anyway?
Unlike MP3 or RA, GSM has been designed primarily for compressing speech (afaik it's optimized for a German-speaking male voice, ymmv). I record radio programs via cron job on a daily basis, and at a sample frequency of 11025 Hz (8 MB/hour) - as opposed to the standard 8 kHz - there is practically no audible difference between the (mono) FM broadcast and the GSM replay. Moreover, the compression doesn't use much CPU, so even an old 486 should do it in realtime.
If you're looking for a lightweight command-line tool for GSM compression, check out the GSM Tools from my homepage (thx to Jutta Degener and Carsten Bormann for their GSM library).
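The 8 MB/hour figure can be checked with a little arithmetic. This is only a sketch: it assumes the usual GSM 06.10 full-rate framing of 160 samples packed into 33 bytes, and the function names are made up for illustration.

```python
# Assumption: GSM 06.10 full rate, 160 samples in, 33 bytes out per frame.
SAMPLES_PER_FRAME = 160
BYTES_PER_FRAME = 33

def gsm_bytes_per_hour(sample_rate_hz):
    """Approximate size of one hour of mono audio after GSM encoding."""
    frames_per_second = sample_rate_hz / SAMPLES_PER_FRAME
    return frames_per_second * BYTES_PER_FRAME * 3600

def gsm_bitrate_kbps(sample_rate_hz):
    """Resulting bit rate in kbit/s when the codec is fed at this sample rate."""
    return sample_rate_hz / SAMPLES_PER_FRAME * BYTES_PER_FRAME * 8 / 1000

for rate in (8000, 11025):
    megabytes = gsm_bytes_per_hour(rate) / 1e6
    print(f"{rate} Hz: {gsm_bitrate_kbps(rate):.1f} kbit/s, {megabytes:.1f} MB/hour")
```

At 8 kHz that works out to about 13.2 kbit/s (roughly 5.9 MB/hour), and at 11025 Hz to about 8.2 MB/hour, which matches the number quoted above.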
If you want something that is easy to implement, try a continuously variable slope delta modulation (CVSDM) encoder/decoder. You can get communications quality voice at 32 kilobit/second. Not as good as the more sophisticated systems used in PCS and secure telephones, but very easy to implement and it doesn't need a fast CPU. It is used on the Space Shuttle's air-to-ground communication links.
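To show how little there is to it, here is a minimal CVSD sketch. The step sizes, growth/decay factors, leak constant and the "three equal bits in a row" rule are all illustrative assumptions, not values from any particular standard. At one bit per sample, 32 kilobit/second simply means sampling at 32 kHz.

```python
import math

# Illustrative parameters (assumptions, not from any standard).
MIN_STEP, MAX_STEP = 0.002, 0.1
GROW, DECAY, LEAK = 1.5, 0.9, 0.995

def _update(bit, estimate, step, last_bit, run):
    """Predictor update shared by encoder and decoder so they stay in sync."""
    run = run + 1 if bit == last_bit else 1
    if run >= 3:                       # run of equal bits: slope overload, grow step
        step = min(step * GROW, MAX_STEP)
    else:                              # otherwise let the step shrink again
        step = max(step * DECAY, MIN_STEP)
    estimate = estimate * LEAK + (step if bit else -step)
    return estimate, step, bit, run

def cvsd_encode(samples):
    """One output bit per input sample: 1 if the input is at or above the estimate."""
    estimate, step, last_bit, run = 0.0, MIN_STEP, None, 0
    bits = []
    for x in samples:
        bit = 1 if x >= estimate else 0
        bits.append(bit)
        estimate, step, last_bit, run = _update(bit, estimate, step, last_bit, run)
    return bits

def cvsd_decode(bits):
    """Rebuild the waveform by replaying the same predictor updates."""
    estimate, step, last_bit, run = 0.0, MIN_STEP, None, 0
    out = []
    for bit in bits:
        estimate, step, last_bit, run = _update(bit, estimate, step, last_bit, run)
        out.append(estimate)
    return out

# Try it on 0.1 s of a 300 Hz tone sampled at 32 kHz (i.e. 32 kbit/s).
rate = 32000
tone = [0.5 * math.sin(2 * math.pi * 300 * n / rate) for n in range(3200)]
bits = cvsd_encode(tone)
decoded = cvsd_decode(bits)
rms_signal = math.sqrt(sum(x * x for x in tone) / len(tone))
rms_error = math.sqrt(sum((x - y) ** 2 for x, y in zip(tone, decoded)) / len(tone))
print(f"signal RMS {rms_signal:.3f}, error RMS {rms_error:.3f}")
```

A real decoder would follow this with a low-pass filter to smooth out the staircase; that is left out here to keep the sketch short.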
Mea navis aericumbens anguillis abundat
Check out LAME (www.sulaco.org/lame). It does mp3 encoding, but has special options for encoding voice (bandpass filters, single channel, low bitrate, etc.). It is under the GPL and is finally patent free! It also compiles under (nearly) every computer system known to man.
Quando Omni Flunkus Moritati
You might want to check out Ogg Vorbis
I'd definitely recommend this; it can be played on the open source player Freeamp, which runs on Solaris, Linux, BSD, and Windows.
Steve
---
Fifteen minutes of voice compressed with realaudio is under a meg and sounds almost exactly like the original, while MP3 sounds very poor to get the file that size.
If you want poor quality, you could use a phoneme-based compression and compress about 15 hours in about a meg, assuming 2-byte phonemes, 10 phonemes per second. That would not be "voice compression" but "speech compression", though.
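For anyone who wants to check those figures, here is the back-of-the-envelope version. It assumes "a meg" means 2**20 bytes; the function names are made up for illustration.

```python
MEG = 1024 * 1024  # assumption: "a meg" is 2**20 bytes

def bitrate_bps(total_bytes, seconds):
    """Average bit rate needed to fit `seconds` of audio into `total_bytes`."""
    return total_bytes * 8 / seconds

def hours_per_meg(bytes_per_second):
    """How many hours of audio fit in one meg at a given byte rate."""
    return MEG / bytes_per_second / 3600

# Fifteen minutes of voice in just under a meg.
realaudio_bps = bitrate_bps(MEG, 15 * 60)

# Phoneme stream: 2 bytes per phoneme, 10 phonemes per second.
phoneme_bytes_per_second = 2 * 10
phoneme_hours = hours_per_meg(phoneme_bytes_per_second)

print(f"15 min per meg: about {realaudio_bps / 1000:.1f} kbit/s")
print(f"phoneme stream: {phoneme_bytes_per_second * 8} bit/s, "
      f"about {phoneme_hours:.1f} hours per meg")
```

So the file in the question is running at a bit over 9 kbit/s, while the phoneme stream would need only 160 bit/s, which is where the "about 15 hours" comes from.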
While I was out looking for a GSM source, I came across this page which has a table of some of the different options, better than I could have put it. They also have sound bites in each format; however, they are in the compressed format, so you'll need a decoder for each format to listen.
You may want to check out Ogg Vorbis, which is an alternative patent-free, open-source audio compression format. I haven't heard any low bitrate samples and the implementation is rather new, so I really can't vouch for this.
The size/quality trade-off for MP3s varies widely with different encoders. I use a CD-ripper program called "CD Copy". By default this ripper uses the "Blade" encoder, which is free but sounds pretty bad unless the bit rate is as high as 128 kbps. If you plug in a different encoder, such as the "Lame" encoder, you can get much higher quality sound with lower bit rates. After I got the LAME_ENC.DLL and plugged it into CD Copy, I started encoding music at 64 kbps!! And it still sounds fairly good (subjective, sure). If you would like to give this encoder a try, and you need help setting up CD Copy for this type of WAV-to-MP3 conversion, drop me a line.
// Alan Porter
CELP certainly compresses voice nicely, but it requires a lot of processing power. There is A(lgebraic)CELP, which was developed by professors and research folks at the University of Sherbrooke, where I study. From what I understood in a speech from one guy working on that, it's like CELP on steroids, and it requires much less power. So today when people talk about CELP, they mean ACELP. That's what cell phones use. It's not open, dude, cuz a lot of work has been put into that. It can't be free. It has been exclusively licensed to Siprolab, which lets them work exclusively on tech stuff. The university sure gets some royalties off that work. Like the professor said, they just can't give away all that effort; it took them years to get where they are now. So you must not be surprised if these high-tech compression algorithms are not public... Otherwise, how would they make money from them?
I would also get yourself to a good university library and find articles and books on voice coding. About a year ago I saw some articles on very low bit rate coding for voice. Again it was government/military sponsored work. I remember reading about an experimental coding system that operated in the 40 bit/sec range (yes, I find that hard to believe too). It should not be hard to find some good material.
There is a small book consisting of nothing but reprints of some of the most important papers in the field of Speech Analysis/Synthesis/Coding. Unfortunately I don't have the title, but it does have a copy of "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" by B. S. Atal and Suzanne L. Hanauer (Journal of the Acoustical Society of America, Volume 50, Number 2), which describes the LPC process.
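For anyone curious what the core of LPC looks like, here is a minimal sketch of fitting predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. That is a common textbook formulation and an assumption on my part, not necessarily the exact procedure used in the Atal and Hanauer paper; the function names are made up for illustration.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum of x[i] * x[i + lag] over the frame, for lag 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for a[1..order] such that x[n] is predicted by sum(a[k] * x[n-k]).

    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

# Toy input: a decaying exponential, where each sample is 0.9 times the previous one.
frame = [0.9 ** n for n in range(200)]
r = autocorrelation(frame, 2)
coeffs, residual = levinson_durbin(r, 2)
print("predictor coefficients:", [round(c, 6) for c in coeffs])
```

On this toy input the first coefficient comes out at about 0.9 and the second at about zero, as you would expect. A speech coder would run this per short frame (windowed speech instead of the toy signal) and transmit the coefficients plus some description of the excitation rather than the waveform itself.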