Domain: speex.org
Stories and comments across the archive that link to speex.org.
Comments · 59
-
Re:Data Set Publicly Available?
Now, training is a little trickier because I cannot share the data.
I cannot share the current data I'm using because it's copyrighted. Hence asking people for help getting data that I can redistribute.
So we're supposed to just give jmv a bunch of data with no way to know how he is using it?
Yes, because I have such a track record for keeping things private.
-
Re:Siri Is Not A Bandwidth Hog; 63KB/Query
Do they use Speex?
-
Re:Speex?
From http://www.speex.org/docs/manual/speex-manual/node4.html "Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of ``look-ahead'' required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don't account for the CPU time it takes to encode or decode the frames." So speex has a higher latency.
-
Re:Dirac
I think a big reason is that the Xiph project has a few other codecs developed in-house that are successful. Besides Vorbis, their MP3 alternative, Speex and FLAC are "under the Xiph.org banner". This allows them to promote Theora more. Also, Dirac was released in 2008 vs Theora's 2004, so Theora has had 4 more years to get a following.
-
Not really...
First, the paper was testing the Speex codec, and is based in principle on looking at codecs which use variable bit-rate CELP, a compression scheme which is tailored to speech, not music (music sounds terrible through one of these codecs, because their dictionaries are filled with speech sounds). Having music in the background is only likely to confuse the codec, making the speech sound terrible too, possibly to the point of unintelligibility.
The conclusions do not apply to more standardized codecs like G.711 and G.729a, which use fixed size packets.
The paper itself can be downloaded from here. Get it quick, before the IEEE figures this out and makes the author remove it so they can extort their fee. -
Re:Ya, it is
Speex is pretty weak for studio production quality audio. At least in my experience, it's most fantastic for, e.g., VoIP applications and the like. It's tremendous at reproducing identifiable and intelligible speech at extremely low bitrates (like 4 - 15 kbps, or their wideband that can go, I think to 32). See http://www.speex.org/samples/.
The fidelity is remarkable when your standards of comparison come from telephony. However, using Speex for studio-quality apps isn't ideal. You'd be better off, as grandparent pointed out, at a low-bitrate, mono, 22 - 33 kHz Vorbis file, which goes easier on the high- and low-pass filters. -
Re:Storage vs processing vs quality
Many voice mail systems only use 32kbps sampling and achieve fine results for that purpose, and the algorithms are easy enough to render on an 8-bit micro costing 50c.
I'm not sure exactly how lightweight the algorithm is, but Speex would be more appropriate for that than a general-purpose audio codec, and has the same "no license fees" advantage as Vorbis. I wonder how Speex is doing in "the industry?"
-
16 kHz
That's called wideband speech. It's been around for 10+ years and Speex supported it about 4 years ago. About time people actually use it (i.e. why people are still using narrowband in VoIP is beyond me).
-
Re:Marketing BS
FYI: 20ms of 16khz audio (the typical size of 1 RTP packet) encoded with the Speex Codec http://www.speex.org/ is 43 bytes. 20ms of 8khz audio encoded with the Speex Codec is 29 bytes, so the 16khz packet is only about 1.5 times as big as its 8khz counterpart. 20ms of 8khz g711 is 160 bytes, so with speex at 16khz you can still fit 3 calls in the same amount of bandwidth that it takes for one 8khz g711 call. The biggest overhead in VoIP is the various headers on each RTP packet per level of encapsulation, not the size of the payload.
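The numbers above can be checked with a rough sketch in Python. The payload sizes are the ones quoted in the comment; the 40 bytes of headers per packet (20 IPv4 + 8 UDP + 12 RTP) are the usual minimum sizes, and ignoring link-layer framing is an assumption made here for simplicity.

```python
# Minimum header sizes per packet; link-layer framing is ignored (assumption).
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12
HEADERS = IPV4_HEADER + UDP_HEADER + RTP_HEADER  # 40 bytes

PACKETS_PER_SECOND = 1000 // 20  # one packet per 20 ms frame

# Payload bytes per 20 ms frame, as quoted in the comment.
PAYLOADS = {"speex_16k": 43, "speex_8k": 29, "g711_8k": 160}


def kbps(payload_bytes, with_headers=True):
    """Bitrate in kbps for one call direction at 50 packets per second."""
    size = payload_bytes + (HEADERS if with_headers else 0)
    return size * 8 * PACKETS_PER_SECOND / 1000


for name, payload in PAYLOADS.items():
    print(f"{name}: payload {kbps(payload, False):.1f} kbps, "
          f"with headers {kbps(payload):.1f} kbps")
```

Counting payload only, wideband Speex is well under a third of G.711. Once the assumed 40 bytes of headers are added to every packet the gap narrows considerably, which is the point about encapsulation overhead.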
-
Re:Solution?
We can't blame Average Joe. If they (and their ISPs!) didn't hear about Jabber, it is somehow Jabber's fault.
I really wonder if all ISPs know about Jabber.
I wonder why the Jabber community doesn't work with Tipic corp. (http://www.tipic.com) to bring video/audio chat to Jabber? Because they are closed source? Well, their server and client are closed source but they are using open standards.
Look what it says:
Tipic Voip/audio implementation is based on the following Open Source projects:
- http://iaxclient.sf.net/ Basic VOIP stack. Tipic added video, wideband audio and support for echo cancellation.
- http://www.speex.org/ Default audio Codec. Tipic sponsored the echo cancellation improvements.
- http://www.theora.org/ Default Video Codec.
- http://www.libsdl.org/ Video visualization in TipicIM.
So, they managed to make an open-standards-based video chat. Problem is, the geek community sees videochat as "lame". Well, average people LOVE it.
I wonder how many people congratulated them for implementing such a thing on Jabber?
I bet that Average Joe would use Jabber if it performs much better on video chat. That is the "geek vs average user" thing hurting open standards as usual.
Who used Mozilla while it was a total geek thing? How many average, non techie people use Firefox because it performs better and promises more security than IE?
Remember people blamed average Joe for not using the Mozilla giant instead of IE. Whose fault was that, then?
Look to another example. Gizmo project is completely open source, not coded by guys who coded Kazaa and completely open standards based. It has many non techie, non geek users. Do you think they are impressed by GPL, RMS and open standards? No, Gizmo sounds better than Skype, that is all :) -
Re:Attention hardware manufacturers
http://www.speex.org/
It's like Ogg for speech instead of music. I use it for audiobooks and such, but I can't take it with me yet. -
Re:Man in the middle
You mean speex
:-p Vorbis is better tuned for music than speech. -
Audio compression without Fourier transforms
The Swedish mathematician who proved a convergence theorem for Fourier series. without him there would be no IPOD. :p
Without Fourier transforms, we would have used time-domain methods for processing digital audio. Shorten, FLAC, Apple Lossless, and most other lossless audio codecs make use of an autoregressive analysis of a block of audio, followed by linear prediction with entropy coding of the residuals. The GSM Full Rate codec (implemented in Toast) and the Speex codec operate in much the same way, except they add pitch analysis (to filter out the periodicity of vowels and instrumental chords) and lossy quantization.
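As a minimal sketch of the time-domain idea described above (not the code of any of the named codecs), the Python below fits a short linear predictor to one block of audio with the autocorrelation method and the Levinson-Durbin recursion, then computes the residual that a real codec would go on to quantize or entropy-code. The test signal, block length, and predictor order are arbitrary choices for illustration; pitch analysis, quantization, and entropy coding are left out.

```python
import math
import random


def autocorrelation(x, max_lag):
    """r[k] = sum of x[n] * x[n-k] over the block, for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return coefficients a[1..order] so that x[n] ~ sum(a[k] * x[n-k])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k                 # remaining prediction error
    return a[1:]


def prediction_residual(x, coeffs):
    """Difference between each sample and its prediction from past samples."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p))
            for n in range(p, len(x))]


# Synthetic block: two sinusoids plus a little noise (illustrative only).
rng = random.Random(0)
block = [math.sin(2 * math.pi * 0.05 * n)
         + 0.5 * math.sin(2 * math.pi * 0.12 * n)
         + 0.01 * rng.gauss(0, 1)
         for n in range(400)]

order = 4
coeffs = levinson_durbin(autocorrelation(block, order), order)
residual = prediction_residual(block, coeffs)

signal_energy = sum(v * v for v in block[order:])
residual_energy = sum(v * v for v in residual)
print(f"residual/signal energy: {residual_energy / signal_energy:.4f}")
```

The residual carries far less energy than the block itself, which is why coding the residual plus a handful of predictor coefficients takes fewer bits than coding the raw samples.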
-
Bad format, though
MP3 is not really optimal for speech. E.g. speex would provide much better quality/bitrate ratio. Sadly, speex is not very well supported. (I would love to have a digital portable recorder with a built-in speex codec.)
-
Really for the first time?
I'm sure I remember a LUGRadio interview with someone from Xiph who said that DirectX (or was it Xbox Live, or both?) uses the Speex codec to compress voice data for in-game chat.
Oh, here we go: Halo 2 and Xbox Live use Ogg codecs. -
Re:Whats the point?
-
SIP simply isn't up to the task
One reason Skype has been such a success is that they didn't try to use SIP, which while an open standard, is poorly equipped to deal with NATs and firewalls. There is no point in using an open protocol if it isn't well suited to the job, and from what I have seen, SIP isn't. To date, Skype is the only VOIP app that I have found to handle NATs and firewalls reliably.
It shouldn't be hard for someone to combine an open source voice codec like Speex with UDP NAT circumvention (which isn't hard to implement), and come up with an open source alternative to Skype. I am actually amazed that nobody has done this yet.
-
Better than MP3
Now Star Fox 64... with that much voice acting, how did it manage to stay under a Gig? Nintendo has some of the best compression methods around. If I remember correctly, the voice acting from SF64 was done using an algorithm similar to MP3.
Mobile phones and other voice-tuned audio communication systems usually encode the voice with CELP (code excited linear prediction) rather than transform codecs such as MP3. Go to the Speex site to hear samples of how good voiceovers can sound even after lossy compression to 10 kbit/s.
-
Re:Again?
Rather than moving to mp3 at the end, you might look into speex. I used this for putting church sermons on the net and got a good size/quality trade off.
I'm surprised speex isn't used more often for lectures.
http://www.speex.org
(they're part of that whole xiph/vorbis bit)
The folk over here make it work under windows/media-player as well! -
Re:Ogg Vorbis is better than MP3 in many ways.
The algorithms used to make and decode MP3s are patented by Fraunhofer Gesellschaft (licenses are paid through Thomson). Thus, in countries which observe software patents (such as the US), any implementation of those algorithms cannot be legally distributed without paying a patent license fee. Fraunhofer and Thomson claim that the relevant patents apply in many countries besides the US (warning: this page lists patents you might not wish to become familiar with). The patent holder determines what the fee is and they can change the fee at any time or refuse to issue a license to a particular would-be licensee. Most patent holding corporations tie the license fee to the number of copies of programs distributed (which means such payment schemes are incompatible with free software).
mp3licensing.com, the site which lists the license schedule, lists a one-time payment for the MP3 decoder (between US$50,000 and US$60,000), but as far as I know, nobody has paid that fee. The encoder has no one-time fee, and thus cannot be legally distributed as free software in countries where software patents exist.
I suspect that in some years when these patents have expired, there will be a lot of GNU/Linux distributions picking up support to make and play MP3 files. Ogg Vorbis will still be a better option on technical grounds, however. If you're encoding human spoken voice, consider Speex with or without the Ogg container. I'm very impressed with what it can do in such a small file.
-
Re:xmms
Ogg is free, (supported by xmms), patentless and offers better compression (or what ever you call it) than mp3.
Oh, yes, and hundreds of portable devices support it, also. Not to mention the huge existing filebase, right?
BTW, I think you mean Ogg Vorbis. Ogg is a file format, and within it, just for audio, there's Vorbis, Speex, and FLAC support, etc. Ogg also does video, using Theora, among others. Vorbis is likely the most popular audio codec using Ogg. However, Vorbis is lossy, so it makes no sense to convert MP3s over through yet another stage of lossy compression just because it's spiffy. And for people with gigabytes of recorded music, some of it live, re-ripping or re-recording with Vorbis as the only codec not only may not be practical, it may not even be possible, sometimes. -
Re:I'll tell you what's heroic
I wouldn't store it as Vorbis, try Speex instead, it's better for speech. First converted the .wav to 16000Hz sample rate, then speexed at default settings, and it was a nice 1.9meg file... and the quality was still pretty amazing. -
Re:Low bit rates works well with speech.
I think while these low bit rate transmissions might not be great for music, they do work pretty well for transmission of mostly speech broadcasts such as news, radio talk shows and sporting events.
Ogg Speex is intended for such use (and a reason why people talking about the quality of "Ogg" are talking nonsense).
-
Re:Progress
Why? As I recall, MP3, ogg vorbis, and the like aren't meant for compressing voice data. They're much better at dealing with music.
There are codecs specifically meant for speech, such as http://www.speex.org/. -
Winamp IS dead ...
for me. Once I tried foobar2000 there was no going back.
Features
* Open component architecture allowing third-party developers to extend functionality of the player
* Audio formats supported "out-of-the-box": WAV, AIFF, VOC, AU, SND, Ogg Vorbis, MPC, MP2, MP3, MPEG-4 AAC
* Audio formats supported through official addons: FLAC, OggFLAC, Monkey's Audio, WavPack, Speex, CDDA, TFMX, SPC, various MOD types; extraction on-the-fly from RAR, 7-ZIP & ZIP archives
* Full Unicode support on Windows NT
* ReplayGain support
* Low memory footprint, efficient handling of really large playlists
* Advanced file info processing capabilities (generic file info box and masstagger)
* Highly customizable playlist display
* Customizable keyboard shortcuts
* Most of standard components are opensourced under BSD license (source included with the SDK)
If you've ever tried writing a plugin for Winamp you'll fall in love with the fb2k SDK, it's like heaven compared to the other player. ;-) -
Or "Ogg Spics"
We need to come up with a new, OSS, audio standard. Then name said standard ".jizz".
Or we could use the existing name ".spx" (which officially stands for Ogg Speex, a talk-radio codec) and make anti-consumer advocates, such as the RIAA, its member labels, and its bought-and-paid-for senators, appear racist:
"Spics will destroy the hard work thousands of people. If we allow spics to spread, thousands of jobs will be lost. Not to mention the kids, what will all these spics everywhere do to the kids?"
-
Re:Secure VoIP
lame or oggenc?
I think that's for music. Maybe you meant speex or some other such voice codec. -
Cool, not scary -- Hitachi 9980 in my future!
This isn't scary, it's the coolest thing imaginable.
I've spent a chunk of time lately playing with a Sun/Hitachi 9980. Imagine a fiber channel array of hard drives the size of a nice, hefty subzero 2-door refrigerator (2m x 2m x 1m, roughly, for 1 control module and 1 array module).
It hooks up to a dozen computers, has room for over 100TB of drivespace (raid-5), has a configuration console beyond the OS that allows some slick on-the-fly tricks, is compatible with virtually ANY OS, lets you slice the array a zillion ways, gives you a data pipe of Gigs per second, and costs a million dollars. Now that's some serious power: you could capture the entire speex-quality audio of 40 people's entire 80-year lives on it (40 x 80 x 365.25 x 24 x 3600 x 1k/s = 1.0098 x 10^14 bytes, or 100TB).
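The lifetime-audio arithmetic can be checked with a short Python sketch. It assumes "1k/s" means 1,000 bytes per second and "100TB" means 10^14 bytes; both readings are assumptions, and binary units would shift the results by a few percent.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
BYTES_PER_SECOND = 1000       # assumption: "1k/s" read as 1,000 bytes per second
ARRAY_BYTES = 100 * 10**12    # assumption: "100TB" read as 10^14 bytes


def lifetime_audio_bytes(people, years=80):
    """Bytes needed to record every second of `years` years for `people` people."""
    return people * years * SECONDS_PER_YEAR * BYTES_PER_SECOND


one_lifetime = lifetime_audio_bytes(1)
print(f"one lifetime: {one_lifetime / 1e12:.2f} TB")
print(f"lifetimes per 100TB array: {ARRAY_BYTES / one_lifetime:.1f}")
print(f"400 lifetimes: {lifetime_audio_bytes(400):.4e} bytes")
```

Under those assumptions one lifetime comes to about 2.5 TB, so roughly 40 lifetimes fill a 100TB array, while 400 lifetimes would need about 1,000 TB.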
But... one day I was trying to find words for how cool this thing is, and I realized: I can remember paying a buck a byte for memory, and wincing at HD prices. I also still have a ST225, for nostalgia or whatever reason. And a 250gig drive is down around $100 now, so I'm just 2^9 away from 100TB. A conservative pseudo-Moore's law rate for HD's gets me there in 20 years: my ST225 (20Meg) is about 20 yrs. old, or 2^13 in 20 yrs.
Given the exponential rate of storage growth, I am less than 20 years from being able to buy one of these puppies at commodity prices. And by 2030 it'll fit on my wrist.
EX-cellent...
-
We need an Open Source Skype
I am amazed that nobody has built an open source VoIP application, perhaps around the Speex codec, which employs simple UDP NAT circumvention to get around the nasty configuration issues which plague most VoIP applications.
Until someone does, Skype, a proprietary closed protocol, but the only "zero configuration" VoIP application I know of, is likely to continue to acquire users.
-
Go Rush!
I have a nice 1.5Mbit connection.
Which can feed 7 listeners at 192 kbps or 46 listeners at 32 kbps, as you seem to recognize with talk radio. Wideband Speex, an audio codec designed for talk radio and telephony, sounds listenable even at 12 kbps (listen). However, more listeners for talk radio does mean a bigger audience for conservative spokesmen, whether you agree with them or not.
32kb/s music just doesn't cut it.
Have you actually tried listening to a recent codec at 32 kbps? Sure, it's not transparent, but often it'll do in a rather noisy environment such as while riding your bike or the city bus. If you want to try, grab a few 32 kbps Ogg Vorbis files from this page.
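The listener counts quoted earlier in this comment follow from integer division of the uplink by the stream bitrate. A tiny Python sketch, assuming "1.5Mbit" means 1,500 kbps of usable upstream and ignoring protocol overhead (both assumptions):

```python
UPLINK_KBPS = 1500  # assumption: "1.5Mbit" read as 1,500 kbps, no overhead


def max_listeners(stream_kbps, uplink_kbps=UPLINK_KBPS):
    """Whole number of simultaneous streams that fit in the uplink."""
    return uplink_kbps // stream_kbps


for rate in (192, 32, 12):
    print(f"{rate} kbps: {max_listeners(rate)} listeners")
```

By the same arithmetic, a 12 kbps wideband Speex stream would fit well over a hundred listeners on that connection.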
-
Re:More interested in 32kbps speech
Speex.
Sounds great on 8 and 16 khz material, even up to 22 khz, but sounds terrible in its "super wideband" mode. Another recommendation is Ogg Vorbis, though so far it's ending up near the bottom of the quality scale for my ears in this 32 kbps test. -
Re:If it's not Ogg....
Are you talking about Ogg Vorbis or Ogg FLAC or Ogg Speex? Speex is better for that kind of task.
"Speex is an Open Source/Free Software patent-free audio compression format designed for speech" and speex is part of the xiph foundation :-) -
Re:who cares about MPEG anymore?
decent free beer formats?
Ogg is more than decent, it's the best lossy audio format out there and it's free as in free speech and beer.
How come u don't even know about theora? u suck! are you a windows XP user or what? Why should we care about MPEG that steals our money when we can use free ogg instead? do u work for them or are you just stupid? (guess u have no idea what FLAC and speex are either) -
Re:what is Ogg Vorbis?
What I don't get is why they didn't choose Ogg Speex, a codec that is similarly Free, but aimed especially at voice recordings.
-
Re:Off the top of my head..
Given compression rate possible with voice, a 1 minute recording is a bit under 1 MB.
That's what I get with my mp3s and OGG files! I have a good quality voice recording of a comedian. I've stored it on my hard drive using Speex, which is an OSS codec that's designed for speech. It takes up less than 346KB per minute of recording. This figure could be pushed even lower if you were recording from a telephone as sound quality wouldn't matter so much as it will already have been heavily compressed.
-
Completely agree
I can't believe that in this day and age the authors of these VoIP applications don't seem to realize that the vast majority of Internet users are behind NATs or firewalls. Protocols like SIP and H323 simply aren't equipped to deal with this effectively. The result? A closed protocol like Skype is rapidly becoming the global VoIP standard.
Zero-configuration NAT circumvention is much easier than people think. You just get both NATed peers which want to send UDP packets to each-other to send a few packets to the other's NATs on the ports you want to use. Most NATs will then start to forward those UDP packets and hey presto! You have established a direct UDP link between the two peers and your user hasn't had to lift a finger.
All someone has to do is to combine this technique with something like Speex, make sure you have both Linux and Windows versions, and we have a free competitor to Skype using an open protocol. I would do it myself if I had the time.
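The NAT trick described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each peer is assumed to already know the other's public address and port (how they learn it, for example through some rendezvous server, is not covered in the comment), the payload and function names are hypothetical, and NATs that assign a different external port per destination will not be opened this way. The demo runs both peers on localhost, where there is no NAT, so it only exercises the packet exchange.

```python
import socket

PUNCH = b"punch"  # hypothetical payload; any small datagram would do


def send_punches(sock, peer_addr, count=3):
    """Send a few datagrams toward the peer so our own NAT opens a mapping."""
    for _ in range(count):
        sock.sendto(PUNCH, peer_addr)


def wait_for_peer(sock, peer_addr, timeout=1.0):
    """Return True once a datagram from peer_addr arrives, False on timeout."""
    sock.settimeout(timeout)
    try:
        while True:
            _, addr = sock.recvfrom(1024)
            if addr == peer_addr:
                return True
    except socket.timeout:
        return False


def demo_on_localhost():
    """Run both peers in one process; no NAT is involved here."""
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        a.bind(("127.0.0.1", 0))
        b.bind(("127.0.0.1", 0))
        addr_a, addr_b = a.getsockname(), b.getsockname()
        # Each side sends toward the other from the port it will keep using.
        send_punches(a, addr_b)
        send_punches(b, addr_a)
        return wait_for_peer(a, addr_b) and wait_for_peer(b, addr_a)
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("link established:", demo_on_localhost())
```

In a real client the same socket would then carry the encoded audio frames, since the NAT mapping is tied to the local port that sent the first packets.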
-
It's worth _listening_ to.
Having listened to the speech, I assure y'all it's much better listened to than read.
I've put together a BitTorrent share with a Speex encoding of his speech. Please be gentle. -
Try voice chat
For those interested, UT2004 is using an open-source codec for voice communication between players. It's also probably the first game to support wideband (16 kHz) communication.
-
Re:Why not just use MP3?
-
Re:KDE most impressive open source project - ever
Hey Mr. ass-backward. Stop the gnome/KDE flamewars and take my comment for what it is: commenting on the philosophical difference between KDE and gnome. Saying Qt is a new or an existing toolkit is just a matter of when you look at it. The main idea remains that the idea of gnome was to reuse as much stuff as possible (even when it shouldn't have), while KDE wrote much of these "from scratch" and has its stuff "more integrated" (just think about window managers).
If people on slashdot want to be taken seriously they really ought to make use of the freedom they are given and actually use some of the source code we donate.
Maybe you want to have a look at the stuff I donated:
Speex
FlowDesigner (previously Overflow)
GLPlot -
Shameless plug
Of course I'm biased, but I hope they use an open-source codec.
-
Mod parent up: The creator of Speex
The parent poster is the creator of Speex, which is a kick-ass audio compression format designed for speech. See here: Speex
-
Digital Recording, Analog Transfer?
Clearly this is not the ideal solution, but perhaps it strikes a balance between your needs and ideals.
Consider picking the device based on its stand-alone features, then upload the recording via the line-in on your sound card.
Of course this won't work if time is an issue, but maybe it would be workable for you to just hit play and go to bed.
Anyway, once you do that you can use a sensible, open codec like Speex.
-Peter -
Speex
For monophonic human voice encoding, Speex at 20 kbps is transparent over my stereo system. Have a listen.
-
Re:Flawed samples
As you can read in the pre- and post-test discussions at ff123's 64kbps listening test (which used a similar set of samples).
There were no speech samples included, because the tests focused on music (which is what most people will use AAC for) rather than on speech. However Rjamorim (guy behind the Hydrogenaudio.org tests) said he might do a speech codec (including Xiph's Speex and others) test, after the cross-format test is done. -
Re:Archive it!
Can anyone provide a link to a Speex encoded copy (or .ogg, .mp3) for those of us who won't touch RealPlayer? -
TeamSpeak
-
TeamSpeak - http://www.teamspeak.org/
I like TeamSpeak 2. Only the server's UDP port 8767 must be accessible, and the latest release candidate achieves excellent results with the open codec Speex. Both client and server are available for Windows and Linux.
-
The server doesn't resume.
This means many dial-up users can't get a complete file. It would be a very useful feature to add.
I agree with motown that Ogg Vorbis and Speex are worth a look. Ogg Vorbis is good at 48k mono, but is surprisingly bad at 32k.