Mozilla's New Open Source Voice-Recognition Project Wants Your Voice (mashable.com)
An anonymous reader quotes Mashable:
Mozilla is building a massive repository of voice recordings for the voice apps of the future -- and it wants you to add yours to the collection. The organization behind the Firefox browser is launching Common Voice, a project to crowdsource audio samples from the public. The goal is to collect about 10,000 hours of audio in various accents and make it publicly available for everyone... Mozilla hopes to hand over the public dataset to independent developers so they can harness the crowdsourced audio to build the next generation of voice-powered apps and speech-to-text programs... You can also help train the speech-to-text capabilities by validating the recordings already submitted to the project. Just listen to a short clip, and report back if text on the screen matches what you heard... Mozilla says its aim is to expand the tech beyond just a standard voice recognition experience, including multiple accents, demographics and eventually languages for more accessible programs.
Past open source voice-recognition projects have included Sphinx 4 and VoxForge, but unfortunately most of today's systems are still "locked up behind proprietary code at various companies, such as Amazon, Apple, and Microsoft."
Thanks to Nuance, the voice recognition industry is effectively dead. If Mozilla can make this work in offline mode it would be awesome. Not requiring your every word to be recorded and shipped off to third parties would be very useful.
Sounds good if they make the corpus freely available. Having lots of free high quality audio ...
I agree, but from a quick look at their page, I see a lot of problems with reaching that goal.
1: Most computers I've seen have pretty wretched audio inputs: tiny microphones near the screen, so not anywhere near the speaker's mouth. So we can expect lots of noise, echo, and other stuff. Good for simulating the real world (because it basically is the real world), but not what I would call high quality. Some gamers and others probably use good quality headsets, but I doubt they will make up the majority of the database. Audio might be pretty good if the speakers use cell phones.
2: People reading written text don't talk the same way as in natural conversation. That's going to be a limitation for some developers.
3: They seem to be depending on the generosity/curiosity of people to generate and validate the samples. That's a hard way to get thousands to enroll. If they had some kind of game or other system that provides a psychic reward/incentive to the users I'd be more confident of a good response.
And a final comment: I hope they're sampling at 16 kHz instead of 8 kHz. To explain: Nyquist's Theorem says the sampling rate needs to be more than twice the highest frequency component in the analog signal. Speech typically contains components up to about 6 or 7 kHz, so 16 kHz is a good number. Unfortunately, the carbon microphones that phones used for the first 100 years or so only go up to about 4 kHz, so Ma Bell (remember her?) settled on an 8 kHz rate in the middle of last century, and most everybody else has accepted that ever since.
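To put numbers on the Nyquist point, here is a minimal Python sketch. The function names are made up for illustration, and it assumes an idealized sampler with no anti-aliasing filter in front of it, which real audio hardware would normally have. It shows where a pure tone would appear to land after sampling at each rate.

```python
def min_sampling_rate(f_max_hz):
    """Nyquist rate: the sampling rate must be strictly above this value."""
    return 2 * f_max_hz


def apparent_frequency(f_hz, fs_hz):
    """Frequency at which a pure tone of f_hz shows up after ideal sampling
    at fs_hz (no anti-aliasing filter). Tones above fs_hz / 2 fold back
    into the 0..fs_hz/2 band."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))


if __name__ == "__main__":
    # Speech components up to roughly 7 kHz need a rate above this value.
    print("rate needed for 7 kHz content: >", min_sampling_rate(7000), "Hz")

    # A 6 kHz speech component at the two rates mentioned in the comment.
    for fs in (8000, 16000):
        print("fs =", fs, "Hz: 6000 Hz tone appears at",
              apparent_frequency(6000, fs), "Hz")
```

Under these assumptions, a 6 kHz component sampled at 8 kHz folds down to 2 kHz, while at 16 kHz it stays at 6 kHz. In practice an 8 kHz system would filter out everything above 4 kHz before sampling, so that part of the speech is simply lost rather than aliased.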
I don't think he's a troll; there's a point to be extracted from that.
I love Mozilla because of how much they've done for the web, from fighting for standardisation, HTML5, JavaScript, and building up one of the most complex applications around, to fighting a little for users' privacy, etc, but they deserve all the abuse they get for getting rid of the most natural leader (creator of JavaScript, no less, from the early days of Netscape) - and yes, it well and truly was a witch-hunt against him.
Without him as a leader, these days, it appears to most that Mozilla is just following in the wake of Google's Chrome and copying everything from the outward design, to the extensions/addons system, etc.