Mozilla Releases Open Source Speech Recognition Model, Massive Voice Dataset (mozilla.org)
Mozilla's VP of Technology Strategy, Sean White, writes:
I'm excited to announce the initial release of Mozilla's open source speech recognition model that has an accuracy approaching what humans can perceive when listening to the same recordings... There are only a few commercial quality speech recognition services available, dominated by a small number of large companies. This reduces user choice and available features for startups, researchers or even larger companies that want to speech-enable their products and services. This is why we started DeepSpeech as an open source project.
Together with a community of likeminded developers, companies and researchers, we have applied sophisticated machine learning techniques and a variety of innovations to build a speech-to-text engine that has a word error rate of just 6.5% on LibriSpeech's test-clean dataset. In our initial release today, we have included pre-built packages for Python, NodeJS and a command-line binary that developers can use right away to experiment with speech recognition.
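For readers unfamiliar with the metric: word error rate is conventionally the word-level edit distance (substitutions, insertions and deletions) between the recognizer's output and a reference transcript, divided by the number of words in the reference. The sketch below shows that standard definition only. It is not Mozilla's evaluation code, and the function name `word_error_rate` and the example sentences are made up for illustration. Real evaluations usually normalize case and punctuation first, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not taken from DeepSpeech.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
```

By this definition, a 6.5% figure corresponds to roughly 6 or 7 word-level errors per 100 reference words.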
The announcement also touts the release of nearly 400,000 recordings -- downloadable by anyone -- as the first offering from Project Common Voice, "the world's second largest publicly available voice dataset." It launched in July "to make it easy for people to donate their voices to a publicly available database, and in doing so build a voice dataset that everyone can use to train new voice-enabled applications." And while they've started with English-language recordings, "we are working hard to ensure that Common Voice will support voice donations in multiple languages beginning in the first half of 2018."
"We at Mozilla believe technology should be open and accessible to all, and that includes voice... As the web expands beyond the 2D page, into the myriad ways where we connect to the Internet through new means like VR, AR, Speech, and languages, we'll continue our mission to ensure the Internet is a global public resource, open and accessible to all."
I guess with Firefox OS, Thunderbird, etc. now dead and buried, Mozilla needs something else to do instead of working on Firefox. I mean, with all the time they've saved by transforming Firefox into Chrome, those 1200 people need something to say that they're working on.
If - and I don't yet know if this is the case, they don't actually seem to say - this represents a stand-alone, does-not-go-to-the-LAN-or-WAN speech-to-text system... with an error rate of 6.5% on English speech as claimed... then it's way more important than Yet Another Web Browser.
This is precisely the kind of thing projects like Mycroft need to become not just another way to send your activity out on the net, which inherently decreases both reliability and security.
If indeed this is what this is, then the door opens for all manner of sophisticated home advances we can actually trust and depend on.
They claim around 1:1 [decode rate : normal speech rate] with a reasonably modern CPU/GPU. That needs considerable improvement. Reference quote from here:
That's a lot of computing power to hand off, particularly in a laptop. Using just the CPU, you'll be pegging it the whole time you're talking, and then some. For a decent desktop, it's at least doable, but it's still a very heavy compute load.
Though... saying "MacBook Pro" doesn't really tell us enough... I have a MacBook Pro that is a dual-core Intel machine... it's not what you'd call quick. There are a lot of different hardware configs that could be described by "MacBook Pro."
Seems like a pretty big deal to have to dedicate a server to the STT task (but then again, if I could get my STT tasks out from under the cloud... I'd probably do it. I have a spare 3 GHz 8-core hanging around, so...) but I think for general use, they have to do better. This isn't going to fly well on a Raspberry Pi, for instance; it'll just get way behind.
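To put rough numbers on "get way behind": the decode-to-speech ratio is usually called the real-time factor (decode time divided by audio duration), and anything above 1.0 means the recognizer falls further behind the longer you talk. A back-of-the-envelope sketch follows. The function names and the 3.0 figure for a slow board are hypothetical, picked only to show the arithmetic, and are not measurements of DeepSpeech on any hardware.

```python
def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Decode time divided by audio duration; 1.0 means 'just keeps up'."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


def backlog_after(audio_seconds: float, rtf: float) -> float:
    """Seconds of decoding still pending once the speaker stops talking.

    Assumes decoding starts when speech starts and runs at a constant rate.
    """
    return max(0.0, audio_seconds * (rtf - 1.0))


# The roughly 1:1 rate claimed above: no backlog, but the processor is
# busy for the entire time you are talking.
print(backlog_after(60.0, real_time_factor(60.0, 60.0)))

# A hypothetical slow board that needs 3 seconds per second of speech:
# after one minute of talking it still has two minutes of work queued.
print(backlog_after(60.0, 3.0))
```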
Still. IMHO, this may be important. Very.
I've fallen off your lawn, and I can't get up.
This is a HUGE issue: Firefox continually increases the CPU power and memory it uses, even when you aren't looking at a Firefox window. Why? What is Firefox doing? Bitcoin mining?
Why does Firefox use so much memory when there are only a few tabs open? Why does Firefox increase memory use when it is not being viewed?
> Why so many OK OK's to install an add-on?
Because I want to install add-ons and not let random sites, apps or other add-ons install add-ons silently, just like the old ActiveX in IE.
> Why break old good ones?
Because old ones could touch and replace ANYTHING in the browser, so it was a huge security problem and a performance problem, and it locked Mozilla out of making big changes, as they would break many extensions. They finally decided to break everything and define a proper add-on API that can be stable, run in separate, locked-down processes, and use multiple CPUs. They didn't decide to break add-ons just to annoy you; they had very good reasons.
> Why uncheck 5 boxes to get a blank new tab?
I do like the new start page... but if you do not, then unchecking the 5 boxes to disable all the start page features is not hard at all; you just need to do it once. Notice that all the info you see is local info. It's flexible enough to please most people... and for those that really want an empty page, it's there too. There is no default config that will make everyone happy.
Higuita