Slashdot Mirror


Mozilla's New Open Source Voice-Recognition Project Wants Your Voice (mashable.com)

An anonymous reader quotes Mashable: Mozilla is building a massive repository of voice recordings for the voice apps of the future -- and it wants you to add yours to the collection. The organization behind the Firefox browser is launching Common Voice, a project to crowdsource audio samples from the public. The goal is to collect about 10,000 hours of audio in various accents and make it publicly available for everyone... Mozilla hopes to hand over the public dataset to independent developers so they can harness the crowdsourced audio to build the next generation of voice-powered apps and speech-to-text programs... You can also help train the speech-to-text capabilities by validating the recordings already submitted to the project. Just listen to a short clip, and report back if text on the screen matches what you heard... Mozilla says its aim is to expand the tech beyond just a standard voice recognition experience, including multiple accents, demographics and eventually languages for more accessible programs. Past open source voice-recognition projects have included Sphinx 4 and VoxForge, but unfortunately most of today's systems are still "locked up behind proprietary code at various companies, such as Amazon, Apple, and Microsoft."

55 comments

  1. Mozilla = SJW shitfucks by Anonymous Coward · · Score: 0, Troll

    Fuck Mozilla until they apologize for the SJW-fueled witchhunt against Eich.

    1. Re:Mozilla = SJW shitfucks by yuvcifjt · · Score: 2

      I don't think he's a troll, there's a point to be extracted from that.

      I love Mozilla because of how much they've done for the web, from fighting for standardisation, HTML5, JavaScript, and building up one of the most complex applications around, to fighting a little for users' privacy, etc, but they deserve all the abuse they get for getting rid of the most natural leader (creator of JavaScript, no less, from the early days of Netscape) - and yes, it well and truly was a witch-hunt against him.

      Without him as a leader, these days, it appears to most that Mozilla is just following in the wake of Google's Chrome and copying everything from the outward design, to the extensions/addons system, etc.

  2. Unanticipated consequences by Anonymous Coward · · Score: 0

    I'm afraid an Alexa with the voice of that Australian woman who does the Trivago hotel-booking commercials could get me to buy anything.

    The results of A/B testing over thousands of voices seem ominous for my paycheck.

  3. Corpus Quality by lobiusmoop · · Score: 1

    Sounds good if they make the corpus freely available. Having lots of free high quality audio recorded from modern digital microphones would be useful. Voxforge recordings tend to be poor quality, TIMIT is still proprietary despite being over 30 years old now, and the TEDLIUM corpus recordings seem to have a horrible amount of reverb/echo in them.

    --
    "I bless every day that I continue to live, for every day is pure profit."
    1. Re:Corpus Quality by coarticulation · · Score: 2

      Sounds good if they make the corpus freely available. Having lots of free high quality audio ...

      I agree, but from a quick look at their page, I see a lot of problems with reaching that goal.

      1: Most computers I've seen have pretty wretched audio inputs: tiny microphones near the screen, so not anywhere near the speaker's mouth. So we can expect lots of noise, echo, and other stuff. Good for simulating the real world (because it basically is the real world), but not what I would call high quality. Some gamers and others probably use good quality headsets, but I doubt they will make up the majority of the data base. Audio might be pretty good if the speakers use cell phones.

      2: People reading written text don't talk the same way as in natural conversation. That's going to be a limitation for some developers.

      3: They seem to be depending on the generosity/curiosity of people to generate and validate the samples. That's a hard way to get thousands to enroll. If they had some kind of game or other system that provides a psychic reward/incentive to the users I'd be more confident of a good response.

      And a final comment: I hope they're sampling at 16 kHz instead of 8. To explain: Nyquist's Theorem says the sampling rate needs to be more than twice the highest frequency component in the analog signal. Speech typically contains components up to about 6 or 7 kHz, so 16k is a good number. Unfortunately, the carbon microphones that phones used for the first 100 years or so only go up to about 4kHz, so Ma Bell (remember her?) settled on an 8kHz rate in the middle of last century, and most everybody else has accepted that ever since.
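      The sampling-rate argument above can be sketched in a few lines of Python. This is only an illustration of the Nyquist criterion for a single pure tone; the function names are made up for this example, and the 6 kHz figure is taken from the "6 or 7 kHz" estimate in the comment, not from anything Mozilla has published.

```python
def is_captured(freq_hz: float, sample_rate_hz: float) -> bool:
    """Nyquist criterion: the sampling rate must be MORE than twice the tone's frequency."""
    return sample_rate_hz > 2.0 * freq_hz


def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling.

    Anything above half the sampling rate folds back into the
    range 0 .. sample_rate_hz / 2.
    """
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    speech_component = 6000.0  # Hz, assumed upper speech component
    for rate in (8000.0, 16000.0):
        ok = is_captured(speech_component, rate)
        heard = alias_frequency(speech_component, rate)
        print(f"{rate:.0f} Hz sampling: captured={ok}, shows up at {heard:.0f} Hz")
```

      At 8 kHz the 6 kHz component is not captured and folds down to 2 kHz unless it is filtered out before sampling; at 16 kHz it survives intact.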

    2. Re:Corpus Quality by starless · · Score: 1

      high quality audio recorded from modern digital microphones would be useful.

      What is a "digital microphone"...?
      Does that term actually mean something?

    3. Re:Corpus Quality by Anonymous Coward · · Score: 0

      Monster makes them. They have GOLD connectors, which never lose any bits.

    4. Re:Corpus Quality by arglebargle_xiv · · Score: 1

      My employer is also building up a massive repository of voice recordings, and we'd also be keen to get everyone's voiceprints on file. If people are interested in contributing, please contact bulkcollectionoffice@nsa.gov.

    5. Re:Corpus Quality by Anonymous Coward · · Score: 0

      I stopped after 15 validations. To some of the bites I replied "nope" just because I couldn't hear them...

    6. Re:Corpus Quality by Anonymous Coward · · Score: 0

      this. plus lots of weird booming room acoustics

      plus foreigners butchering the language. and out of the ones who can be understood, a good 1/5 didnt actually say the exact words required

      good luck with that!

    7. Re:Corpus Quality by ChunderDownunder · · Score: 1

      foreigners butchering the language

      Yeah I was unsure what to make of that. Perhaps they wanted a sample of real-world usages but I rejected some that would make phonological errors that a first language speaker would never make - e.g. one guy sounded like he was Dutch or Scandinavian. It was understandable in the context but not 'English'. Another was a guy slurring accentedly (Asian) through syllables without enunciating the word as recognisable.

      So on the one hand, recognising 'proper' English vs comprehending something a second language speaker might say in the wild. And I'm conscious of this in that, thanks to English rapidly becoming the world language of the 21st C, first language speakers of English will be outnumbered perhaps 10-to-1 within 50 years (was 3-to-1 in 2003 according to Wikipedia).

      And there's no way of giving feedback to the speaker as to why their input was rejected.

    8. Re: Corpus Quality by Anonymous Coward · · Score: 0

      Optical microphones are better. Digital ones are so 2010.

    9. Re:Corpus Quality by TuringTest · · Score: 1

      Please, please don't reject samples merely because the speaker doesn't speak with "native English" accent, if the words spoken are accurate.

      A lot of the appeal with a project like this is in making voice recognition available to people around the world. Typically voice recognition works like shit for us who were born outside English-speaking countries, because they're only trained with native people.

      --
      Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
    10. Re:Corpus Quality by Maxo-Texas · · Score: 1

      When you set up an account (as I did), you specify what region and type of english you are speaking.

      --
      She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
    11. Re:Corpus Quality by Anonymous Coward · · Score: 0

      lol, soooo true, that it actually becomes quite depressing to think about.

    12. Re:Corpus Quality by exomondo · · Score: 1

      There are already open source projects doing this; in fact, they are even linked in the summary. VoxForge already exists, so instead of yet another NIH syndrome project, why not work together to improve VoxForge?

  4. a simple toolkit this time by Anonymous Coward · · Score: 0

    Sphinx is 1000 rolls of duct tape holding together coughed up hairballs. Can we make the next one less convoluted? Avoid Java and any other hipster languages. You know what? Just stick to C. Not even C++.

    1. Re:a simple toolkit this time by Anonymous Coward · · Score: 0

      PocketSphinx?

  5. Sneakers by Anonymous Coward · · Score: 0

    Please say, "My voice is my passport."

  6. Also good for spoofing voice based auth by Anonymous Coward · · Score: 0

    Seriously, how many people will do this and then happily use voice recognition to decrypt hard drives, get into email accounts useful for resetting their bank password, etc? Just dumb

    1. Re: Also good for spoofing voice based auth by Zero__Kelvin · · Score: 1

      If there was a "+1 Tried really hard to sound insightful" I'd mod you up.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    2. Re: Also good for spoofing voice based auth by Anonymous Coward · · Score: 0

      If you're good with some random people all over the world having access to your voice print, go for it!

    3. Re: Also good for spoofing voice based auth by Anonymous Coward · · Score: 0

      your a doosh

    4. Re: Also good for spoofing voice based auth by Anonymous Coward · · Score: 0

      Harsh... but fair!

    5. Re: Also good for spoofing voice based auth by Anonymous Coward · · Score: 0

      I agree with the "your a doosh " comment above because you really are a fuckin douche in general, but even more so since how the fuck are you modding with a comment made? Through an alt? Or are you as full of shit on this post as you are with every other security comment you make?

    6. Re: Also good for spoofing voice based auth by Zero__Kelvin · · Score: 1

      I didn't mod, but rather commented instead as I said. Allow me to laugh my ass off for a while that you are actually so stupid you couldn't figure that out.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  7. Mozilla lacks focus by Anonymous Coward · · Score: 0

    Mozilla is not the top of their field in anything that they do because, without much money, they try to do everything.

    Mozilla: maybe try being the best at something rather than a "me too" at everything.

  8. Offline voice recognition by WaffleMonster · · Score: 2

    Thanks to Nuance, the voice recognition industry is effectively dead. If Mozilla can make this work in offline mode it would be awesome. Not requiring your every word to be recorded and shipped off to third parties would be very useful.

    1. Re:Offline voice recognition by Anonymous Coward · · Score: 0

      Maybe one day when we have BoC (brain on a chip).

    2. Re:Offline voice recognition by Anonymous Coward · · Score: 1

      Hey, at least someone here acknowledges the actual market leader in speech technology: Nuance. I love how they always quote companies like Apple, Microsoft and Amazon. Actually I'm quite surprised they didn't mention Google or Facebook. But it's Nuance who has its speech and transcription services in 85% of US hospitals, in plenty of household equipment like Samsung TVs, the first generation of Siri (you know, of Apple that they mention here), most car systems ( BMW, Audi, Jaguar, Mercedes, Porsche, Volkswagen, Fiat, Peugeot, Citroen, Ford, ...), call centers, etc...

    3. Re:Offline voice recognition by Anonymous Coward · · Score: 0

      Who said anything about offline?! You mean like their Wifi-location-base that you are free to contribute to, but can only do single queries on.... Mozilla is no better than the rest of the Data-Vampires, they just don't pay taxes...

    4. Re:Offline voice recognition by Anonymous Coward · · Score: 0

      Welcome to 2017! You must be from the past. Turn on airplane mode on your Android device and try voice recognition. Notice how it still works!!! Offline voice recognition has been here for quite some time actually. (Now has anyone tried to hack Google's voice models out of Android? Not in any way that I have seen yet. Bonus points if you get the model running on a Raspberry Pi or something.)

    5. Re:Offline voice recognition by WaffleMonster · · Score: 2

      Welcome to 2017! You must be from the past. Turn on airplane mode on your Android device and try voice recognition. Notice how it still works!!!

      Running Google play services is simply not an option. There are literally no third party offline voice recognition apps available for Android without literally compiling your own from an open source library.

      Offline voice recognition has been here for quite some time actually.

      I used offline recognition on my old BlackBerry and Windows Mobile smartphone. It worked well enough for the little I wanted it for (offline voice dialing and screwing with playlists).

      Today I find myself missing capabilities existing in devices I owned over a dozen years ago on devices 30x less capable than my current mobile.

      Now has anyone tried to hack Google's voice models out of Android?

      Google's TTS is quite nice, not locked down in any way and uses standard Android interface.

    6. Re:Offline voice recognition by WaffleMonster · · Score: 1

      Who said anything about offline?! You mean like their Wifi-location-base that you are free to contribute to, but can only do single queries on.... Mozilla is no better than the rest of the Data-Vampires, they just don't pay taxes...

      Nobody. There is zilch on Mozilla's voice recognizer itself, and it's an open question how it will work. The only bit of hope that this would be available was inferred from their site:

      "People donate their voices to a massive database that will let anyone quickly and easily train voice-enabled apps. All voice data will be available to developers."

    7. Re:Offline voice recognition by WaffleMonster · · Score: 1

      Hey, at least someone here acknowledges the actual market leader in speech technology: Nuance.

      TVs, the first generation of Siri (you know, of Apple that they mention here), most car systems ( BMW, Audi, Jaguar, Mercedes, Porsche, Volkswagen, Fiat, Peugeot, Citroen, Ford, ...), call centers

      Holy shit, my shill meter has gone to plaid.

      The only point I was making is Nuance is a terrible company. They either bought out or sued their competition to the point where there is no longer a functioning market, leaving Nuance as a de facto monopoly. My remarks were never intended to assign praise or acknowledge the "greatness" of Nuance.

      I strongly believe the world would be in a much better place in terms of current commercially available voice recognition capabilities had Nuance never existed.

    8. Re:Offline voice recognition by Anonymous Coward · · Score: 0

      > Google's TTS is quite nice, not locked down in any way and uses standard Android interface.

      What about STT? That's voice recognition, not synthesis.

    9. Re:Offline voice recognition by Anonymous Coward · · Score: 0

      > Google's TTS is quite nice, not locked down in any way and uses standard Android interface.

      Is it in AOSP?

    10. Re:Offline voice recognition by Anonymous Coward · · Score: 0

      You could seriously look for yourself.

    11. Re:Offline voice recognition by Anonymous Coward · · Score: 0

      I get that you don't like them, but at least you know who they are. Talking about big Speech Technology companies without mentioning Nuance is like talking about Social Network Sites without mentioning Facebook or Web Search Engines without mentioning Google.
      In any sense, they are just doing what all the big players are doing out there. Look at Google, Facebook and Microsoft. They bought all emerging social networks, or successful Applications.
      Microsoft bought Skype, Yammer, Nokia and LinkedIn
      Facebook bought WhatsApp, Oculus and Instagram.
      Yahoo bought Tumblr.
      Google bought Kaggle, Motorola, YouTube, ...
      Pretty much all the services you once started using to escape one of the big players got assimilated at one time.

    12. Re:Offline voice recognition by Anonymous Coward · · Score: 0

      Indeed, this is quite a... subtle... argument you make.

    13. Re:Offline voice recognition by Anonymous Coward · · Score: 0

      I'd hope that facilitating that was the underlying thought process; it sounds like a very Mozilla concept. Their gradual falling out with Google, which led to Chrome, can be traced in part to the much-derided (including by me) "AwesomeBar", which moved searching for previously-seen material client-side, i.e. out of Google's sight.

  9. Re:Great. Another thing for them to fail at. by Anonymous Coward · · Score: 0

    If they hired creimer he could clean out their storage closet while they go bankrupt. He is a miracle worker after all.

  10. Just what we fucking need by Anonymous Coward · · Score: 0

    It's bad enough that Amazon, Google, the govt. etc are capturing all our spoken utterances with voice recognition tech. Do we really want to put it in the hands of every dipwad spammer and malware botnet?

  11. Why is Mozilla doing this? by Anonymous Coward · · Score: 0

    What does this have to do with building a web browser?

    1. Re:Why is Mozilla doing this? by Anonymous Coward · · Score: 1

      I'd say everything for the future. Google Home and Alexa are the new web browsers. The web browser is growing beyond its traditional interface to become a full-blown virtual secretary. It is getting to the point where it drives me nuts that I keep having to go to the keyboard when using the browser on my PC instead of just asking like I do with the assistant on my phone.

      And I for one always saw this day as one in which the assistant would be running on my machine, not on some cloud server. The implications of having this extension of my mind which will likely in short order reach the ability to act as an autonomous proxy representation of my desires / will residing outside of my home are huge.

      Any effort to get assistant level AI to run on local resources from open source is a good effort.

    2. Re:Why is Mozilla doing this? by yuvcifjt · · Score: 1

      Parent should be modded-up.

  12. Re:a simple toolkit - TiESR or Kaldi? by coarticulation · · Score: 1

    I think any near state of the art recognizer is going to be pretty complicated, because the algorithms are not simple. On the other hand you're talking about complicated math turned into code by people who are scientists instead of professional programmers.

    At one extreme, TiESR https://gforge.ti.com/gf/proje... is fairly simple to use. Not state of the art, but it does use Hidden Markov Models (HMMs) and has some noise compensation built in. It comes with word and language models, so it's fairly easy to use - for US English at least. I haven't been ambitious enough to figure out how to build new models.

    At the other extreme, Kaldi http://kaldi-asr.org/ is the most advanced open source recognizer that I'm aware of. Neural Nets and all the other goodies researchers have been working on the last few years. Definitely not easy to compile or use, though. And don't even think about trying to design a neural net without a graphics card to use as a math accelerator: one of the examples ran for days and wasn't even close to finishing when I gave up.

    Anybody else have suggestions for another toolkit?

  13. My voice is my passport by Anonymous Coward · · Score: 0

    and never computer date! (honeypots!)

  14. Hi, my name is Werner Brandes. by Anonymous Coward · · Score: 0

    My voice is my passport.

    Verify Me.

  15. Try emailing 'surname@gmail.au' by BeCre8iv · · Score: 1

    In soviet Russia the domain extension autopilots you.

    --
    This perpetual motion machine Lisa made is a joke, it just keeps getting faster and faster. - Homer
  16. NSA Here... by Anonymous Coward · · Score: 0

    We are building a new voice print database and would like to have a chat.

  17. My voice is my passport. Verify Me. by Anonymous Coward · · Score: 0

    https://www.youtube.com/watch?v=-zVgWpVXb64 :D