Slashdot Mirror


Mozilla Updates Common Voice Dataset With 1,400 Hours of Speech Across 18 Languages (venturebeat.com)

Mozilla wants to make it easier for startups, researchers, and hobbyists to build voice-enabled apps, services, and devices. From a report: Toward that end, it's today releasing the latest version of Common Voice, its open source collection of transcribed voice data that now comprises over 1,400 hours of voice samples from 42,000 contributors across 18 languages, including English, French, German, Dutch, Hakha-Chin, Esperanto, Farsi, Basque, Spanish, Mandarin Chinese, Welsh, and Kabyle. It's one of the largest multi-language datasets of its kind, Mozilla claims -- substantially larger than the Common Voice corpus it made publicly available eight months ago, which contained 500 hours (400,000 recordings) from 20,000 volunteers in English -- and the corpus will soon grow larger still. The organization says that data collection efforts in 70 languages are actively underway via the Common Voice website and mobile apps.

13 comments

  1. That's 1h17 per language other than English by Anonymous Coward · · Score: 0

    Versus 500 hours in English.

    Quite some work to do still. Also a proper British English contribution would be nice. Can't stand 'merkin, that oughtn't count as English. Most of the hours will be silly valley Californian to boot. Not as bad as southern drawl but it still shouldn't be taken as representative for "English".

    1. Re: That's 1h17 per language other than English by Anonymous Coward · · Score: 0

      Don't be so bigoted--It is not the world's fault that your 'Merikan daddy didn't love you.

  2. Here comes deepfake voice! by Anonymous Coward · · Score: 0

    This is the missing piece needed for deepfake voice generating software. Just in time for election season.

    1. Re:Here comes deepfake voice! by Anonymous Coward · · Score: 0

      Just follow this simple rule: If it sounds too dumb to be even plausible, Trump probably did say it.

    2. Re:Here comes deepfake voice! by Jason+Levine · · Score: 1

      Nefarious uses aside, I'd actually like to see low cost realistic text-to-voice software. I have a novel (link below if anyone's interested) and would love to release an audio book. Making an audio book is crazy expensive, though. It can cost around $2,000 or more. I don't have that kind of money to spend. I made a trial version of my upcoming book using text-to-speech software for my son who likes reading along with audio books. That sounded like a robot reading my book, though. I'd never charge anyone for that. I'd love to download "Deep Fake Audio Book Generator", run it against my text, and then upload the result to Audible and other sites. Even if it wasn't as good as a professional audio book reader, it could be good enough to charge a few bucks for.

      --
      My sci-fi novel, Ghost Thief, is now available from Amazon.com.
  3. Good! Something faster than DeepSpeech needed next by ffkom · · Score: 1

    Having a corpus of transcribed voice recordings available is indeed the most relevant prerequisite for implementing decent speech recognizers. About Mozilla's own attempt at this, "DeepSpeech", I have so far heard disturbing things, like it being painfully slower than "real time" even when using a mid-range GPU. (And we are talking about recognition, not training!)

    Back in the 1990s, our speech recognizers allowed "real time" recognition on a Pentium-133MHz. Admittedly with probably a smaller vocabulary and a higher error rate than DeepSpeech, but we are talking about an insanely high factor of additional computational power being consumed by DeepSpeech.

  4. Re:Good! Something faster than DeepSpeech needed n by rtb61 · · Score: 1

    You only need one voice, the voice of the current user. In reality, too many voices will entirely screw up voice recognition, because words pronounced by one person often sound like different words pronounced by someone else. So too many voices will create more problems than they solve. They just have to learn to accept voice training of devices, but they don't want that, because then voice commands would be localised rather than broadcast back to home base to be recorded and data mined forever.

    --
    Chaos - everything, everywhere, everywhen
  5. Check carefully by knorthern+knight · · Score: 1

    Obligatory Monty Python https://www.youtube.com/watch?...

    --

    I'm not repeating myself
    I'm an X window user; I'm an ex-Windows user
  6. Re:Good! Something faster than DeepSpeech needed n by ImdatS · · Score: 4, Interesting

    If you need voice training data (specifically also for Speech Synthesis), my former company created a dataset that I am now making available on my website. Since the company is closed (and I used to be the CTO/MD), I decided to release it to the public under a BSD 3-clause license.

    Here is the link: https://www.caito.de/2019/01/t... (M-AILABS Speech Dataset).

    It contains German (237hrs), Queen's English (45h), US-English (102h), Spanish (108h), Italian (127h), Ukrainian (87h), Russian (46h), Polish (53h) and French (190h).

    All details about the structure and how to use it are on the website.

    Have fun.

  7. Re:How about working on Firefox instead? by Anonymous Coward · · Score: 0

    idiot

  8. Why not leave data mining to Facebook and Google? by Anonymous Coward · · Score: 0

    Google and Facebook could donate daily speech samples from billions of their users. Mozilla should concentrate on fixing the bugs in its browser instead of playing with projects like this.
