Slashdot Mirror


Mozilla Updates Common Voice Dataset With 1,400 Hours of Speech Across 18 Languages (venturebeat.com)

Mozilla wants to make it easier for startups, researchers, and hobbyists to build voice-enabled apps, services, and devices. From a report: Toward that end, it's today releasing the latest version of Common Voice, its open source collection of transcribed voice data that now comprises over 1,400 hours of voice samples from 42,000 contributors across 18 languages, including English, French, German, Dutch, Hakha-Chin, Esperanto, Farsi, Basque, Spanish, Mandarin Chinese, Welsh, and Kabyle. It's one of the largest multi-language datasets of its kind, Mozilla claims -- substantially larger than the Common Voice corpus it made publicly available eight months ago, which contained 500 hours (400,000 recordings) from 20,000 volunteers in English -- and the corpus will soon grow larger still. The organization says that data collection efforts in 70 languages are actively underway via the Common Voice website and mobile apps.

1 of 13 comments

  1. How about working on Firefox instead? by drinkypoo · Score: -1, Troll

    This is what happens when you give money to Mozilla. Instead of spending it to make the browser better, they spend it on unrelated projects, or blow tens of millions on things you don't even want to be part of the default install (e.g., Pocket). Mozilla is the Wikipedia of web browsers.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"