Ask Slashdot: Who's Building The Open Source Version of Siri? (upon2020.com)
We're moving to a world of voice interactions processed by AI. Now long-time Slashdot reader jernst asks, "Will we ever be able to do that without going through somebody's proprietary silo like Amazon's or Apple's?"
A decade ago, we in the free and open-source community could build our own versions of pretty much any proprietary software system out there, and we did... But is this still true...? Where are the free and/or open-source versions of Siri, Alexa and so forth?
The trouble, of course, is not so much the code, but in the training. The best speech recognition code isn't going to be competitive unless it has been trained with about as many millions of hours of example speech as the closed engines from Apple, Google and so forth have been. How can we do that? The same problem exists with AI. There's plenty of open-source AI code, but how good is it unless it gets training and retraining with gigantic data sets?
And even with that data, Siri gets trained with a massive farm of GPUs running 24/7 -- but how can the open source community replicate that? "Who has a plan, and where can I sign up to it?" asks jernst. So leave your best answers in the comments. Who's building the open source version of Siri?
First, I'm sure there's lots of Open Source being used in Google's implementation - just not where we can see.
There is a speech recognizer from CMU that might be a good starting point. I haven't heard about plain-language software, though. There is additional rocket science to be done. Not insurmountable given things we've already done.
Training with millions of people? Actually, that's the part that community development is good at.
Bruce Perens.
The Mozilla project Vaani is intended to fill exactly this niche. https://wiki.mozilla.org/Vaani
>"Sirius (Ubuntu only I believe): http://sirius.clarity-lab.org/..."
Thankfully it doesn't appear to be related to or require Ubuntu at all.
timholman's post is incredibly insightful. To get around the problem he points out, I think we need to distribute these services to the community, as the OP suggests. The telcos make this difficult with restrictive terms of service. A cloud powered by millions of home users is probably the technical solution to the economic problem, but to implement it we'll need to free the fibre.
It's harder than you think. Those older systems sucked, and couldn't handle natural language queries. The issue is not processing power, it's having a large enough volume of training material and mimicking how the brain fills in gaps.
Training material isn't just a case of gathering samples. When the machine makes a mistake, it needs to understand why. The collection needs careful curation and sorting to be useful. Such databases are extremely valuable, and historically with OS projects they often started with a donation from a commercial body rather than from scratch.
Mimicking the brain is also extremely hard. Often people don't hear things very clearly or in full, due to environmental noise, poor pronunciation and the like. To compensate the brain fills in the gaps or makes assumptions. People have been trying to program those assumptions into computers since the 1980s. Again, a database of that knowledge will be vast and valuable. Either you throw massive human resources at building it, or you crawl the web and look at trillions of search queries like Google does.
That's also why they need a cloud service to do this. The database is vast and proprietary, and querying it is far from a trivial SQL command.
It's not just a programming or AI training problem, which is why no-one is doing it. The closest thing the OS world has is probably Open Street Map, but creating that data set was far less laborious and uninteresting than training a computer to have some common sense will be.