Ask Slashdot: Who's Building The Open Source Version of Siri? (upon2020.com)
We're moving to a world of voice interactions processed by AI. Now long-time Slashdot reader jernst asks, "Will we ever be able to do that without going through somebody's proprietary silo like Amazon's or Apple's?"
A decade ago, we in the free and open-source community could build our own versions of pretty much any proprietary software system out there, and we did... But is this still true...? Where are the free and/or open-source versions of Siri, Alexa and so forth?
The trouble, of course, is not so much the code, but in the training. The best speech recognition code isn't going to be competitive unless it has been trained with about as many millions of hours of example speech as the closed engines from Apple, Google and so forth have been. How can we do that? The same problem exists with AI. There's plenty of open-source AI code, but how good is it unless it gets training and retraining with gigantic data sets?
And even with that data, Siri gets trained with a massive farm of GPUs running 24/7 -- but how can the open source community replicate that? "Who has a plan, and where can I sign up to it?" asks jernst. So leave your best answers in the comments. Who's building the open source version of Siri?
Who gives a fuck?
Have the government operate an open cloud voice platform in a public datacenter db that anyone can utilize. We'll call it the voice of God and tell people to route their prayers to the NSA because they watch over all we do.
No? Not to worry. Someone will. No?
Does anyone actually know a programmer who wants such a thing? As a developer who has never used this feature on his phone, I'm not very inspired to contribute to such a project. I'd be much more likely to work on projects that help improve security and isolation and specifically break such services.
Siri sucks ass. The correct question is who is building the open source copy of Google's search voice recognition?
There is no code to copy and build upon. Without copies, there is no copyright law to enforce openness. Consider a world without the GPL. Now consider it without piracy as well. Welcome to SaaS.
Sirius (Ubuntu only I believe):
http://sirius.clarity-lab.org/sirius/
I can barely hear it over the moaning from the other room in the background. Oh who am I kidding... I'm only listening to the moaning in the other room.
When you talk about the 'massive farm of GPUs' running 24/7 you ignore the fact that, because it is proprietary, they are missing out on the potential compute resources out there.
How many people have run SETI@home, or gene folding efforts? We just need someone insightful and ingenious to find a way to deal with machine learning in an 'offline' way, and be able to present the user interface in a quick fashion.
It would have to start out very dumb, but with some great key algorithms I expect an open source option could move a lot faster than anything out there in this regard.
Honestly, the only way that I see this happening is if Google decides to make their AI interface open source. Which they might do as a public service -- but we're still playing in Google's sandbox.
Unless there's some way to get geeks to contribute their unused CPU cycles, like what SETI was doing...
First, I'm sure there's lots of Open Source being used in Google's implementation - just not where we can see.
There is a speech recognizer from CMU that might be a good starting point. I haven't heard about plain-language software, though. There is additional rocket science to be done. Not insurmountable given things we've already done.
Training with millions of people? Actually, that's the part that community development is good at.
Bruce Perens.
OK, you might not listen to the Linux Action Show or similar podcasts, but come on... google "open source AI" before asking.
Do you not realize that Siri must utilize a significant backend resource at the other end of a data connection to be effective, and that the backend requires substantial resources to operate and maintain? Siri is not some standalone app you download and forget. An open source equivalent would not be free, and would require a Kickstart and guaranteed subscriptions to be feasible. I don't think the world is quite ready for such a thing yet, not in a country populated by people willing to vote for either Trump or Clinton.
The answer is most likely "no one." Those of us in the open source community are looking in a completely different direction from the mainstream; that's the whole point. Features like voice-activated virtual assistants really only appeal to the lowest common denominator of computer and smart tech users--people who don't understand tech (or want to) and thus just want something incredibly simple that works without much effort. The people who covet something like Siri are the same ones who spend 99% of their time interacting with Facebeast or the Tweeters.
I'd much rather the open source community focus on important things like security, compatibility, efficiency, and workarounds for dick companies like Lenovo locking down their hardware.
so this shit technology will not be created for free anytime soon. its current existence is only to serve data mining corporations. what is the use case for an individual there?
It is in development..
http://jasperproject.github.io/documentation/
Not affiliated with the project.. saw it sometime ago.. decided to wait till it further matures...
AFAIK Google's isn't open source, but I don't think anyone is paying directly for it. I recently bought a $28US phone that came with the android OS. This came installed on the device (it's built into maps), and it works very very well. So I think the question is worthless to answer.
Sure, if there's an open source option, then the world can rest assured to be able to tinker with it themselves and that. And yes, Google could pull the plug on it. But for some reason, I feel that Google would just release it to the public before they'd simply toss out all that development. The task of building such a database of info, mixed with the ever-changing roadway of each country... no way anyone else, besides some huge corporate entity, could ever start from scratch. And even if they did, what would be the reasoning behind anyone using it, rather than Google's?
Politics; n. : A religion whereby man is god.
Who gets to teach the AI, and who gets to determine what it's taught?
Not having these things determined by an entity that can be regulated or at least spanked could be a really bad thing. Then again, maybe Google wants to take over Earth.
Could that be so bad?
~ People that think they are better than anyone else for any reason are the cause of all the strife in the world.
We barely have time to work on *useful* things. Screw that cyberpunk bullshit.
Correct link
http://jasperproject.github.io/
but have you tried asking siri?
This was the kickstarter: https://www.kickstarter.com/pr... Their main community website is: https://community.mycroft.ai/ They also have a slack here: https://mycroftai.slack.com/me...
No one has mentioned Mycroft yet?
There are a few application areas that are specialized and difficult enough that they may not be doable within the Free Software paradigm. Richard Stallman himself, for instance, was not able to explain to me how you could get the right specialized engineers together to develop a free equivalent to Synopsys design compiler. Enthusiasts in this area don’t tend to be interested in writing software as a hobby, so you’d have to hire engineers, which means you have to pay for all the development.
With automatic speech recognition, it’s not just an AI problem. You need massive labeled datasets that cost money to acquire, and the experts who really know this stuff are moving on to their next research project. So how are you going to get engineers to learn and implement the esoteric techniques used here? You’d have to pay them. Most people who would be interested in writing free software to do this just don’t know the subject area well enough.
Ones that even beat the proprietary competitors too, see http://tests.stockfishchess.or.... This is not to mention efforts like folding@home and similar. Of course there is still the problem of having large training data sets.
Look at the inability of wikis to gain reliable information without edit warriors and revert bots. Do you really want people like that "programming" your artificial intelligence? Like it or not, proprietary is more reliable and generates billions in revenue, creating jobs, unlike the mostly unemployed wiki volunteers. Open source siri is the biggest joke since "open source" money like bitcoin.
SETI@home is old as hell, so the idea of "open source" render farms is at least as old. Those "massive farms of GPUs running 24/7" don't scare me at all. In fact, both Siri and Google's voice recognition kinda suck. When they try to control us with this, or it is revealed that they send all their data directly to the government, I suppose we will have an incentive to act. Otherwise, wake me up when they do something interesting and new.
While it's difficult to build up a single all-inclusive solution, some individual parts are not that hard:
- speech recognition (incl. grammar and content) is something quite a few smaller companies can do
- specialized functions are relatively easy to realize (home automation, car navigation, ...)
In my opinion, the training is not the hardest part - the biggest issue is the business model: Running such a server farm is expensive, and consumers typically don't want to pay for it directly - so the only path is indirect revenue (marketing ...). I doubt that would be much different with an open source implementation - someone still needs to keep the servers running.
IF (big if) the smaller players in the market that do not like to share their know-how with google agree upon a standardized format for queries and data handling, a decentralized solution might be an option. It would require more effort by the consumer (configure multiple providers, maybe require a keyword to switch) - but it would also offer more freedom.
Regarding the complete solution, it's also possible that one of the big players loses on the market and decides to open-source the software. But as written above that still doesn't solve the issue of providing the servers and the business model behind it.
The Mozilla project Vaani is intended to fill exactly this niche. https://wiki.mozilla.org/Vaani
It's semantic recognition. Like what "it" in the prior sentence means -- in this case it's mainly a grammatical placeholder, but note how the various uses of "it" in *this* sentence are different.
The really impressive thing about Siri is how well (although still not human-well) it divines intent, not just phonemes. Add to that a massive scale attempt to get the phonetic recognition part right, and it's a bit like trying to launch a competitor to Google Maps.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Thanks for asking the question. I didn't know about Mycroft until I looked for an Intelligent Personal Assistant.
"A person is smart. People are dumb, panicky dangerous animals and you know it." - K
Not yet mentioned is http://lucida.ai/ -- it's the successor to Sirius, and where all the ongoing development is focused.
Major options that are mentioned elsewhere in the thread:
https://mycroft.ai/ (One of the most advanced, can actually be used in a pretty useful manner now, but sends snippets to Google for voice recognition--they intend to change that eventually, and they don't have a full-time open mic. Plus they aggregate audio across users so it's less identifiable as from a single source).
https://wiki.mozilla.org/Vaani (from the Mozilla project; supposed to enter beta this month according to that page)
rage, rage against the dying of the light
Google's cloud speech API is a paid service - what does it have to do with this topic?
Open Source Siri always responds with 'RTFM, noob'. Should be pretty easy.
Yes, this joke has been brought to you by the year 2005.
The thing is, this really is not an open source software issue, it is more of an infrastructure issue. People can make the code that will handle spoken queries and return answers and do it as a community. That's not really the tricky part. What the OP is looking for though is a massive project of which code is a small part. There is voice processing, servers to maintain, lots of fine-tuning and learning to do, if we want the assistant to speak then we need voice actors, etc. Plus hours and hours of testing and trials and putting it all in an interface people will like.
This reminds me of the "Where is the open source Facebook?" question. There are plenty of open source social network frameworks, but the code is a small part of the job. There's a massive amount of servers, advertising and social engagement that would need to happen for someone to make a new Facebook alternative. The open source code is there, it's the other parts which are missing.
The author also seems to think most commercial software up to this point has an open equivalent, but it doesn't. Geological, accounting, mapping and tax software tends to be commercial only. There are usually no open source alternatives because it's not something you can throw together and just publish on-line. You need auditors and geologists, accountants and so on to make these things work. It's not a coding problem so much as a business/product problem.
You don't think Apple are working on it?
iPhone 7 removed the headphone jack. iPhone 11 will remove the internal microphone.
From your link:
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them...
That doesn't sound like an open source version of Siri to me...
I see that Areyoukiddingme is too busy bitching about SJWs and global warming to read their own links anymore.
http://taixzo.com/saera/saera.html
It's pointless to talk about creating an open-source version of Siri or Alexa unless you can explain how you're going to also create and maintain the server-side infrastructure needed to make it work. The Siri and Alexa interfaces may run on a client, but they're brain-dead without the server farms of Apple and Amazon behind them.
A similar example from the not-too-distant past: Aaron Swartz's download of a significant chunk of the JSTOR database. Those JSTOR articles wanted to be free, right? And they were set free - copies of Swartz's JSTOR download were available in a multi-GB torrent on several sites. Swartz's entire rationale was that those articles should be freely available to everyone.
So where is the free, open-source version of JSTOR today? It doesn't exist, because building and maintaining a server-side infrastructure that makes that database useable costs money ... which, of course, is why JSTOR required a subscription fee.
Solve out the server-side economics, and you have a shot at building an open-source Siri. Until then, you're better off putting your open-source efforts into client-side applications.
Siri is a complete stack of text reco engines, intent recognition tools, and backends. There are many initiatives like Sirius, Mycroft and YodaQA, and each does something slightly different - either focusing on the speech reco infrastructure, or just answering factoid questions...
It's not the fall that kills you. It's the sudden stop at the end. -Douglas Adams
Not to be a negative nellie, but the way things have been going with patent/IP trollsuits going on over the last few years... Anyone betting against this sort of thing happening on a large/usable/popular scale?
Google's cloud speech API is a paid service
It was completely free when it was announced. It's still completely free for the first 60 minutes of recognition time.
And I only mentioned it because somebody asked if there's an API. There is.
Why not use Captcha as a way to get millions of VR samples? It can be an option alongside typing in an answer in case a mic isn't handy/convenient.
I'd like to interject for a moment. What you’re referring to as TensorFlow, is in fact, "corporate data/TensorFlow", or as I’ve recently taken to calling it, corporate data plus TensorFlow. TensorFlow is not a finished voice assistant itself, but rather a free component of an otherwise proprietary, fully functioning google system made useful by the google proprietary APIs, apps and web services comprising a full product experience as defined by the google leadership.
Many computer users query the google system every day. There really is a TensorFlow, and these people are using it, but it is just a part of the system they use.
TensorFlow is the basis: the program in the system that executes the AI programs. TensorFlow is an essential part of an AI service, but useless by itself; it can only function in the context of a complete AI service. TensorFlow is normally used with the google proprietary service: the whole system is basically google's proprietary service with TensorFlow added, or "corporate data/TensorFlow".
The resource based technical issues can be overcome by a distributed approach. With thousands of participants, you can wield thousands of CPUs and thousands of people to create data sets for training.
The runtime issues can be mostly overcome in the same way though your performance (speed, accuracy or both depending on how you adjust tradeoffs) will always lag an approach using colocated CPUs.
The magic ingredient is, therefore, some charismatic leader or group to drive the project. Find a geek that actually knows how to market a project and you're off to the races.
https://www.youtube.com/watch?...
by TheSpoom (715771) Uncaring Linux user here. I have nothing to add to this but please continue. *munches popcorn*
Train it using broadcast TV and Closed Caption. Spoken word and text. Free and abundant and contains the type of data needed.
It only needs two features. First is to keep cutting people off mid sentence. If you are trying to say, "Send message to John Smith." I can have it cut people off before the name John Smith.
Then I can randomly have it just wait until the end and then say, "I can't find that person in your contacts, would you like me to search the local area for businesses of that name?" This is regardless of what their actual command was.
What I find interesting about Siri is that it so rarely gets what I am saying correct but when I insult it, it has got that right 100% of the time. "Fuck you Siri, you useless pile of shit." or any one of the zillion creative insults that I have thrown at it have resulted in some "If I had feelings, they would be hurt." So I know that it is not my microphone. It is the pile of crap just not getting what I am saying.
I am saying, "Call John Smith." or "Message John Smith" or "Read last message" or "Play audiobook, the John Smith Story."
I have a twenty minute ride home from work. I once spent the entire twenty minute ride home trying to send a message to someone that said, "I will be home in 20 minutes" (except that as I tried that number was ever growing smaller.)
Nearly the entire time it would just cut me off mid sentence. It would often be in the middle of my message. So it would end up saying "Would you like to send the message "I will"?" I was even trying to give it a run-on sentence such as IWillBeHomeIn20Minutes, so that it wouldn't pick up on a pause as the end. Then there is all the other bullshit that it sucks at. In the previous example it wouldn't confirm to whom I was sending the message. It would not allow me to change the message. So I started over and over just to see if I could get it to work. Yet as a confirmation that it was hearing me I would ask things like, "What is the second derivative of x^3+x^2+3x+9" and it would give me the correct answer.
Then, after the map program nearly continuously put me blocks from where I really was, and thus gave me terrible directions in critical situations, and after trying Android's awesome Siri equivalent, I switched to Android.
On this note, I don't think that Apple realizes how bad these missteps are getting. The fact that it took me 20 minutes to send no messages, the fact that it took me 20 minutes to remove that U2 bullshit from my phone, the fact that I can't remove BS apps from my phone, the fact that iTunes nearly always is jumping to music and movies (both on the phone and the desktop) when I am clearly not looking for either (such as when I am looking for a podcast). The fact that my Mac Pro (not MacBook, but my $6,000 Mac Pro) is shoving iCloud down my throat. The fact that I can't repair half of this shit without using magic tools. The fact that little things like some extra memory cost about as much as a cheap version of the same device. It all totals up to my typing this on a completely kick ass Windows desktop that is presently charging my completely kick ass huge screened Android phone that I rooted and easily removed all the BS from.
While I am seemingly a single customer, I am also in charge of the purchasing for a large company. A company where I switched many of the execs and programmers to Apple. A switch that I am now reversing. Do I hate Apple? Nope. The key is that Apple is no longer working for me; the devices that I bought weren't my servants, but little Apple salesmen. Then there are things like XCode that was no longer really encouraging me to do things as a professional programmer, but trying to lock me into the Apple ecosystem. Oddly enough this is why I originally left Windows and Microsoft. It was all about .net and getting me to become a sharepoint/MS salesman. But now things like Visual Studio allow me to program for my Android and iOS just slick as can be. They are tools that work for me.
Can you imagine a carpenter who got a hammer that would only hammer mastercraft nails? Or a hammer that regularly missed the nail regardless of your skill with a hammer?
You even build a working model to sequence bits, it is just a waveform representation.
Then you try to talk to normal people and they think you are crazy for trying. Then you try to talk to programmer type people and they are elitist pricks who act like it is some amazing technology that is more than pattern replication because they want to sell it with hype to the idiots.
The strange bit is why the idiots have the money. Somebody must have fucked up somewhere.
timholman's post is incredibly insightful. To get around the problem he points out, I think we need to distribute these services to the community, as the OP suggests. The telcos make this difficult, with restrictive terms of service. A cloud powered by millions of home users is probably the technical solution to the economic problem, but to implement it we'll need to free the fibre.
Think around, not through. What we want is efficient, intuitive and reliable human computer communication. If voice recognition is that hard, with many facepalm inducing errors, it is a stupid way to go. It is easier for humans to adapt to the machine. This means artificial dialects and simple AI and a bit of human training. Human consumers are lazy and want magic. Apple and MS try to grab them with the illusion of magic. It would be better for the free software to research what changes to speaking habits make the software component easier, then write howtos and youtube guides as to how to speak to it.
John_Chalisque
" either focusing on the speech reco infrastructure, or just answering factoid questions..."
So it's like me, when I speak, I can't listen.
The trouble is, unlike software development which is free (if you don't value your time), implementing an open source siri would require a data center filled with servers, and this costs money. The fundamental problem is that software development creates value while an open source siri is a cost center. Wikipedia would probably be a good candidate to pick up this task because they are already familiar with the open source cost center model, they are a knowledge database, and they already have the server infrastructure.
The answer is MyCroft
I plan on buying one of these the very soonest I can once they are actually shipping the hardware. Echo is crippled by the many limitations Amazon coded in on purpose -- it's basically something that looks up text matches and does something if it finds one. No language parsing worth a damn. Even so, it's very useful, and within those limits, you can make stuff for it, Amazon's pretty open about it as long as you can set up a secure server (ugh) or use their cloud (double-ugh.) Siri, as per usual for Apple, is a much more closed system, and frankly, it's of no interest at all to me because of that.
Mycroft is completely open source. I have very high hopes for it because of that. I have reams of my own natural language processing code I should be able to plug right in the moment there is a speech-to-text engine I can use directly. Others do as well. Custom apps in the home space, that are actually somewhat smarter than...
[if string == "turn on light" then TurnOnLight]
I suggest everyone check MyCroft out. Perhaps you'll be as enthused as I. I can hope. ;)
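As a rough illustration of what "somewhat smarter" than an exact string comparison could mean, here is a minimal Python sketch. It is not Mycroft's or Echo's actual matching logic; the INTENTS table, normalize and match_intent are hypothetical names made up for this example. It only checks that all the required words appear, in any order, after a crude plural-stripping step.

```python
import re

# Hypothetical intents: each maps to the set of words that must all appear.
INTENTS = {
    "turn_on_light": {"turn", "on", "light"},
    "turn_off_light": {"turn", "off", "light"},
}


def normalize(utterance):
    """Lowercase, split into words, and strip a trailing 's' from longer words."""
    words = re.findall(r"[a-z]+", utterance.lower())
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words}


def match_intent(utterance):
    """Return the first intent whose required words all appear, else None."""
    words = normalize(utterance)
    for name, required in INTENTS.items():
        if required <= words:  # subset test: every required word was spoken
            return name
    return None
```

With this, "Please turn the lights on" and "turn on light" would both resolve to the same intent, which the exact comparison above cannot do. It is still keyword spotting, not language parsing.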
I've fallen off your lawn, and I can't get up.
Whoever is designing such a system needs to remember to keep it client-side.
Given the ridiculous amount of processing power available on even low-end phones and tablets now there's really no excuse to rely on the horrible latency and dependence that comes with server based voice recognition.
Any voice processing that relies on server-side processing has already failed.
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
Well, we'll need a voice to text generator. Then we'll need some kind of AIML handler. Finally, a text to voice generator. IBM used to sell a Voice-To-Text interface card in the late 1990's. Text to Voice is a small software routine these days.
The Machine Learning part is the intriguing part. Books have been, and will continue to be, written on this. The hard part is: how can a computer program find a valid fact, and be able to defend that the fact is valid?
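The three stages listed above (voice to text, an AIML-style handler, text to voice) can be wired together roughly as in the Python sketch below. Everything in it is an assumption for illustration: the two speech stages are placeholders that only pass text through, and RULES is a made-up table using "*" as a wildcard in the spirit of AIML patterns, not real AIML.

```python
import re

# Hypothetical AIML-style rules: a pattern with "*" wildcards and a reply template.
RULES = [
    ("HELLO *", "Hello! You said: {0}"),
    ("WHAT IS YOUR NAME", "I am a sketch, not a product."),
    ("*", "Sorry, I have no rule for that."),
]


def speech_to_text(audio):
    """Placeholder: a real system would run a speech recognizer here."""
    return audio["transcript"]


def pattern_to_regex(pattern):
    """Turn 'HELLO *' into a regex where each '*' captures one or more characters."""
    return "(.+)".join(re.escape(part) for part in pattern.split("*"))


def respond(text):
    """Match the upper-cased input against each rule in order; first match wins."""
    cleaned = re.sub(r"[^A-Z0-9 ]", "", text.upper()).strip()
    for pattern, template in RULES:
        found = re.fullmatch(pattern_to_regex(pattern), cleaned)
        if found:
            return template.format(*found.groups())
    return ""


def text_to_speech(text):
    """Placeholder: a real system would synthesize audio here."""
    return {"spoken": text}


def assistant(audio):
    """Run the three stages in sequence."""
    return text_to_speech(respond(speech_to_text(audio)))
```

The middle stage is the only one doing any work here, which matches the point of the comment: the plumbing is easy, and the part that decides what a valid answer is remains the open problem.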
There are currently 50 different implementations and none are compatible, as is the OS way.
Interest in open source speech recognition used to be much higher. When Google and others released their free, cloud-based speech recognition APIs, everyone jumped over to those. It became difficult to find projects based on the older FOSS software. There are a few products out there but they are an absolute pain to install because they're designed by researchers. You have to know all the field-specific terms and concepts as there is no one-click installation package. You need to install and configure each component to work together: audio input, sound analysis, sound metadata -> words, words -> phrases, and finally phrases -> DIY action software, and there aren't any good tutorials on any of those steps. When you do find some tutorials their packages are out of date so you end up trying to manually compile everything. It's simply not worth it when Google has an easy to use API. Common application developers don't care much about FOSS or privacy, they just want to make their cool little program. If it takes 160 to get the FOSS version working or 7 for Google, they all pick Google. And then Google gets all this free voice data as well.
Does anyone know of something along these lines that can run without an internet connection?
Something that you could ask the status of a gpio pin, state values, or even ask it to tell you a joke (from a predefined list)?
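To make the question concrete: once some offline recognizer has produced text, the part that answers from local data is small. The Python sketch below assumes that text is already available and uses a plain dictionary, PIN_STATES, as a stand-in for real GPIO reads; the pin numbers, the JOKES list and the handle function are all hypothetical, and no hardware library is called.

```python
import random
import re

# Simulated pin states; a real device would read hardware instead (assumption).
PIN_STATES = {4: True, 17: False}
# Predefined joke list, as in the question.
JOKES = ["Why do programmers prefer dark mode? Because light attracts bugs."]


def handle(text, rng=random):
    """Answer a recognized text command using only local data, no network."""
    text = text.lower()
    pin = re.search(r"\bpin (\d+)\b", text)
    if pin and "status" in text:
        number = int(pin.group(1))
        if number not in PIN_STATES:
            return f"I don't know pin {number}."
        return f"Pin {number} is {'high' if PIN_STATES[number] else 'low'}."
    if "joke" in text:
        return rng.choice(JOKES)
    return "Sorry, I didn't understand that."
```

The hard part the question is really asking about, getting from audio to that text without a network connection, is not covered by this sketch.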
Free for 60 minutes is not free. Only free is "Free".
Time is what keeps everything from happening all at once.
Short answer: no. The voice model is too large to conveniently run locally. Any data needed to formulate an answer (prices of products, driving instructions, jokes, baseball scores, countdown timers, etc.) has to be accessible to the voice computer.
Architectural plans are like computer source code with a couple of differences: You only compile once.
I assume you are referring to the nebulous "Amazon is listening" issue. For amazon to be listening all the time would consume huge amounts of bandwidth and processing power for it to be of any use to them. And you the end user would see the bandwidth pull all the time.
Architectural plans are like computer source code with a couple of differences: You only compile once.
I am an Android user, and have used Google Now, but had not tried Siri until very recently, when it was bundled with the latest macOS Sierra release. So far, I have been less than impressed with both Google Now and Siri, and after trying Siri for three days on my computer, I turned off that functionality altogether, because it was not as helpful as I had expected a voice interface to be. So, I would like to know who's building a better, open source voice interface (as opposed to merely recreating Siri or Google Now, which are both mediocre at best)?
Mycroft.ai
We're moving to a world of voice interactions processed by AI.
No, we're not; I don't believe voice interactions will ever become the dominant mode of interaction with electronics, except maybe to control your living room TV, or for people with disabilities. Even if you can produce perfect recognition of every human sound in a given language, you still have the problems that: (1) people use electronics in public, and don't want to speak out loud for privacy reasons; (2) using a keyboard or touch interface is much faster and more efficient than talking; (3) most speakers of e.g. English in the world are not native speakers, meaning that they'll have an accent, and it's very hard for electronics to compensate for that; (4) all human languages are quite ambiguous, and depend on a lot of context and culture that is hard for a non-human to "get"; (5) talking to a computer all day is much more tiring than using other interfaces; there's a reason many people prefer email to phone calls in business settings. And the list goes on.
I imagine Siri is much more useful on a phone than on a standard laptop or desktop system. I use it all the time on my phone, but I don't expect it to be at all useful on my mini.
Being able to ask "how many tablespoons in half a cup?" and get a spoken answer, is really useful, especially if I'm in the middle of cooking at the time.
Isn't open source Siri / Cortana = UTAU?
We are? I honestly haven't noticed.
Change is certain; progress is not obligatory.
"And even with that data, Siri gets trained with a massive farm of GPUs running 24/7 -- but how can the open source community replicate that?"
But only recently have some computers (mostly phones) started listening. Unfortunately, the current commercial interfaces have a bit of trouble when more than half of the words have 4 letters that refer to things that a machine cannot experience and natural functions they cannot perform.