Baidu Releases Open Source Artificial Intelligence Code (thestack.com)
An anonymous reader writes: Chinese web services company Baidu has released a new piece of artificial intelligence software called WARP-CTC. The code is apparently capable of speech recognition that exceeds human accuracy, particularly on short segments. The source code uses an approach called 'connectionist temporal classification' and has been released on GitHub.
I developed automatic speech recognition systems in the 1990s. Back then, the best-performing recognizers were based on Hidden Markov Models, and for "out of context" tasks like "determine whether an individual spoken word from an unknown speaker is 'nine' or 'none'", automatic recognizers already achieved better recognition rates than humans. However, the specific human strength when recognizing fluent speech is to (a) quickly adapt to different speakers and (b) fill in all the uncertain words from an understanding of the context, which requires "world knowledge". That strength makes a very big difference. So the claim in the article is nothing special: computers have been expected to beat humans at this narrow task for at least the last 20 years.
Now, can they please put this code to use on those terrible phone menus? With single words or phrases, like the article says? Those systems even have the advantage of context a machine can understand (only a few responses match the question), and yet they still get it horribly, hilariously wrong.
Sure... but this time they did it with Mandarin instead of English.
Apparently this CTC technique was pioneered by the Swiss (IDSIA), who attempted to capitalize on it (Lifeware), whereas the first impulse of the Chinese researchers at Baidu was to post it on GitHub.
To be fair, the Baidu researchers are located in Silicon Valley. So maybe this is just comparing socialists to capitalists.
Is it smart enough to automatically call the secret police if it hears the words "Falun Gong"?
I had no idea they had been that good for that long. In the wild, however, they suck. I hope I'm not coming off as super-critical. But it is the fault of humans for trying to make the machine pretend to be human.
... well, it is insulting. I don't want to be a hater. And the machine doesn't care that I'm a hater. And I don't hate the machine, I hate the humans behind it that make it pretend to be human.
Best:
I want to start this by saying that the following is decidedly NOT satire.
It is most probably the manner in which they are used that sucks the most. But maybe it is my own personal bias that is really to blame. Which is to say: I am prejudiced against AI programmers.
When a robot answers the phone, I get miffed. I can't help it. Whatever they ask, I answer curtly, because I think it is beneath me to answer politely. And to be honest, I'm right! I'm not trying to start a class war here (okay, I said no satire, but there it is).
My belabored point is that I DO.NOT.WANT a machine to simulate politeness; it takes a human to pretend correctly. A machine has no conception of remorse or any other emotion, so to simulate a human is
Say the name of your product [beep] Say the issue with your product [beep]
And the voice that says that should not try to be passin'.
Where am I going with this? Let those that respond decide.
Take your pointless rambling nonsense somewhere else, please; this is Slashdot.
Dang, this is Slashdot?!?! Never mind... carry on.
Automatic recognizers achieve better rates on this task, but they'll lose against you when complete, sensible sentences are spoken, and even more so if you have already heard other sentences from the same speaker.
Most of the current crap fails miserably with English - it may or may not work with American - I could not possibly comment.
The irony that this code is on GitHub is rich.
Not the only irony around here. When I grew up, everybody KNEW, they just KNEW, that those evil Chinese Communists should open up their society, embrace capitalism and take part in the international community, where we freedom-loving Westerners would welcome them and teach them how to think. So they did, and now they are being pummeled for that very thing, not least because they dare to think for themselves.

It's open source: you are supposed to be able to inspect the code. In fact, one of the fundamental strengths of open source is that there will be hundreds of millions of eyes, all ogling every last semicolon, right? And because this isn't just some hobby project but a serious effort in something of commercial and academic interest, it will get a serious looking-over by people who understand this kind of thing.
But if you don't like it, because it was thought up by somebody in China, then suit yourself. No-one's going to force you.
References please? I'm interested in this.
http://github.com/gbook/nidb