Baidu Releases Open Source Artificial Intelligence Code (thestack.com)
An anonymous reader writes: Chinese web services company Baidu has released new artificial intelligence software called WARP-CTC. The code is apparently capable of speech recognition, particularly for short segments, that exceeds human capability. The source code uses an approach called 'connectionist temporal classification' and has been released on GitHub.
Um why?
I developed automatic speech recognition systems in the 1990s. Back then, the best-performing recognizers were based on Hidden Markov Models, and for "out of context" tasks like "determine whether an individual spoken word from an unknown speaker is 'nine' or 'none'", the automatic recognizers already achieved better recognition rates than humans. However, the specific human strength in recognizing fluent speech is the ability to (a) quickly adapt to different speakers and (b) fill in all the uncertain words from an understanding of the context, which requires "world knowledge". And that strength makes a very big difference. So the claim in the article is not really anything special; it is to be expected that computers are better than humans at this special task, and they have been for at least the last 20 years.
Now, can they please put this code to use on those terrible phone menus? With single words or phrases, like the article says? They can even have the advantage of AI-understandable context -- only a few responses match the question -- and yet they still get it horribly, hilariously wrong.
Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
Is it not blocked ATM?
Nope! After the Great Cannon incident and all the industrial espionage they've conducted, I'm not putting any code the Chinese government approves on any of my machines. Baidu is too close to the Chinese government to be trusted. I don't care if this project is open source. We've seen too many instances of malicious code slipping past open source audits.
The irony that this code is on GitHub is rich.
I developed automatic speech recognition systems in the 1990s. Back then, the best-performing recognizers were based on Hidden Markov Models, and for "out of context" tasks like "determine whether an individual spoken word from an unknown speaker is 'nine' or 'none'", the automatic recognizers already achieved better recognition rates than humans.
Sure... but this time they did it with Mandarin instead of English.
Anons need not reply. Questions end with a question mark.
Apparently this CTC technique was pioneered by the Swiss (IDSIA), who attempted to capitalize on it (Lifeware), whereas the first impulse of the Chinese researchers at Baidu was to post it on GitHub.
To be fair, the Baidu researchers are located in Silicon Valley. So maybe this is just comparing socialists to capitalists.
For those who haven't studied Mandarin:
https://www.youtube.com/watch?v=5TuVL3mlBR4
Is it smart enough to automatically call the secret police if it hears the words "Falun Gong"?
So you're saying when they say "this call may be recorded for quality or training purposes" that's consent for training the voice recognition system?
How on earth can you exceed human capability for understanding speech? In a foreign language, OK, fair enough, but otherwise, huh?
I had no idea they had been that good for that long. In the wild, however, they suck. I hope I'm not coming off as super-critical. But it is the fault of humans for trying to make the machine pretend to be human.
... well, it is insulting. I don't want to be a hater. And the machine doesn't care that I'm a hater. And I don't hate the machine, I hate the humans behind it that make it pretend to be human.
Best:
I want to start this by saying that the following is decidedly NOT satire.
It is most probably the manner in which they are used that sucks the most. But maybe it is my own personal bias that is really to blame. Which is to say: I am prejudiced against AI programmers.
When a robot answers the phone, I get miffed. I can't help it. When they ask whatever, I answer curtly, because I think that it is beneath me to answer politely. And to be honest, I'm right! I'm not trying to start a class war here (okay, I said no satire but there it is).
My belabored point is that I DO.NOT.WANT a machine to simulate politeness; it takes a human to pretend correctly. A machine has no conception of remorse or any other emotion so to simulate a human is
Say the name of your product [beep] Say the issue with your product [beep]
And the voice that says that should not try to be passin'.
Where am I going with this? Let those that respond decide.
Take your pointless rambling nonsense somewhere else please, this is Slashdot.
Dang, this is Slashdot?!?! Never mind....carry on.
Automatic recognizers achieve better rates on this task, but they'll lose against you when complete, sensible sentences are being spoken, even more so if you have heard more sentences from the speaker before.
For those who may be interested in learning more about a certain subject, here's an unlinkified URL of a video that may or may not contain any interesting information. I'm not going to tell you anything about the contents, let alone an interesting summary.
Most of the current crap fails miserably with English - it may or may not work with American - I could not possibly comment.
Sent from my ASR33 using ASCII
I developed automatic speech recognition systems in the 1990s. Back then, the best-performing recognizers were based on Hidden Markov Models, and for "out of context" tasks like "determine whether an individual spoken word from an unknown speaker is 'nine' or 'none'", the automatic recognizers already achieved better recognition rates than humans.
However, the specific human strength in recognizing fluent speech is the ability to (a) quickly adapt to different speakers and (b) fill in all the uncertain words from an understanding of the context, which requires "world knowledge". And that strength makes a very big difference.
So the claim in the article is not really anything special; it is to be expected that computers are better than humans at this special task, and they have been for at least the last 20 years.
Utter bullshit. HMM never achieved better than human performance. The new approach is based on DNNs.
https://www.youtube.com/watch?v=6TPMEoM-cjc
References please? I'm interested in this.
http://github.com/gbook/nidb