Microsoft Speech Recognition Now As Accurate As Professional Transcribers (techcrunch.com)
An anonymous reader quotes TechCrunch:
Microsoft announced today that its conversational speech recognition system has reached a 5.1% error rate, its lowest so far. This surpasses the 5.9% error rate reached last year by a group of researchers from Microsoft Artificial Intelligence and Research and puts its accuracy on par with professional human transcribers who have advantages like the ability to listen to text several times. Both studies transcribed recordings from the Switchboard corpus, a collection of about 2,400 telephone conversations that have been used by researchers to test speech recognition systems since the early 1990s. The new study was performed by a group of researchers at Microsoft AI and Research with the goal of achieving the same level of accuracy as a group of human transcribers who were able to listen to what they were transcribing several times, access its conversational context and work with other transcribers.
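For anyone wondering what a figure like 5.1% measures: it is a word error rate, i.e. the number of word substitutions, deletions and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. Below is a minimal sketch of that calculation, not the scoring setup the researchers used. The function name `word_error_rate` is made up for illustration, and it assumes plain whitespace tokenization with no text normalization (real scoring tools typically normalize case, punctuation and the like first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on a mat"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

Note that every wrong word counts the same here, whether it is a near-miss a reader could decode or complete nonsense, so two systems with the same score can read very differently.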
When a human transcriptionist makes a mistake you can usually work out what they meant. When Speech-to-text (STT) makes a mistake it is often gibberish. So objectively it is "better" at transcribing, but subjectively much worse.
You should start talking with people who don't speak gibberish.
Yeah, but Mumbai is on the phone with us again...
We have an up-to-date Microsoft service doing this at my work. Accuracy is a running joke, and I regularly forward people their transcriptions so we all get a good laugh. This might be lab-quality recordings with limitations on language complexity used to cut down on errors. The error rate of a closed-set test isn't really a great indicator. Now, a year-long comparison against several call centers in multiple industries would be quite compelling.
Words don't make a language, and C does not become English just by using some English words.
Doing what you want is a completely different thing and would use a completely different algorithm, so at the very least it is rather off-topic to this article (mostly because things like phrases, grammar, and context in general don't apply, but are very important to creating good natural language recognition).
You are being rather arrogant about it considering you very much didn't seem to understand the poster or why his criticism is valid.
3) How much background noise? Are these from people calling from cell phones or a landline?
Why does it matter? If it doesn't function in a standard operating environment then it isn't doing as claimed. What would you say to a watch maker who claimed their product was unscratchable but testing consisted of rubbing it with microfibre cloth?
3... I've tried various voice recognition software over the years and can say it is getting much better, but if there is any background noise, forget it.
I quit trying to use Siri because when I get in the car and ask Siri for directions, if my wife is with me, I get Siri saying "I couldn't find, 102 why the fuck street don't you type in the address like a regular shut up person damn it."