Slashdot Mirror


Microsoft Speech Recognition Now As Accurate As Professional Transcribers (techcrunch.com)

An anonymous reader quotes TechCrunch: Microsoft announced today that its conversational speech recognition system has reached a 5.1% error rate, its lowest so far. This surpasses the 5.9% error rate reached last year by a group of researchers from Microsoft Artificial Intelligence and Research and puts its accuracy on par with professional human transcribers who have advantages like the ability to listen to text several times. Both studies transcribed recordings from the Switchboard corpus, a collection of about 2,400 telephone conversations that have been used by researchers to test speech recognition systems since the early 1990s. The new study was performed by a group of researchers at Microsoft AI and Research with the goal of achieving the same level of accuracy as a group of human transcribers who were able to listen to what they were transcribing several times, access its conversational context and work with other transcribers.
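The 5.1% and 5.9% figures are word error rates (WER), which is how Switchboard results are conventionally reported. WER is the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal Java sketch. It assumes simple whitespace tokenization and skips the text normalization that real scoring tools apply. The class and method names are illustrative; this is not Microsoft's scoring code:

```java
/** Illustrative helper: word error rate via word-level Levenshtein (edit) distance. */
public class WordErrorRate {

    /** Returns (substitutions + deletions + insertions) / number of reference words. */
    public static double wer(String reference, String hypothesis) {
        // Assumption: plain whitespace tokenization, no case or punctuation normalization.
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        // dist[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
        int[][] dist = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) dist[i][0] = i; // i deletions
        for (int j = 0; j <= hyp.length; j++) dist[0][j] = j; // j insertions
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                dist[i][j] = Math.min(dist[i - 1][j - 1] + sub,   // match or substitution
                             Math.min(dist[i - 1][j] + 1,          // deletion
                                      dist[i][j - 1] + 1));        // insertion
            }
        }
        return (double) dist[ref.length][hyp.length] / ref.length;
    }

    public static void main(String[] args) {
        // One substitution ("my" -> "buy") in a four-word reference
        System.out.println(wer("open my files please", "open buy files please")); // prints 0.25
    }
}
```

Insertions count as errors too. As a result, WER can exceed 100% on very bad output, so it is not quite the same as "percent of words wrong".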

176 comments

  1. Laughable Hype by bwanagary · · Score: 5, Interesting

    On a daily basis in my work environment Microsoft technology is used to a) record voicemail and b) generate text from the speech.  Never, ever, have I received any converted voicemail that wasn't completely unintelligible gibberish.  Seriously.  This is utter nonsense.

    1. Re:Laughable Hype by Anonymous Coward · · Score: 0

      And worse is Cortana. Our company uses a program named "My Files." Every single person I've seen try to run it with Cortana fails since it always thinks everyone says "buy files." Same with sink and stink.

    2. Re: Laughable Hype by Anonymous Coward · · Score: 0

      Did you mean two in the pink and one in the stink?

    3. Re:Laughable Hype by jellomizer · · Score: 0

      I hate to sound like a Microsoft supporter but...
      1) How old is your system? When did you get it installed? How old is the technology built into it? This is Microsoft's right-out-of-the-lab technology, so your system is probably using decade-old software.

      2) How well would a transcriptionist handle your voicemails? I get some voicemails that I need to play 3 or 4 times just to figure out what the heck they are about. If you try to transcribe what was said out of context, most of it is completely unintelligible gibberish. "Hey this is John, the blue light is on (in the background "John tell him the blue light is blinking") and it is blue 83991222 (click)"

      3) How much background noise? Are these from people calling from cell phones, or a LAN line?

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    4. Re:Laughable Hype by jellomizer · · Score: 1

      That isn't a failure of speech recognition but of context recognition.
      If you had a command prompt and said "Open My Files", it would do the same thing.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    5. Re:Laughable Hype by avandesande · · Score: 4, Funny

      You should start talking with people who don't speak gibberish.

      --
      love is just extroverted narcissism
    6. Re:Laughable Hype by bobstreo · · Score: 3, Insightful

      You should start talking with people who don't speak gibberish.

      Yeah, but Mumbai is on the phone with us again...

    7. Re: Laughable Hype by Anonymous Coward · · Score: 2, Insightful

      We have an up-to-date Microsoft service doing this at my work. Accuracy is a running joke, and I regularly forward people their transcriptions so we all get a good laugh. This result might come from lab-quality recordings with limits on language complexity to cut down on errors. The error rate on a closed-set test isn't really a great indicator. Now, a year-long comparison against several call centers in multiple industries would be quite compelling.

    8. Re:Laughable Hype by Opportunist · · Score: 1

      Then stop outsourcing to countries where this is the native language.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    9. Re: Laughable Hype by Anonymous Coward · · Score: 0

      Agreed. TOTALLY laughable hype (sort of like MS proclaiming they were going to cure cancer) and for critical situations (medical data, court transcription, etc.) it's a severely unacceptable risk. Isn't this hype cycle dead yet? Big data was a ginormous and inaccurate joke, this is too. Can we please put our attention back on uses that are actually useful? Algorithms ARE good for some things, they just aren't a panacea (and they never will be). I'm way past the point in my life that the 'magic' impresses me. Like real magic, they are largely parlour tricks in controlled environments. Truly understanding the tech erases hype pretty effectively, and that causes me to question the supposed smarts of a lot of engineers. I suppose it's possible that they are misleading people intentionally, motivated by profit, as well. Sorry, not all of us are that stupid.

    10. Re:Laughable Hype by Luthair · · Score: 2

      No, context recognition would mean the correct word but wrong meaning. Buy and My are clearly distinct words with different pronunciation.

    11. Re:Laughable Hype by Luthair · · Score: 3, Insightful

      3) How much background noise? Are these from people calling from cell phones, or a LAN line?

      Why does it matter? If it doesn't function in a standard operating environment then it isn't doing as claimed. What would you say to a watch maker who claimed their product was unscratchable but testing consisted of rubbing it with microfibre cloth?

    12. Re: Laughable Hype by Anonymous Coward · · Score: 1

      Are these from people calling from cell phones, or a LAN line?

      ROFL

      A LAN line?? WTF is a LAN line?

    13. Re: Laughable Hype by Anonymous Coward · · Score: 0

      Hefty claim. I doubt any search engine understands most colloquialisms. Like, if someone announces they are pregnant, will it translate to zǎo shēng guì zǐ?

    14. Re:Laughable Hype by pr0fessor · · Score: 3, Insightful

      3... I've tried various voice recognition programs over the years and can say they are getting much better, but if there is any background noise, forget it.

      I quit trying to use Siri because when I get in the car and ask Siri for directions, if my wife is with me I get Siri saying "I couldn't find, 102 why the fuck street don't you type in the address like a regular shut up person damn it."

    15. Re:Laughable Hype by Anonymous Coward · · Score: 0

      Then learn their language.

    16. Re: Laughable Hype by Chaset · · Score: 2

      I just read that as an IP phone connected to the LAN. I have one of those at work. It is theoretically better audio quality than the analog internal phone system it replaced. So cell phone=really bad, LAN line=really good audio quality.

      --
      -- "This world is a comedy to those who think, a tragedy to those who feel."
    17. Re: Laughable Hype by Anonymous Coward · · Score: 0

      Speech recognition I guess... Should have been 'land lines'. And 'salad phones', obviously.

    18. Re:Laughable Hype by skids · · Score: 1

      The missing part in this equation is the quality of the "human transcribers". I worked a few mturk transcription microjobs JOOC a decade or so back. Occasionally the job was to validate another person's transcription. It was rather awful. I don't blame them, though, because the pay is rather awful, too, especially for a job that pretty much monopolizes your attention.

    19. Re:Laughable Hype by rcharbon · · Score: 1

      If there was speech recognition that was 99.9% accurate for me if I were to stuff my leg down my throat first, get me a bone saw.

    20. Re:Laughable Hype by Anonymous Coward · · Score: 1

      Hey, couldn't your wife learn to shut up long enough for you to ask for directions? It would only be fair, since you're breaking the male stereotype.

    21. Re:Laughable Hype by arth1 · · Score: 1

      If you had a command prompt and said "Open My Files", it would do the same thing

      My experience with voice recognition is that if I said "open my files", it would interpret that as "reboot now".

    22. Re: Laughable Hype by arth1 · · Score: 1

      Like, if someone announces they are pregnant, will it translate to zǎo shēng guì zǐ?

      That's rather embarazado...

    23. Re: Laughable Hype by Anonymous Coward · · Score: 0

      So you mean VOIP.

    24. Re:Laughable Hype by Darinbob · · Score: 1

      I keep some of those emails, the transcriptions are hilarious.

    25. Re:Laughable Hype by Anonymous Coward · · Score: 0

      but then who will do the needful.

    26. Re: Laughable Hype by avandesande · · Score: 1

      I've been getting these voicemail transcriptions from Vonage for years, and although it screws up a lot I get the gist of what the person is saying which is enough. It's certainly better than having to call into a voicemail system.

      --
      love is just extroverted narcissism
    27. Re: Laughable Hype by thewolfkin · · Score: 1

      Are these from people calling from cell phones, or a LAN line?

      ROFL

      A LAN line?? WTF is a LAN line?

      Or maybe it's a typo that was supposed to say "land line". I often type different words from the ones I'm thinking, and maybe his fingers just typed LAN instead of land. Or maybe he's young enough not to remember that it's "landline", and after hearing so many people talk about it he assumed the term was "LAN line". Either way, hardly a ROFL moment.

      --
      Just another second banana
    28. Re:Laughable Hype by Threni · · Score: 1

      To be fair, they were comparing the quality of Microsoft's translation with the sort of quality you get from the sort of Indian who phones you up at home when you're having your food to tell you that "your computer is having virus and internet is being attacked with hacker but for you sir because i work for microsoft and can do the needful i can remove hacker for only $699.99"

    29. Re:Laughable Hype by swillden · · Score: 1

      Or a LAN line.

      You mean "land line". The terminology predates IP telephony, and in any case the term clearly cannot apply to local area networks.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    30. Re:Laughable Hype by angel'o'sphere · · Score: 1

      Actually it would give an error: 'My' - file not found. 'Files' - file not found.

      Yes, macOS/OS X has an "open" command. And "open 'My Files'" would have worked just fine, provided you had a folder or file called 'My Files'.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    31. Re: Laughable Hype by Anonymous Coward · · Score: 0

      I see what you did there. ;)

    32. Re:Laughable Hype by Luthair · · Score: 1

      You replied to the wrong person.... I assumed he meant VOIP personally.

    33. Re:Laughable Hype by swillden · · Score: 1

      I did reply to the wrong person. I'm surprised I did that. In any case, it would make no sense to call VOIP a "LAN line".

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    34. Re:Laughable Hype by Luthair · · Score: 1

      Why? If you have a SIP phone you plug your LAN into it /shrug.

    35. Re:Laughable Hype by Anonymous Coward · · Score: 0

      I've tried to view Coursera lessons with English captions on. These were supposedly transcribed by humans, but they read like automatic YouTube captions. The transcribers cannot possibly have understood what the video was about. They just tried to come up with some words to match the sounds that they heard. Of course, they didn't bother to look up technical terms or people's names either.

    36. Re:Laughable Hype by Anonymous Coward · · Score: 0

      Or it would open your fly.

  2. Errors are not Errors by idji · · Score: 5, Insightful

    When a human transcriptionist makes a mistake you can usually work out what they meant. When Speech-to-text (STT) makes a mistake it is often gibberish. So objectively it is "better" at transcribing, but subjectively much worse.

    1. Re:Errors are not Errors by Anonymous Coward · · Score: 0

      But STT is thousands of times faster and cheaper. So that's a no-brainer. If you want, hire a team of Harvard grads to pore over your recordings and author definitive, 100% error-free guaranteed transcriptions. Hey, it's going to cost $700 per minute but at least there will be no errors!

    2. Re:Errors are not Errors by AmiMoJo · · Score: 4, Interesting

      Not any more. One of the ways that they got the accuracy up so high is by giving the machine an understanding of English and common phrases, similar to what a human has. It's been used for input correction on smartphones for a while too, e.g. with the Google keyboard it can correct the previous word based on the next one you type if it realizes that they don't make sense together.
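      The sketch below illustrates how the next word can fix the previous one. It scores each sound-alike candidate by how often it precedes the following word. The counts are made up, and the class and method names are hypothetical. Production keyboards and recognizers use far larger statistical or neural language models, so treat this as a sketch of the principle only:

```java
import java.util.List;
import java.util.Map;

/** Toy bigram rescoring: pick the earlier word that best fits the word that follows it. */
public class BigramRescorer {

    // Hypothetical bigram counts; a real model would be estimated from a large text corpus.
    private static final Map<String, Integer> BIGRAM_COUNTS = Map.of(
            "my files", 120,
            "buy files", 2,
            "buy groceries", 80,
            "my groceries", 15);

    private static int count(String first, String second) {
        return BIGRAM_COUNTS.getOrDefault(first + " " + second, 0);
    }

    /** Returns the candidate for the previous word that most often precedes nextWord. */
    public static String pickPrevious(List<String> candidates, String nextWord) {
        String best = candidates.get(0);
        for (String candidate : candidates) {
            // Strictly greater: ties keep the earlier candidate
            if (count(candidate, nextWord) > count(best, nextWord)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> soundsAlike = List.of("buy", "my");
        System.out.println(pickPrevious(soundsAlike, "files"));     // my
        System.out.println(pickPrevious(soundsAlike, "groceries")); // buy
    }
}
```

      Because of the tie-breaking rule, a next word the model has never seen leaves the first candidate in place.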

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    3. Re:Errors are not Errors by Anonymous Coward · · Score: 1

      But you will know that there is an error. With a professional person, they will make the whole sentence look just fine... and you will never realise there was an issue, which is much worse IMHO...

    4. Re:Errors are not Errors by jellomizer · · Score: 4, Informative

      Normally we have transcriptionists who are trained in a particular area to understand the context of the message. A legal transcriptionist requires different training than a medical transcriptionist.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    5. Re:Errors are not Errors by JasterBobaMereel · · Score: 1

      Unless it actually understands what is being said, it will always make mistakes that result in gibberish.

      If they are saying that they have cracked this, then they have strong AI and should be announcing it to the world's press... (they haven't)

      They have added some syntax and grammar rules... just like everybody else ...

      --
      Puteulanus fenestra mortis
    6. Re:Errors are not Errors by AmiMoJo · · Score: 4, Interesting

      It's more than just syntax and grammar rules. For example, Google has been mining the web for that kind of knowledge. You can see it in Google Translate sometimes. It generates suggestions for your input, and sometimes screws up like thinking "alot" is a word. It also uses colloquialisms in its output, which again it gathered from analysis of the web and which doesn't fit standard grammar or syntax rules.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    7. Re:Errors are not Errors by Anonymous Coward · · Score: 0

      Like when someone says "Microsoft successfully did a good thing" you write down "NO! M$ BAD! Must move goalposts!"

    8. Re:Errors are not Errors by K.+S.+Kyosuke · · Score: 3, Insightful

      Hey, it's going to cost $700 per minute but at least there will be no errors!

      So it's about three times cheaper than the lawyer that you'd need if you get sued for a bad transcription?

      --
      Ezekiel 23:20
    9. Re:Errors are not Errors by Billly+Gates · · Score: 1

      Just keep it recorded and have a human review it.

      If it is true, this automation could cut costs greatly. Why pay 50 transcribers when you can pay 1 at a reduced wage, since demand will now be lower, and have the computer do the work for free?

    10. Re:Errors are not Errors by gnick · · Score: 4, Insightful

      A legal transcriptionist requires different training than a medical transcriptionist.

      And sometimes even that training falls short. Does anyone remember the explosion at WIPP when the tech transcribed "an organic kitty litter" instead of "inorganic kitty litter"?
      Kitty litter explosion.

      --
      He's getting rather old, but he's a good mouse.
    11. Re:Errors are not Errors by Dripdry · · Score: 1

      No amount of money can ameliorate sending your mother-in-law "Ask her why her penis caught in her dress" instead of "Ask her why her pin is stuck in her dress".

      None.

      Ever.

      --
      -
    12. Re:Errors are not Errors by Opportunist · · Score: 1

      Who is moving the goalposts? The goalpost is "write what I said", and that didn't move an inch.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    13. Re:Errors are not Errors by Anonymous Coward · · Score: 0

      The goalpost, being the Title and TFS, is "as accurate as professional transcribers".

      It was moved to where the similar 5.1% error rate doesn't really count, and now you've moved it again, to requiring a 0% error rate.

    14. Re:Errors are not Errors by Anonymous Coward · · Score: 0

      But STT is thousands of times faster and cheaper. So that's a no-brainer. If you want, hire a team of Harvard grads to pore over your recordings and author definitive, 100% error-free guaranteed transcriptions. Hey, it's going to cost $700 per minute but at least there will be no errors!

      Why does it have to be Harvard? And what makes you think a Harvard team will be guaranteed 100% error-free? Speaking of gibberish...

    15. Re:Errors are not Errors by hord · · Score: 2

      The way the machine learning databases are built, it does understand what is being said. That's why it is so effective. This happens through the connections that are built inside the neural network along with the architecture of the network itself. They are now using context-sensitive data labeling to assign specific meaning to words that are generally ambiguous based on the text around these words. The neural net can learn over time which combinations of words are likely to fall within specific categories and use this as a basis for translation later on.

      There are several teams working on this, and they have been publishing papers for a while now. The press is probably just picking up on it. Google's translation service is said to have increased in accuracy due to the same general principles. Facebook is looking into it for better social data aggregation. Various academic teams are doing it for prestige and to add to the field. Lots of good work here, actually, although there is still far, far to go.

    16. Re:Errors are not Errors by hord · · Score: 3, Interesting

      I'm not a statistician but it's possible that once you can prove that the neural network can produce answers at a success rate higher than humans you would be introducing error by allowing humans to review it. I'm not saying it shouldn't be done but this is one of the weird questions that people will have to ask on a case-by-case basis as these technologies are applied to real problems.

    17. Re:Errors are not Errors by 91degrees · · Score: 1

      It would be nice if there were some examples so we could compare for ourselves. If we're looking at occasionally picking the wrong homophone, it's a lot better than getting entire sentences mangled.

    18. Re:Errors are not Errors by SeattleLawGuy · · Score: 2

      Hey, it's going to cost $700 per minute but at least there will be no errors!

      So it's about three times cheaper than the lawyer that you'd need if you get sued for a bad transcription?

      This will eventually bring down the costs of lawsuits by making court reporters less common, but that may take a few decades.

      Not many lawyers are $700 per minute. Even $700 per hour is rare.

      And do you know how much we have to pay to go through law school and have our senses of humor surgically removed?

      --
      Real lawyers write in C++
    19. Re:Errors are not Errors by Chaset · · Score: 1

      A few years ago, a colleague of mine and I were working in Japan. He was writing up a request for a quote and ran it through Google Translate to check his Japanese, expecting to get back an English phrase that at least vaguely corresponded to what he wanted to convey. All I remember was that the output contained the phrase "stormy bedroom". I had no idea how that came from his original text. Anyway, I told him to forget using Google Translate.

      --
      -- "This world is a comedy to those who think, a tragedy to those who feel."
    20. Re:Errors are not Errors by Anonymous Coward · · Score: 0

      "Understand" tends to imply an ability to reason about or explain the relationship, but rule extraction from neural networks is generally difficult. So "understand" seems too strong a word for something that might indicate correlation rather than understanding.

    21. Re:Errors are not Errors by arth1 · · Score: 1

      It's more than just syntax and grammar rules. For example, Google has been mining the web for that kind of knowledge. You can see it in Google Translate sometimes. It generates suggestions for your input, and sometimes screws up like thinking "alot" is a word. It also uses colloquialisms in its output, which again it gathered from analysis of the web and which doesn't fit standard grammar or syntax rules.

      Google Translate relies on community suggestions and validation. See https://translate.google.com/c...
      The problem is that not everyone who joins there is truly fluent in both languages, or all that literate.

    22. Re:Errors are not Errors by Anonymous Coward · · Score: 0

      I posted this elsewhere on the thread since I do work in the field, but since you're clearly too much of an idiot to read down there, here you go:

      I can type faster than you can talk -- I also sometimes do real-time transcription. I often accelerate the playback with slow speakers or people who pause for dramatic effect. So if I have to listen through anyway to manually verify and correct the transcription, it won't save any time at all. It will actually slow me down because I'll have to pause for every correction or to format things correctly. It's faster to type it than it is to cursor around the text to find the point to edit.

      How will your automation + proofreading allow for real-time transcription? At the end of 5 hours of evidence I can provide each lawyer a CD of the *certified* transcript so they can use it that night to prepare for the next day's proceedings. You're also not going to get around the "highly paid" part simply because of the importance of accurate transcripts. If you've got 8 lawyers and a judge arguing about the transcript at a collective $1500/hour billed to the government plus the administrative overhead of the courtroom, saving $20,000/year on transcription costs is pretty meaningless. I would also be subpoenaed and questioned about any significant mistakes I signed off on...not a job you're going to give a $30,000/year secretary.

    23. Re:Errors are not Errors by djinn6 · · Score: 3, Interesting

      The way the machine learning databases are built, it does understand what is being said.

      I think the word "understand" has a more general meaning than what you wrote later on. For it to understand what was being said, beyond making grammatical sense of the sentence, it needs to know the abstract concepts behind the words and be able to manipulate them.

      For example:

      Jeff is a software engineer, Kate is a software engineer, and Larry is also ...

      Can you finish the sentence?

      Most humans could do it with a high degree of accuracy. Some might even find the obvious answer so boring that they try for a more creative one. However, ML is still very far from that.

      Since it does not grasp the abstract concepts, its transcription is much more likely to lose meaning than a human transcriber. When talking about network technology for example, a human will not mis-transcribe "NAT" to "gnat", while a machine will.

    24. Re:Errors are not Errors by Anonymous Coward · · Score: 1

      I'd love to see that AI. Part of the job is not only identifying who said what, but slightly editing the content based on the individual. Witnesses are transcribed exactly verbatim; lawyers have some of the ums, uhs, and stuttering cleaned up a bit; "mm-hmm" can become "yes", etc. Judges speak the Queen's English at all times.

      They're not supposed to, but people speak over one another all the time too. Part of the transcriptionist's job is to be in court, order people to repeat things that get garbled by talking over each other, stand closer to microphones, etc. While in court, the transcriptionist also has to get all the proper name spellings and match up voices to speakers. Let's see the AI do that.

      It would be easier to make an AI to replace the lawyers and judges than it would be to make one to replace the transcriptionist or court reporter.

    25. Re:Errors are not Errors by VeryFluffyBunny · · Score: 1

      Re: "grammar and syntax rules" -- Are you sure these words mean what you think they mean? And whose standard are you referring to?

      BTW, languages are complex probabilistic systems and so rules are unlikely to describe common usage with sufficient accuracy to be meaningful. Yes, there are large tomes of grammar rules for sticklers to memorise and pontificate about but they only describe the lesser used fringes of formal language and writing. The vast majority of language usage that remains cannot be described as "rule governed."

      --
      Debate is a form of harassment. Do not question my truth.
    26. Re: Errors are not Errors by Anonymous Coward · · Score: 0

      But hey, you can scare the bejesus out of laymen with ridiculous claims about your AI.

      Marvin Minsky and lately Musk like to do this. The evil money elite's pleasure in scaring sheeple...

    27. Re:Errors are not Errors by thegarbz · · Score: 1

      Generally if you crowd source this you end up with a pretty good result. You don't need anywhere near "everyone" to make it work.

    28. Re:Errors are not Errors by Anonymous Coward · · Score: 0

      Yes, and the result is that I have to go back again, delete the gibberish again, type the word letter by letter and hope Google doesn't change it back as soon as I look away. Seriously, anything left of the current word should be off limits for autocorrect because I've already looked at it.

  3. So shitty source... Did both use the same filters? by Anonymous Coward · · Score: 0

    Or did the software get a filtered recording (or signal-process it itself) while the humans got an unfiltered, noisy, static-filled recording?

    How about we try high-quality modern voice recordings instead of 20+-year-old telephone line recordings?

  4. Using it to post on slashdot by Harald+Paulsen · · Score: 4, Funny

    holyfield is these all of this was made worse by the fact that i had these birds skilled estimate uh... supplying itself what's your special prom to prevent fraud reform
    thoughtfulness julia roberts police comments entry drug connections predicting that nighttime beating

    --
    Harald
    1. Re: Using it to post on slashdot by Anonymous Coward · · Score: 0

      Ah, this must be what BeauHD uses.

    2. Re:Using it to post on slashdot by Anonymous Coward · · Score: 0

      you're a fucking idiot.

    3. Re:Using it to post on slashdot by Anonymous Coward · · Score: 0

      Jeesh exactly~
            I know what you mean brother- tell it!! If it were not for "these birds skilled estimate" I don't know where I'd be in life.

    4. Re:Using it to post on slashdot by Anonymous Coward · · Score: 0

      This, sadly, makes more sense than most Slashdot posts!

    5. Re:Using it to post on slashdot by Optic7 · · Score: 1

      Bastard! My head almost exploded from trying to hold back tears of laughter at work! 10 out of 5 funny, because it's exactly like the transcriptions I get from Google Voice.

  5. IT'S A TRAP! by Anonymous Coward · · Score: 0

    Actually, the speech goes to the cloud and is transcribed by lots of MCSEs they'd otherwise lay off!

  6. Bad experiences on this front by CustomSolvers2 · · Score: 4, Interesting

    Some months ago, I did some tests with speech recognition software, and my conclusion was that it is still too unreliable. My intention was to develop an application allowing me to write moderately complex code by voice (creating files and folders, including proper indentation, recognising functions, variables and other basic elements, etc.; basically, writing/editing the main parts of an arbitrary algorithm in a given language without touching the keyboard). I tested Microsoft's built-in functionality (and used one of Microsoft's .NET programming languages), and it wasn't even close to what a "5.9% error rate" seems to indicate (almost perfect?).

    In defence of the software, I have to say that my English accent isn't exactly excellent (some people say that it is "too thick" and other people just say "what?". LOL), and honestly I make very little effort to pronounce properly. But this is also the problem with speech recognition: it is mostly focused on a specific language/accent/intonation. I was doing my tests on an English version of Windows, and English was the language for the default speech recognition (and adding a different one wasn't exactly straightforward).

    I perfectly understand the complexity of developing a sufficiently reliable piece of software that delivers what I was expecting, but this is precisely why I looked for existing solutions rather than developing everything myself (which I do pretty often). In any case, my impression is that you still cannot expect good enough reliability from (Microsoft's) speech recognition software, much less when mixing languages/accents (a particularly problematic situation: including Spanish words when talking in English). I might give all this another shot next year, though.

    --
    Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    1. Re:Bad experiences on this front by peragrin · · Score: 1

      The recognition system has a 5.9% error rate for the testers. For the rest of us, it is far, far off. Human transcribers hit that 5.9% error rate across a much larger selection of people.

      I can't use voice recognition to send a text without 3-5 attempts. And I don't have a hard accent.

      --
      i thought once I was found, but it was only a dream.
    2. Re:Bad experiences on this front by Anonymous Coward · · Score: 0

      a) This is about real human languages, not programming languages
      b) You weren't using the speech recognition software that this is talking about

    3. Re:Bad experiences on this front by Baron_Yam · · Score: 3, Interesting

      5.9% means it still gets more than 1 in 20 things wrong. That's a LOT when you're feeding the information into a system that requires pretty much a 0% error rate.

      Second, there's a huge difference between standard language and specialist syntax. With programming, you're likely going to want a LOT of special formatting that you can type without thinking but it's cumbersome to communicate via speech in a way that won't confuse a speech recognition engine.

      And finally - so long as they don't have a related disability - a proficient typist can already type about as fast as they can form decent code in their head. With a bit of 'mousework' for selection and cut-and-paste I don't see speech ever becoming the superior entry method unless and until we have genuine AI that understands your intent rather than your words.

      It might be nice to use speech as a macro-invoker, though.

    4. Re:Bad experiences on this front by jellomizer · · Score: 1

      Writing code by voice? Are you insane?
      Speech conveys ideas in a linear fashion. When coding, you are jumping up and down, filling in different parts of the problem at different times.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    5. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      a) This is about real human languages, not programming languages

      ??!!! I will repeat it once more, using simple words as far as possible. This will be the last time I do it (and, had you been logged in, most likely the last time I would talk to you). Imagine that I want to write the following code (C# or Java):

      void FunctionTest(int argumento)
      {
      }

      This is code, right? That code is made of words (function/variable names). Rather than typing those words, I developed an application that wrote that code for me just from my voice (I wrote it in a .NET programming language, because that is what you do to create an application, and relied on the built-in Microsoft speech-recognition engine). For example, rather than typing "void", I said "void" (and my application wrote "void"). "Argumento" is a Spanish word, so when I said it, the software recognised "argument" (the English equivalent). Have you now understood everything properly? If yes, please don't bother me anymore; if not, please don't bother me anymore. Bye.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    6. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Thanks. It is kind of nice to know that I am not the (whole) problem :)

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    7. Re:Bad experiences on this front by Anonymous Coward · · Score: 0

      You shouldn't compare off-the-shelf speech-to-text to a cutting-edge AI research project by one of the biggest players in the software industry. That's dumb.

    8. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      5.9% means it still gets more than 1 in 20 things wrong

      Thanks for the math lesson, but I kind of knew that :). What I meant was that my overall experience was much worse than 1 in 20; it was almost 1 in 2. When using the English version for proper, in-dictionary English words, it performed kind of OK (perhaps 1 in 5 to 1 in 10 with simple words; much worse with complex words). But the biggest problem was non-existent or other-language words, for example "var1" or "thatfunction". Its performance on that front was horrible, and this was what made me quit that development. It was simply unable to recognise arbitrary variable names.
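One workaround for the "var" + "1" splitting is to post-process the recognized words into a single identifier with an explicit naming style. This is an assumption; the poster does not describe building anything like it. Below is a minimal Python sketch. `to_identifier` and `NUMBER_WORDS` are hypothetical names, and the approach still depends on the engine recognising each component word correctly:

```python
# Hypothetical mapping: a recognizer may emit either "1" or "one".
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def to_identifier(words, style="camel"):
    """Join recognized words into one identifier ("var", "one" -> "var1")."""
    parts = [NUMBER_WORDS.get(w.lower(), w.lower()) for w in words]
    if not parts:
        raise ValueError("no words to join")
    if style == "camel":
        head, *tail = parts
        # Capitalise the first character of every later part; digits are unaffected
        return head + "".join(p[:1].upper() + p[1:] for p in tail)
    if style == "flat":
        return "".join(parts)
    raise ValueError(f"unknown style: {style}")


print(to_identifier(["var", "one"]))                          # var1
print(to_identifier(["that", "function"]))                    # thatFunction
print(to_identifier(["that", "function"], style="flat"))      # thatfunction
print(to_identifier(["parse", "string", "from", "number"]))   # parseStringFromNumber
```

The naming style has to be chosen explicitly, because nothing in the audio says whether "that function" should become `thatFunction` or `thatfunction`.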

      Second, there's a huge difference between standard language and specialist syntax. With programming, you're likely going to want a LOT of special formatting that you can type without thinking but it's cumbersome to communicate via speech in a way that won't confuse a speech recognition engine.

      No, this wasn't the problem. I designed it so that I was merely inputting single (pretty simple) words. The biggest problem (on top of my accent) was the aforementioned non-existent words. I was able to do pretty much everything I wanted without problems, except inserting or editing arbitrary words (variable names).

      And finally - so long as they don't have a related disability - a proficient typist can already type about as fast as they can form decent code in their head

      The final goal wasn't to write code faster or to compensate for any disability, but to make my programming experience more comfortable (and to give something I wasn't very experienced in a shot; I do this kind of thing quite often). Simply saying "open project X", "edit file Y", "change var1 to var2" with my hands in my pockets :)

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    9. Re:Bad experiences on this front by Anonymous Coward · · Score: 1, Insightful

      Words don't make a language, and C does not become English just by using some English words.
      Doing what you want is a completely different thing and would use a completely different algorithm, so at the very least it is rather off-topic to this article (mostly because things like phrases, grammar and context in general don't apply, yet they are very important for good natural language recognition).
      You are being rather arrogant about it considering you very much didn't seem to understand the poster or why his criticism is valid.

    10. Re:Bad experiences on this front by Anonymous Coward · · Score: 0

      ... a proficient typist ... With a bit of 'mousework' for selection and cut-and-paste ...

      Ctrl+C=copy, Ctrl+V=paste, Ctrl+X=cut. Ctrl+A=select all, for other selections: arrow keys while holding shift down.

      Anyhow, that (or something similar) is how a properly designed user interface could do it. Windows has done so ever since version 3.whatever. The Linux world is a little more complex, since different applications and window managers may have their own preferences, but it is still usable. What is close to application hell, however, is an application that requires at least some keyboard input but does not have keyboard shortcuts for all its commands, relying instead on pointing and clicking. Switching input between keyboard and mouse slows one down a lot, similar to context switching in an open-plan office.

    11. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Modesty aside, I am fairly good at developing data parsing/management algorithms completely from scratch. I was also developing that tool for my personal use, not for the general public or every situation. I only worked on it for a few days and, when I quit (as commented above, because the underlying speech recognition failed a lot when trying to recognise variable names), I had a reasonably good approach in place.

      It was able to open the file I wanted and insert/edit specific parts in the right location (well... kind of, given that it wasn't able to recognise the function/variable names). It could also change indentation and even modify the enclosing structure (class/namespace). It was focused on a specific programming language and worked only at the file level. I was also planning to ignore complex programming-language features and care only about scope, variables, methods, conditions, loops, etc.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    12. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Doing what you want is a completely different thing and would use a completely different algorithm, so at the very least it is rather off-topic to this article

      How is this true? I want my computer to write something from my speech, and this is precisely what this is about. Imagine that I want to search for a specific brand name with no meaning in English. That is also speech which is expected to be recognised (at least, that was my assumption). For example, "Microsoft" doesn't mean anything in English, but it is quite a commonly used term.

      You are being rather arrogant about it considering you very much didn't seem to understand the poster or why his criticism is valid.

      I am not arrogant, and none of my actions can reflect what I am not. I am simply trying to be extremely clear (perhaps aggressively clear) about my zero interest in dealing with people prone to misunderstanding, to needlessly wasting my time and/or to aggressive attitudes (ironic, don't you think? LOL). In any case, and as explained above, I don't think this criticism is valid. I clarified in my first comment that I fully understand the complexity involved. But as an end user of a piece of speech-recognition software, I don't care about that distinction.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    13. Re:Bad experiences on this front by CustomSolvers2 · · Score: 2

      I was sharing my personal experience on this front, not implying that the results of this research have anything to do with current commercial accuracy. I personally found the high number of errors kind of surprising and merely posted about that experience. (I am not too much into voice-based anything, but from what I see and read everywhere I was expecting something different.)

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    14. Re:Bad experiences on this front by ranton · · Score: 2

      I can't use voice recognition to send a text without 3-5 attempts. And I don't have a hard accent.

      It is very odd that you have such a low success rate with voice recognition. At least 2/3 of my voice texts can be sent without editing, and most of the errors have to do with proper names. Are you sure you don't have an accent? My wife mumbles pretty badly when talking fast (so badly that I don't like talking with her on the phone most of the time), but even she has a pretty easy time using voice-to-text now. It was pretty bad a few years ago, but it really is amazing how much better it has become.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    15. Re:Bad experiences on this front by ziggystarsky · · Score: 2

      The reported error rate is for conversational English. This means that you cannot throw meaningless words at it. Modern speech recognition exploits grammatical and semantic structure, and the stock recognizers can't do this for programming languages. You could train the model on a programming language, and certain constructs (like brackets, if-then-else) will see an improvement in recognition.
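Here is a toy illustration of training a language model on code. It assumes the recognizer can hand over an n-best list of candidate transcripts. A bigram model with add-one smoothing, trained on a few whitespace-tokenised lines of code, then prefers the code-like candidate. This is a hypothetical sketch, not how any Microsoft engine works internally. A real decoder also combines such scores with acoustic scores rather than using them alone:

```python
import math
from collections import Counter


def train_bigrams(corpus_lines):
    """Count token histories and bigrams in whitespace-tokenised lines of code."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for line in corpus_lines:
        tokens = ["<s>"] + line.split()
        vocab.update(tokens)
        history.update(tokens[:-1])          # tokens that are followed by something
        bigrams.update(zip(tokens, tokens[1:]))
    return history, bigrams, vocab


def log_prob(sentence, history, bigrams, vocab):
    """Add-one smoothed bigram log-probability of a candidate transcript."""
    tokens = ["<s>"] + sentence.split()
    v = len(vocab)
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (history[prev] + v))
        for prev, word in zip(tokens, tokens[1:])
    )


def rescore(candidates, history, bigrams, vocab):
    """Pick the candidate that the code-trained model finds most likely."""
    return max(candidates, key=lambda c: log_prob(c, history, bigrams, vocab))


model = train_bigrams(["if ( x ) {", "if ( y ) {", "while ( x ) {", "} else {"])
print(rescore(["} elks {", "} else {"], *model))  # } else {
```

A model trained on conversational English has no reason to prefer `} else {` over an acoustically similar phrase, which is the gap described above.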

    16. Re:Bad experiences on this front by skovnymfe · · Score: 1

      Do you have access to internal MS Research software? Cool, bro. Can you hook me up with some access too? Because you must've used the internal MS Research software to do your anecdotal testing some months ago, since you've got an opinion on how good it is at doing its job.

    17. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      This means that you cannot throw meaningless words at it

      This is precisely the reason why I stopped that development. It wasn't able to properly recognise a very important aspect of programming: random (variable) names.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    18. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Cool, bro. Can you hook me up with some access too?

      No. LOL.

      Because you must've used the internal MS Research software to do your anecdotal testing some months ago, since you've got an opinion on how good it is at doing its job.

      As already explained to another poster with an equivalent (mis)understanding, I was simply sharing a relevant recent experience on this front. The aim was to help people not too familiar with all this (e.g., myself a year ago) get an idea of the current commercial reality (= way off 5%). You don't consider it relevant? Excellent! Ignore it. But please don't invent meanings or intentions that don't exist.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    19. Re:Bad experiences on this front by parkinglot777 · · Score: 1

      a) This is about real human languages, not programming languages b) You weren't using the speech recognition software that this is talking about

      I agree with point "a", but I would agree with point "b" only under certain conditions.

      If a piece of software is specifically for speech and has built-in functionality that attempts to auto-correct words to fit the sentence context, then I agree. However, if it simply transcribes speech, then it can be used for anything and does not need to be limited to conversational recognition.

    20. Re:Bad experiences on this front by SQLGuru · · Score: 1

      All of the Speech Recognition software that you commonly use is geared toward conversational language. You could create one that follows the language and grammar of code, but it would require different training. Consider the search suggestions you get when you type in the search bar: that's roughly how Speech Recognition works. Based on the previous words, it creates a list of likely next words and then determines which one matches the spoken words. When I type "void" into Google, it suggests the following:
      void
      void(0)
      void movie
      void definition

      None of those suggestions is "void function". And Google Suggestions aren't trained on normal language the way Speech Recognition would be, because people are less likely to search using full sentences.

      What you are attempting to do is technologically possible, but you'd need to use the Speech API and create your own training (try this article and focus on the Grammar Building: http://www.c-sharpcorner.com/u...).
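In .NET, `System.Speech.Recognition` offers classes such as `Choices`, `GrammarBuilder` and `Grammar` for restricting what the engine listens for. The Python below only mimics that idea. It defines a hypothetical command grammar in which each position allows a fixed set of words, and anything outside the grammar is rejected rather than guessed. The rule names and vocabularies are illustrative assumptions:

```python
# Hypothetical command grammar: each rule is a sequence of slots,
# and each slot is the set of words allowed at that position.
COMMAND_GRAMMAR = {
    "open_file": [{"open"}, {"file"}, {"1", "2", "3"}],
    "go_line": [{"go"}, {"line"}, {str(n) for n in range(1, 1000)}],
    "scope": [{"scope"}, {"global", "class", "method"}],
}


def match_command(utterance):
    """Return (rule name, words) if the utterance fits a rule, else None."""
    words = utterance.lower().split()
    for name, slots in COMMAND_GRAMMAR.items():
        if len(words) == len(slots) and all(w in s for w, s in zip(words, slots)):
            return name, words
    return None  # out of grammar: reject instead of guessing


print(match_command("Go line 123"))  # ('go_line', ['go', 'line', '123'])
print(match_command("buy files"))    # None
```

The catch, raised elsewhere in this thread, is that a closed grammar cannot enumerate arbitrary identifiers. It helps with commands like "go line 123" but not with new variable names.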

      Regardless, if you are using that particular Speech API, I don't think you are using the one in the article. I think the one the article is measuring is the one that would be found in Azure (https://azure.microsoft.com/en-us/services/cognitive-services/speech/).

    21. Re:Bad experiences on this front by GrumpySteen · · Score: 1

      With programming, you're likely going to want a LOT of special formatting that you can type without thinking but it's cumbersome to communicate via speech in a way that won't confuse a speech recognition engine.

      This story is about speech recognition being as good as transcription services. Programmers don't dictate their code verbally to be transcribed into text format by someone else, so that is a really weird thing to try to use as a counter argument.

    22. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      As explained to others, the problem I found cannot be solved via training: arbitrary variable names. Anything can be a variable name. I wouldn't even have minded spelling them out, but that option wasn't available either. A suggestion for those working on this front: why not include such an option?! It should be easy to implement, and this kind of feature might be very helpful for weird-word scenarios or specific implementations like mine. Or at least provide a way to disable the dictionary lookups and focus on phonetic analysis.
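The spell-it-out option could be approximated on top of a closed vocabulary, assuming the engine is restricted to spelling tokens (for example via a grammar). NATO alphabet words are acoustically distinct from one another, so a sequence like "victor alfa romeo one" can be assembled into `var1`. Below is a minimal Python sketch; `spell_identifier` and the `cap` modifier for an uppercase letter are hypothetical:

```python
# NATO spelling alphabet (plus common variant spellings) -> letter
NATO = {word: word[0] for word in (
    "alfa alpha bravo charlie delta echo foxtrot golf hotel india juliett juliet "
    "kilo lima mike november oscar papa quebec romeo sierra tango uniform "
    "victor whiskey xray yankee zulu"
).split()}
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def spell_identifier(spoken):
    """Assemble an identifier from spelled tokens, e.g. 'victor alfa romeo one'."""
    out = []
    upper_next = False
    for token in spoken.lower().split():
        if token == "cap":  # hypothetical modifier: capitalise the next letter
            upper_next = True
            continue
        if token in NATO:
            ch = NATO[token]
        elif token in DIGITS:
            ch = DIGITS[token]
        else:
            raise ValueError(f"not a spelling token: {token!r}")
        out.append(ch.upper() if upper_next else ch)
        upper_next = False
    return "".join(out)


print(spell_identifier("victor alfa romeo one"))                     # var1
print(spell_identifier("victor alfa romeo cap sierra tango romeo"))  # varStr
```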

      What you are attempting to do is technologically possible

      Sure it is possible; why would I have started working on it otherwise? (I am an experienced programmer and, on top of everything, a sensible person with no interest in wasting my time for no reason.) You (like others before you) are starting from a set of wrong assumptions about that specific development and my expectations. All I wanted was a subsystem accurately recognising certain individual words, nothing else. I was the one meant to take care of all the context, different meanings, actions, etc., entirely on my own. But the built-in speech recognition engine didn't fulfil that expectation.

      I don't think you are using the one in the article

      I worked on this around January/February and used the built-in speech-recognition engine of Windows/.NET. So I was certainly not using what is being discussed in this article, and I haven't insinuated otherwise at any point.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    23. Re:Bad experiences on this front by Baron_Yam · · Score: 1

      >Programmers don't dictate their code verbally to be transcribed into text format by someone else, so that is a really weird thing to try to use as a counter argument

      Yet my post was in response to someone attempting to program by dictation, so somehow it seems completely relevant.

    24. Re:Bad experiences on this front by arth1 · · Score: 1

      I can't use voice recognition to send a text without 3-5 attempts. And I don't have a hard accent.

      I can't get voice controlled phone systems to work.
      The main problem is that I have a deep voice, and these systems are built on the Pareto principle: cutting off the 20% with the deepest or highest voices is considered acceptable. I refuse to squeak to be understood.
      Some phone systems are hardcoded so that if you say "human" or "operator", they take you to a human operator. The problem is that they don't recognize those keywords either. After the aggressive high-pass filter on the voice recognition systems, I probably sound like I'm speaking underwater.
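Here is a rough illustration of the underwater effect described above, using assumed numbers: a first-order high-pass filter with a 300 Hz cutoff at an 8 kHz telephone sampling rate. Nothing in this thread says what any real phone system uses. The sketch shows low-frequency energy being attenuated far more than higher frequencies. Whether these systems really filter aggressively is the poster's guess, not an established fact:

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order (RC-style) high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(a * (out[-1] + samples[n] - samples[n - 1]))
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def gain_at(freq_hz, cutoff_hz=300.0, sample_rate=8000, seconds=1.0, skip=800):
    """Output/input RMS ratio for a pure tone, ignoring the start-up transient."""
    n = int(sample_rate * seconds)
    x = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    y = high_pass(x, cutoff_hz, sample_rate)
    return rms(y[skip:]) / rms(x[skip:])


print(round(gain_at(100), 2), round(gain_at(1000), 2))
```

With these settings the 100 Hz tone keeps only about a third of its amplitude, while the 1 kHz tone keeps most of it.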

      An additional problem is that some languages do not correspond well to English, which the systems were designed for. Languages that depend heavily on inflection or have a great disparity between written and oral language are heavily penalized.

    25. Re:Bad experiences on this front by SQLGuru · · Score: 1

      Did you build your own grammar? And variables needn't be a bunch of cryptic gibberish -- use meaningful variable names so your code is more readable, i.e. real English words.

    26. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Did you build your own grammar?

      I created that algorithm as a text parser/writer (also communicating with the compiler) and relied on speech recognition only as an accessory. Why spend time letting the algorithm understand context when it isn't required (and would have increased the input complexity a lot)? This situation isn't equivalent to conventional full-text language understanding (with context, double meanings, intention, etc.), where the user plays no part. It is closer to a mouse-pointer-plus-keyboard emulator.

      After starting, the application stayed in the background waiting for my commands. All the commands consisted of simple words and short phrases telling the application exactly what to do, without it needing to understand the previous or next command. For example: "open project" (small pause) "1", "open file" (small pause) "1", "scope" (small pause) "global", "go line" (small pause) "123", "insert method" (small pause) "method1" (small pause) "arg1" (small pause) "arg2", etc.
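The command style above maps naturally onto a small dispatcher that receives the recognized phrases split at pauses. Below is a minimal Python sketch. The command names come from the examples. `parse_command`, `render_method`, and the C#-style `object` placeholder type are assumptions; the placeholder is needed because speech alone gives no parameter types:

```python
def parse_command(segments):
    """segments: phrases the recognizer returned, split at pauses."""
    if not segments:
        raise ValueError("empty command")
    head = segments[0].strip().lower()
    args = [s.strip() for s in segments[1:]]  # keep identifiers' case untouched
    if head in ("open project", "open file", "scope", "go line"):
        if len(args) != 1:
            raise ValueError(f"{head!r} expects exactly one argument")
        return head.replace(" ", "_"), args[0]
    if head == "insert method":
        if not args:
            raise ValueError("insert method expects a name")
        name, *params = args
        return "insert_method", name, params
    raise ValueError(f"unknown command: {head!r}")


def render_method(name, params, indent=0, return_type="void"):
    """Emit a C#-style method stub; 'object' is a placeholder parameter type."""
    pad = " " * indent
    signature = ", ".join(f"object {p}" for p in params)
    return f"{pad}{return_type} {name}({signature})\n{pad}{{\n{pad}}}\n"


action = parse_command(["insert method", "method1", "arg1", "arg2"])
print(action)  # ('insert_method', 'method1', ['arg1', 'arg2'])
print(render_method(action[1], action[2]), end="")
```

Keeping the command word separate from its arguments means only the arguments ever need open-ended recognition, which is exactly where the variable-name problem shows up.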

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    27. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      use meaningful variable names so your code is more readable

      I always write pretty meaningful English variable/method names in my code, but what is evident to a person is nonsense to a computer. For example, "variable1" (or worse, the more common "var1") is very difficult for a speech-recognition engine. And this was, as explained, precisely the main problem with that development and what made me stop it. Bear in mind that I was expecting this application to help me in my normal work, not to force me to change the way I work (= redo all my variable-naming conventions).

      I don't want to be rude, but I think I have already spent long enough explaining to you what, honestly, I think was pretty much implicit in my original post. At least, it was implicit for someone with a reasonably good understanding of all this, which is precisely the only kind of audience who should be commenting here anyway. So I hope I have resolved all your doubts and you don't need to ask me anything else.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    28. Re: Bad experiences on this front by Anonymous Coward · · Score: 0

      The thing is, speech recognition has been moving more and more towards cheating via rules like grammar because processing sound is too difficult, so if you use it for code it will be a lot worse because all the cheats break.
      And the problem is more difficult anyway. Try to dictate your program to another programmer: in my experience that never worked well. Now try that with a random person on the street (which is a fairer comparison if you don't use a special code-oriented speech recognizer) and you'll probably suddenly feel much better about what the computer managed to do.
      And to me "using simple words" sounded like a very strong arrogance red-flag, but whatever...

    29. Re: Bad experiences on this front by CustomSolvers2 · · Score: 1

      moving more and more towards cheating via rules like grammar because processing sound is too difficult

      This seems like a spot-on summary of what I am starting to see is the situation in this subfield. I was expecting tools able to recognise arbitrary words by focusing on the sound, though.

      And the problem is more difficult anyway. Try to dictate your program to another programmer: in my experience that never worked well.

      If you have a proper underlying data structure and understanding in place, there shouldn't be any problem on this front; at least not for me, given that I developed this mostly for my personal use. I still had to think about good ways to pass reasonably long entities (e.g., a function with many arguments of different types), but I was quite happy with its performance during my preliminary tests. If random words (= variable names) had been recognised properly, I would certainly have completed a first version to use in my work.

      And to me "using simple words" sounded like a very strong arrogance red-flag, but whatever..

      You misunderstood it. It was just an aggressive device to minimise the chances of starting a long chain of misunderstandings leading nowhere. Although perhaps I misjudged the situation and used it when it wasn't required.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    30. Re:Bad experiences on this front by angel'o'sphere · · Score: 1

      Speech is 4 or 5 times slower than typing.
      So unless you can tell an IDE "look at package 'my.product.model' and 'my.product.entities', create a Factory based on ctor signatures for all 'entities' that implement interfaces from 'models' and return 'model' classes", voice input is pretty pointless. And I doubt an 'AI' will be able to do that soon, while my template-based code generator does that instantly. But I start it with a mouse click (which is slower than a keyboard shortcut, obviously).

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    31. Re:Bad experiences on this front by angel'o'sphere · · Score: 1

      When coding, you are jumping up and down, filling in different parts of the problem
      Erm, if you meant me with "you", then err, no!!
      I just write my code top down.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    32. Re:Bad experiences on this front by angel'o'sphere · · Score: 1

      The parent was just bad at natural language transcription into internal (mind) symbols and constructed a completely different meaning from your words than you intended.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    33. Re:Bad experiences on this front by angel'o'sphere · · Score: 1

      You are obviously not a programmer.

      Words like 'invoiceAddress', 'invoiceName', and other artificial words are used in programming.

      Speech recognition would interpret them as 'invoice address' and 'invoice name', hence the program would be 'broken'.

      Other examples are abbreviations used as variable names, like fis (FileInputStream) or fos (FileOutputStream). However, I would assume speech recognition software would be able to understand eF Eye eS.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    34. Re:Bad experiences on this front by Anonymous Coward · · Score: 0

      My intention was to develop an application allowing me to write moderately complex code by voice (creating files and folders, including proper indentation, recognising functions, variables and other basic elements, etc. Basically, allowing me to write/edit the main parts of a random algorithm in certain language without touching the keyboard).

      If you're still doing manual formatting/indentation in 2017, then you're doing it wrong, and you need to go back to school for a refresher.
      (Hint: Programs like indent have been around for decades.)

    35. Re:Bad experiences on this front by Anonymous Coward · · Score: 0

      Speech is 4 or 5 times slower than typing.

      To be fair, that's still significantly faster than entering text via a smartphone keypad. Maybe that's why all the young whippersnappers who don't know how to type on a real keyboard dream of speech interfaces. ;)

    36. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      If you're still doing manual formatting/indentation in 2017, then you're doing it wrong, and you need to go back to school for a refresher. (Hint: Programs like indent have been around for decades.)

      If you take random words completely out of context and make up their meaning or my true intention, you are simply having a discussion with yourself rather than with me. If you had read the rest of that post, you would have understood that I developed a small application to write code with my voice. For example, I say "insert void" (pause) "method1" and, in the corresponding file, it writes

      void method1()
      {
      }

      If I want to change the indentation (or the scope of the method is different and the default indentation too), I would use the corresponding command such that the application can perform the corresponding action (changing the indentation). FYI, that implementation is extremely straightforward (i.e., the difference between developing an application that writes "{" or " {" is negligible). Do you now see the difference between using existing software to code (e.g., any IDE or editor from the last several years) and developing a piece of software whose output is properly formatted code? That is what my comment was about: the proper context which would have allowed you to have a conversation with me rather than with yourself.

      Seriously, what is the problem with people who have extremely poor understanding capabilities and their obsession with arbitrarily wasting my time with their nonsense? Aren't there better ways to spend time online than bothering me with ridiculous expectations and concerns? Or why don't they make a tiny effort to understand properly (even learning from their huge number of past errors)? Then they could make a slightly relevant contribution to someone, rather than arbitrarily wasting time.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    37. Re:Bad experiences on this front by SQLGuru · · Score: 1

      Actually, I am a programmer. And it seems like he never bothered to give the speech engine the grammar of the programming language (its Backus-Naur form syntax). Programming languages are very prescriptive about where things go -- unlike English, where some words can vary in their position.

      With the BNF defined, it should be easier to determine that "variable goes here" and the software could look for previously identified variable tokens to assist in the interpretation --- that's basically how IntelliSense works. So, you might need to spell it the first time, but subsequent times, it should be able to identify variables based on a token table.
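The token-table idea can be sketched as a lookup that normalises both the stored identifier and the recognized phrase, so "invoice address" resolves to an existing `invoiceAddress`. `difflib` fuzzy matching serves as a fallback for near misses. This is a hypothetical Python illustration (`SymbolTable` and the 0.8 cutoff are assumptions), not how IntelliSense is actually implemented:

```python
import difflib
import re


def normalize(text):
    """'invoice address', 'invoiceAddress', 'Invoice_Address' -> 'invoiceaddress'."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


class SymbolTable:
    """Hypothetical table of identifiers already seen in the file."""

    def __init__(self):
        self._by_key = {}

    def add(self, identifier):
        self._by_key[normalize(identifier)] = identifier

    def resolve(self, spoken, cutoff=0.8):
        """Map a recognized phrase to a known identifier, or None."""
        key = normalize(spoken)
        if key in self._by_key:
            return self._by_key[key]
        close = difflib.get_close_matches(key, list(self._by_key), n=1, cutoff=cutoff)
        return self._by_key[close[0]] if close else None


table = SymbolTable()
table.add("invoiceAddress")   # e.g. spelled out the first time
table.add("fileInputStream")
print(table.resolve("invoice address"))  # invoiceAddress
print(table.resolve("invoice adress"))   # invoiceAddress (fuzzy match)
print(table.resolve("purchase order"))   # None
```

As the comment above says, the first occurrence still has to get in somehow (for example by spelling). The table only helps with later mentions.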

      He mentions in a different post that he didn't bother with the grammar because it was just supposed to be command-based (open project, save project-type commands) --- which would have benefited from defining a grammar, because that's EXACTLY WHAT IT'S FOR. Using the grammar to define a programming language is actually more complex, but certainly doable. [In another post, he mentions he was trying to actually dictate code....so who knows what he was actually doing.]

      It seems to me that he was in over his head with a cool idea and an idea of how to start it. But speech recognition (even with the APIs available to us) is hard and still isn't accurate enough not to be frustrating.

    38. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      [In another post, he mentions he was trying to actually dictate code....so who knows what he was actually doing.]

      What about reading the numerous explanations I wrote for you (because you needed much more than all the other commenters here to understand extremely simple ideas) and/or asking me? I have already wasted a ton of time with you, and you don't seem to get even the basic ideas right. I wouldn't have minded answering some TO-THE-POINT questions (already addressed in my other comments, but which you weren't able to understand). In the future, please try to avoid talking about me when I am not present. What you did was actually lying, insofar as you conveyed an impression that has nothing to do with reality as described in the other comments.

      Although you don't seem to be able to understand even simple concepts (I wrote a step-by-step example of what the application is doing and you are still not getting it?!), here goes my last attempt to get through that thick layer of poor understanding that seems to surround you, so that you can hopefully get 1 SINGLE IDEA RIGHT. Forget about grammar and all the rules you apply to speech recognition in other contexts; forget about everything involving the syntax of a programming language; just assume English grammar by default and focus on answering this single question. How do you expect me to create a set of grammar rules (or improvements on the default ones) allowing my application to recognise the extremely variable reality of variable names? Examples include "var1", "functionNew", "varString" and "parseStringFromNumber": names whose English meaning is evident to people but which are not recognised in any dictionary. I want all of them to be recognised as single words, not as phrases ("var1" as "var" & "1" is wrong). Please enlighten me and everyone else here: how can I use the built-in .NET speech-recognition engine to understand this kind of term?

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    39. Re:Bad experiences on this front by Anonymous Coward · · Score: 0

      Speech-To-Code doesn't sound very fun to me... Ctrl+Alt+L, Up, Right, Enter, Down, Ctrl+M Ctrl+M...

    40. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Sorry, but I cannot believe that anyone (much less someone calling him/herself a programmer!!) is unable to get such a simple idea! So here is an additional explanation, just in case. This wasn't about understanding a whole program as conventional text, but about using speech recognition (= me talking with very simple and easy words) to trigger different actions (e.g., move the pointer to line X, paste text Y there, remove bit Z, etc.). The only person thinking about creating a whole grammar to understand a programming language (?!) was you. I was only interested in having a reliable subsystem that always understood my instructions.

      The ONLY PROBLEM I found with this approach (otherwise it was working perfectly) was that variable names aren't recognised because they aren't valid (English) words. That's why I said I wouldn't even have minded spelling them out; just the variable names that weren't recognised otherwise!! If you want to help or criticise, please focus on the actual problem rather than on your imagination: the built-in speech recognition engine not being able to handle variable names (= invalid words that look like valid English words merged together).

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    41. Re:Bad experiences on this front by Anonymous Coward · · Score: 0

      If I want to change the indentation (or the scope of the method is different and the default indentation too), I would use the corresponding command such that the application can perform the corresponding action (changing the indentation);

      Yep, I understood that, and that's exactly what I was criticizing you for doing.
      Yes, "some people" (hint: you) do have very poor understanding capabilities.

    42. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Yep, I understood that, and that's exactly what I was criticizing you for doing. Yes, "some people" (hint: you) do have very poor understanding capabilities.

      So you are basically repeating what I said, just swapping the targets (me rather than you). Synchronising words and reality isn't too important, right? In your mind you have won this conversation, because this is all it is about, isn't it? Going around saying random words (to anyone, for any reason; the whole world is here waiting for you to come in for whatever reason and say whatever you feel like saying), with random meanings, and randomly deciding what the outcome has been? Fascinating! LOL.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    43. Re: Bad experiences on this front by Anonymous Coward · · Score: 0

      Try Dragon from Nuance. It doesn't matter what kind of accent you have; it can be trained to understand how you speak. I train medical reporters and they love it.

    44. Re: Bad experiences on this front by CustomSolvers2 · · Score: 1

      Thanks for the tip. I might test it next time (although relying on native .NET libraries is certainly an advantage when programming in .NET). In any case, it seems that my real problem might be almost unsolvable with current speech-recognition approaches. The accent was problematic, but the biggest deal was the impossibility of recognising variable names (e.g., "var1", "function2", etc., all of them expected to be recognised as single words). Apparently, most current speech-recognition approaches focus on existing words in the given language and try to match everything to them (in the best scenarios, "var1" is recognised as "var" + "1", and "function2" as "function" + "2"). An alternative supporting pure phonetic (or spelled-out) input would be excellent (I might even try to build something on my own). Anyway, we'll see when I decide to give all this another shot, which is very unlikely to happen before next year.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
  7. and how much by Anonymous Coward · · Score: 0

    of your lovely data is slurped by MS during the translation?

    All of it? Now that's a surprise.... Not...

    1. Re:and how much by Anonymous Coward · · Score: 0

      [citation needed]

    2. Re:and how much by Opportunist · · Score: 1

      We have arrived at the point where assuming that a company wants to invade your privacy is pretty much the default position.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  8. "As Accurate As Professional Transcribers" by Anonymous Coward · · Score: 5, Funny

    "As Accurate As Professional Transcribers..."

    They left out "from Uzbekistan transcribing Navajo - underwater".

    Never trust anything Clippy says.

    1. Re:"As Accurate As Professional Transcribers" by skids · · Score: 1

      They left out "from Uzbekistan transcribing Navajo - underwater".

      and "...working on cell phones with auto-correct enabled"

    2. Re:"As Accurate As Professional Transcribers" by Anonymous Coward · · Score: 0

      > They left out "from Uzbekistan transcribing Navajo

      I was about to roll my eyes...another /. troll...

      > - underwater".

      ...but then I lost it there.

      Well played, sir.

  9. Sample: by Anonymous Coward · · Score: 0

    Microsoft piece recognition is a cod am pizza ship.

    Microsoft socks!

    General Protection fault...

  10. Perfect by Anonymous Coward · · Score: 0

    Dear aunt, let's set so double the killer delete select all.

    1. Re:Perfect by lobiusmoop · · Score: 1

      That's from over 10 years ago, which in computing terms is ancient history.

      --
      "I bless every day that I continue to live, for every day is pure profit."
  11. Teach to the test? by Anonymous Coward · · Score: 0

    Does this mean it is good, or does it mean they overtrained it on the samples and those people in those recordings are the lucky few it will understand now?

    1. Re:Teach to the test? by Anonymous Coward · · Score: 0

      A friend's brother was working for IBM around that time frame, helping build their speech recognition for phone tree systems. He had a list of around a hundred words and passed them around at a Christmas party to a few of us so the system could be trained on Southern accents. We called some number and read them out.

  12. Curious what the results are on modern hardware by DraconPern · · Score: 1

    They should do tests using modern hardware. For example, the speech recognition on iOS seems to be pretty good. If they can get this technology into Windows 10 that would be awesome. Oh, I dictated this using iOS.

    1. Re:Curious what the results are on modern hardware by religionofpeas · · Score: 1

      You talk in a typewriter font ?

    2. Re:Curious what the results are on modern hardware by qbast · · Score: 1

      I tried the same paragraph on iPad - flawless recognition. Then on Windows 10 - this is the result: "He shouldn't have been tested in what example dispute recall he can get distinctive into windows and that would be on a dichotic and"

  13. NSA by Dan+East · · Score: 1

    The NSA would love this. Keyword scanning of 95% of what's spoken in phone conversations (given enough processing power to transcribe them all).

    --
    Better known as 318230.
    1. Re: NSA by Anonymous Coward · · Score: 0

      You betcha their system is already 1000 percent better than MSFT's, but they keep it semi-secret in order not to scare the sheeple off the phone system.

      Even Belarus has this by now. Go figure.

  14. Yeah by Dunbal · · Score: 1

    Just make sure you run it on an air gapped computer if you want your conversation to remain private.

    --
    Seven puppies were harmed during the making of this post.
  15. It's the *research* group, not product by Anonymous Coward · · Score: 0

    That means :

    - they published against a corpus from the 90s... Not real-world results (ya know, ground truth and sciency and all)

    - it's, ya know, research... they have the publish-or-perish mindset...

    - if this was so amazing, the *product* team would be claiming it. And, given the viable *product* they have is Windows, the chances for tech transfer to Microsoft products are near zero.

    So treat this like some college PhD thesis paper, not a pre-product announcement. Must suck for Microsoft Research, knowing that their results won't make it into products.

    1. Re: It's the *research* group, not product by Anonymous Coward · · Score: 0

      In MFC they can't even properly implement hash tables.

      MSFT has always been a bunch of amateurs.

  16. 5% error rate is acceptable? really? by Anonymous Coward · · Score: 1

    I worked as a professional transcriber in the legal profession, actually employed by the government. 95% accuracy would be 1 mistake in 20 words, an error almost every 2 lines. For the standard we had to type to, an error every 2 pages would be unacceptable. These transcripts are admissible evidence in court as an exception to hearsay rules and people's lives hang on the accuracy of them. The transcripts themselves are also literally the law of the land (I live in a common law jurisdiction, so my transcript is literally legally binding law and a printout of my transcript is admissible for that purpose as well). Imagine a 5% error rate in that.

    Also, judges always speak "The Queen's English". How is this algorithm going to translate what they really say into proper language suitable for a judicial order? I'd also love to see how it deals with technical jargon; for example, citations that are spoken in all sorts of haphazard ways yet must be typed in a specific format.

    And this doesn't even factor in the thick accents many people use that are almost unrecognizable by the best humans, how is a computer going to deal with that?

  17. Yes! Way to go Microsoft! by Anonymous Coward · · Score: 0

    Excellent job, as always.

  18. Comically inaccurate by Larry_Dillon · · Score: 1

    At work we have a cloud-based Outlook that transcribes voicemail to text. It's so comically inaccurate that we sometimes forward the results to the sender and we both get a good laugh.

    --
    Competition Good, Monopoly Bad.
    1. Re:Comically inaccurate by Anonymous Coward · · Score: 0

      Don't get me started with Microsoft, Cloud and Outlook.

      My $EMPLOYER forces that down our throats (the decision makers have been to Seattle last year and seem to have had a nice trip).

      As seen from my POV, this stuff violates the Geneva Convention in at least three ways.

      That's how Microsoft manages to sell their crap. Whenever the decision makers are not those who have to suffer the consequences, it's a done deal. The same is the case with weapons.

    2. Re: Comically inaccurate by Anonymous Coward · · Score: 0

      Redmond hookers and bribes...

  19. Microsoft Speech Recognition Now As Accurate - Say by WeBMartians · · Score: 3, Interesting

    If it can recognize "It's difficult to wreck a nice beach", I'll be thoroughly 'whelmed'.

  20. "Show me to buy milk at this opportunity."anyone? by itsme1234 · · Score: 1

    The lameness filter is lame.

  21. And my reply .... by Anonymous Coward · · Score: 0

    Holyfield is that all this was worse than the fact that this bird expert was appreciating uh ... providing the same as his own promotion to prevent fraud reform Reflection Julia Roberts Comments Police links Drug entry Expect that night beatings

  22. More layoffs/automation by Billly+Gates · · Score: 0

    This could save doctors a ton of money if this is true. I suppose transcribers are for teenagers and not for raising a family and they should have made better life choices than demand $15/hr?

    Even if there are higher error rates, you can have 1 real human monitor 50 automated computer transcriptions where the computer is fairly sure but not 100% certain. Hospitals are always looking for ways to save pennies.
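    The routing described above can be sketched as follows, assuming each automatically transcribed segment carries a confidence score between 0 and 1. The `Segment` type and the 0.9 threshold are made up for illustration, and recognizer confidence scores are not guaranteed to be well calibrated.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    confidence: float  # assumed recognizer score in [0, 1]

def route(segments, threshold=0.9):
    """Split segments into those accepted automatically and those
    queued for a human to check (confidence below the threshold)."""
    accepted, review = [], []
    for seg in segments:
        (accepted if seg.confidence >= threshold else review).append(seg)
    return accepted, review

segments = [Segment("patient reports mild pain", 0.97),
            Segment("prescribed fifty milligrams", 0.62),
            Segment("follow up in two weeks", 0.93)]
auto, human = route(segments)
print(len(auto), len(human))  # 2 1
```

    Only the low-confidence segments reach the human reviewer; a segment the recognizer is confidently wrong about passes straight through, which is the risk raised elsewhere in this thread for legal transcripts.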

  23. How does it do with... by judoguy · · Score: 1

    IgPay AtinLay?

    --
    Peace is easy to achieve, just surrender. Liberty is much harder get/keep.
    1. Re:How does it do with... by PoopJuggler · · Score: 1

      "Dear aunt, let's set so double the killer delete select all."

  24. Is this a joke? by Anonymous Coward · · Score: 0

    puts its accuracy on par with professional human transcribers who have advantages like the ability to listen to text several times

    Is this a joke or does the author have literally no idea how computer memory works?

  25. In which environment? by Opportunist · · Score: 2

    In a soundproof studio built for sound recording, with the speaker someone who has had speech training?

    Or in an environment with 30 people talking in the background, an air conditioner running, doors and drawers slamming, people laughing, feet and chairs shuffling across the floor, some photocopiers that got their last service before Bush left office whining for hours, and a person speaking into the phone while at the same time talking to coworkers, where you're expected to know which words belong to you and which ones are directed at someone else?

    Aka "open plan office".

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    1. Re:In which environment? by walterhpdx · · Score: 1

      THIS! Because airlines that use voice recognition technology deserve a special place in hell. Trying to go through an airline's automated system in a busy, *noisy* airport is nigh on impossible. You'd think they would have thought that through.

  26. WAY TO MISS THE BLOODY POINT! by Anonymous Coward · · Score: 0

    Way to miss the point. Automation doesn't mean that the task has to be fully automated. Even just partially automating a task can have huge savings.

    Let's look at the situation you've described. Instead of having, let's say, 10 highly trained transcribers doing the transcriptions manually, they'd have 2 less-trained and/or less-skilled (read: cheaper) workers verifying the transcriptions done by computers and making corrections where necessary.

    So the bulk of the work is offloaded onto the computer, and a smaller number of cheaper humans are used to detect and fix up the 5% to 10% of mistakes made by the computer.

    So instead of paying 10 humans a high wage, they're paying 2 humans a lower wage. Maybe the 10 humans cost $1,000,000 a year. Now they're only paying $60,000 a year in wages, plus the negligible cost of running the computer software. It could be an annual savings of 90% or more, with just partial automation.
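    Plugging in the commenter's own illustrative numbers (10 workers at an implied $100,000 each versus 2 at $30,000, not real salary data), the saving works out as below; the `software_cost` parameter is an assumption added to show where the "negligible" software cost would enter the calculation.

```python
def annual_saving(old_staff, old_wage, new_staff, new_wage, software_cost=0):
    """Fraction of the old annual wage bill saved after switching to
    machine transcription plus human correction."""
    old_cost = old_staff * old_wage
    new_cost = new_staff * new_wage + software_cost  # software_cost is an assumed extra term
    return (old_cost - new_cost) / old_cost

print(f"{annual_saving(10, 100_000, 2, 30_000):.0%}")  # 94%
```

    With the software treated as free this gives 94%, consistent with the "90% or more" claim; the replies below dispute the staffing assumptions rather than the arithmetic.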

    1. Re:WAY TO MISS THE BLOODY POINT! by jabuzz · · Score: 1

      Given that transcription is not a highly paid area, and that a moderate typist can transcribe pretty much as fast as you talk, there is not a chance in hell you can fire 10 transcribers and hire two.

      However this is 2017, there is no need to have your transcription service in central London for example. Punt the audio file to somewhere else over the internet. It doesn't need to even leave the UK to be much cheaper than being in central London either.

      In fact, this is perfect for homeworking, to be honest. Especially given the pay rates and demographic profile of most transcriptionists. That is, the job is not exactly highly paid, most of them are female, and a high number give up work once they have children because childcare costs are too much. Take the commute out of the equation and bingo, a pool of skilled workers ready and waiting. A bit of flexible working to do the school run and job's a good one.

      The only specialist gear you need is a set of foot pedals and they cost under 100GBP for a USB set from the likes of Philips or Olympus. A full kit including software and headphones and pedals is under 200GBP.

    2. Re:WAY TO MISS THE BLOODY POINT! by Anonymous Coward · · Score: 0

      Way to spout off about a subject you know nothing about and pretend you've got a clue. Why do idiots always scream the loudest about how smart they are?

      I can type faster than you can talk -- I also sometimes do real-time transcription. I often accelerate the playback with slow speakers or people who pause for dramatic effect. So if I have to listen through anyway to manually verify and correct the transcription, it won't save any time at all. It will actually slow me down because I'll have to pause for every correction or to format things correctly. It's faster to type it than it is to cursor around the text to find the point to edit.

      How will your automation + proofreading allow for real-time transcription? At the end of 5 hours of evidence I can provide each lawyer a CD of the *certified* transcript so they can use it that night to prepare for the next day's proceedings.

      You're also not going to get around the "highly paid" part simply because of the importance of accurate transcripts. If you've got 8 lawyers and a judge arguing about the transcript at a collective $1500/hour billed to the government plus the administrative overhead of the courtroom, saving $20,000/year on transcription costs is pretty meaningless. I would also be subpoenaed and questioned about any significant mistakes I signed off on...not a job you're going to give a $30,000/year secretary.

      What is "automated" now is that the lawyers can get a copy of the raw audio recording very cheaply. It's not very useful because a) they can't play it in court; no judge is going to sit through them playing bits and pieces, wasting time because they're too cheap to get a transcript; and b) they'd rather pay me $100,000 per year than waste their own precious time listening through the audio looking for what they want.

      You really are a complete idiot about the subject. No wonder you act like you know everything.

    3. Re:WAY TO MISS THE BLOODY POINT! by Anonymous Coward · · Score: 1

      I don't know about other kinds of transcription, but Court transcription is very highly paid and I believe so is medical transcription. For civil and family court, I get on the order of $8-10/page (at 32 lines per page) and I can type 20-25 pages per hour. Plus a hefty expediting fee if they can't wait 2-4 weeks. Plus I get paid for my time in court. I had a co-worker who had a part time job doing movie closed captioning, but that paid a lot less than our day job.

      Your condescending attitude aside, this job requires making the recordings in court personally (part of the legislation that exempts our transcripts from hearsay laws -- how can I certify that this is what was really said in court if I wasn't there personally to hear it?). It is not a part time job for single moms to pick up a few extra bucks. And since courts often run late, parents have a huge problem doing this job at all -- how can you pick up your kid at 4 every day when at least once a month you're staying until 6?

      I make more than a lot of the lawyers do, at least the legal aid ones. I have also logged far more courtroom hours than most lawyers twice my age, and could do their job a lot better than they could if I had a law degree. I know most of the seminal cases better than they do, I've typed many reported decisions, which I obviously know better than anyone who merely read them, and I have a large library of unreported decisions which are still legally binding.

      But you go on thinking it's single moms pecking away making a few bucks an hour. Notice I'm on slashdot, I have a computer science undergraduate degree from a top tier school, but I make a hell of a lot more at this job 50 hours a week than I ever did slaving away programming 80 hours a week -- at a job that lasted 2 years before the company I worked for went bust.

      Oh, and on top of that, I get a defined benefits government pension. All this and I'll retire before I'm 60 with a government pension higher than most CS people make in their peak earning years.

      It's been possible to offshore to India for years now. At least one company divides the audio into 5-minute chunks and spreads it out to a large typing pool, so I can have a 5-hour audio file transcribed and returned to me in 30 minutes and at an amazingly low cost. The problem is the quality is so low the service is useless. They also don't format it.

      And fwiw, the software is free; the recording software is very expensive, but that's only on the equipment in the courtrooms. The software to play back their proprietary files is free. We do have a hefty annual fee to a professional standards organization though.

    4. Re:WAY TO MISS THE BLOODY POINT! by jabuzz · · Score: 1

      Because court transcribers are less than 0.1% of people doing transcription, that's why. No idea what it's like in the USA, but in the UK the NHS does not pay at that level for medical transcription services, and top law firms don't either. I very much doubt the court transcribers get paid that much either. I will, however, ask my brother (aka a real-life judge) what they get paid when I see him next week. However, a quick Google suggests 60 GBP per 5.5-hour sitting day, after which overtime kicks in, but rarely more than 7 hours, which seems about right to me. I tell you now, when it gets that late you adjourn. I imagine those doing Hansard (that's Parliament's transcription service) get paid a lot more, though.

      So in the UK someone doing transcription is going to be earning in the region of 15k-20k GBP outside London, and more inside.

      My suggestion was not to punt it to India, but instead of doing it in central London, have it done in, say, Newcastle or Liverpool, where property prices are not insane like they are in London and wages are lower.

      This was all possible back in 2000. Between my brother and me we had it all worked out, business plans and everything, then the dotcom bubble burst. Oh, and I am not talking about single mothers either. Back in 2000 my brother worked at a large UK law firm, and 99% of those doing the transcription were women; it was a problem that once they had kids, the cost of childcare made it uneconomic to return to work. We had a number of mothers lined up and eager to do the work.

      Oh, and most transcription is done from a dictaphone, dude. Your court transcription is such a tiny, tiny fraction of the market that it's not really worth talking about, so get off your high horse.

    5. Re:WAY TO MISS THE BLOODY POINT! by Anonymous Coward · · Score: 0

      Because court transcribers are less than 0.1% of people doing transcription, that's why.

      In the province where I live, there are 500 court transcriptionists and approximately 5 million employed people.

      If those 500 court transcriptionists represent 0.1% of transcriptionists, that means there are 500,000 transcriptionists, or 10% of the entire provincial workforce. Your 0.1% is out by orders of magnitude. The real number is closer to 10% of all transcriptionists.
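      The arithmetic in the reply checks out; the sketch below uses exact fractions so no floating-point rounding creeps in. Since the claim was "less than 0.1%", 500,000 is the minimum number of transcriptionists it would imply. The province figures are the commenter's own and are not verified here.

```python
from fractions import Fraction

court_transcriptionists = 500      # commenter's figure for their province
employed_people = 5_000_000        # commenter's figure for their province
claimed_share = Fraction(1, 1000)  # the "0.1%" claim being tested

# If 500 people were 0.1% of all transcriptionists, the total would be:
implied_total = court_transcriptionists / claimed_share
# ...which, as a share of everyone employed in the province, is:
share_of_workforce = implied_total / employed_people

print(implied_total)              # 500000
print(float(share_of_workforce))  # 0.1
```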

      It's not a tiny, tiny fraction, and you are an idiot who has no idea what you're talking about. And office admins who occasionally type a memo from a dictaphone are not transcriptionists. Dumbass.

  27. Microsoft Speech Recognition Now As Accurate by Anonymous Coward · · Score: 1

    Microsoft Speech Recognition Now As Accurate As Professional Transcribers who are deaf and whose native language is Esperanto.

  28. On the down side by fahrbot-bot · · Score: 2

    It still showed up at the South Park "Save Films from their Directors" club for the wrong reason when it heard, "Free Hat".

    (For those that aren't South Park followers...)

    Cartman writes "Free Hat" on the advertising poster in the belief that freebies are necessary to attract people. However, the crowd mistakenly thinks the rally is to free Hat McCullough, a convicted baby killer they believe was innocent.

    Now thinking that "Free Hat" would be a great name for one of those Windows App Store pirate streaming apps ...

    --
    It must have been something you assimilated. . . .
  29. Previous experience by Anonymous Coward · · Score: 1

    I was pleasantly surprised by the voice-message to email service my last employer had with Google.

    They sent you the voice message as an attachment, with the transcription in the email. If the transcription didn't make sense, you could play the audio yourself.

    Only annoying thing was we still had to delete the VM off the phone manually afterward.

  30. The acid test by John+Jorsett · · Score: 1

    Will it transcribe, "Diffused the situation," or "Defused the situation"? Every single TV closed-caption I've ever seen, and I've taken special note since I first became aware of this, has gone with the former. And those presumably have been humans making that error.

  31. Hype, more hype, and maybe outright lies by Rick+Schumann · · Score: 2

    If you believe Microsoft without independent verification from an otherwise uninterested third-party who has no investment in the outcome, then you're a fool.

  32. 5% by MMC+Monster · · Score: 2

    One in 20 words is wrong?

    How can a human transcriptionist be that bad?

    --
    Help! I'm a slashdot refugee.
    1. Re:5% by gweihir · · Score: 1

      It is not. Sure, humans get a word wrong, but they will only very rarely mangle the meaning. Machine transcription, on the other hand, will often get meaning wrong and that is a serious problem.

      The only thing this shows is use of an unsuitable (in fact, utterly stupid) metric for marketing purposes.
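      Word error rate, the metric behind the 5.1% figure in the story, is conventionally the word-level edit distance between the reference transcript and the machine output divided by the number of reference words, (substitutions + deletions + insertions) / N, so 5% means roughly one error per 20 reference words. The sketch below is an illustration, not Microsoft's scoring code; real evaluations normally normalize the text (case, punctuation, spelling variants) first, which is omitted here. It shows the point being made: a substitution that flips the meaning costs exactly as much as a harmless one.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)

reference = "the officer defused the situation before anyone was hurt"
meaning_flipped = "the officer diffused the situation before anyone was hurt"
harmless = "the officer defused a situation before anyone was hurt"
print(round(word_error_rate(reference, meaning_flipped), 3))  # 0.111
print(round(word_error_rate(reference, harmless), 3))         # 0.111
```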

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  33. Ultimate test by Anonymous Coward · · Score: 0

    Native speakers in a Southern courtroom.

    1. Re:Ultimate test by Anonymous Coward · · Score: 0

      Try a Toronto courtroom. Accents from all over the world. I've seen accused people demand "english to english" language interpreters because they claim they can't understand the prosecutor's accent.

  34. Stop believing those kinds of miracle stories by Anonymous Coward · · Score: 0

    I bet you also want me to believe that Jesuits survived Hiroshima because of their rosaries.

    1. Re:Stop believing those kinds of miracle stories by Anonymous Coward · · Score: 0

      Huh?

  35. Fantasy by Anonymous Coward · · Score: 0

    Good results on a small corpus are no guarantee of future performance. MS should provide a website where you can talk to their system and see what it outputs. Then, we'll be separating the men from the covfefes.

  36. Nonsense by gweihir · · Score: 1

    Human transcribers "have the advantage to be able to listen to the recording several times"? What utterly demented nonsense is that? Of course, the software, having the recording, can "listen" to it as often as it wants. There is absolutely no "advantage" here for the human transcribers.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Nonsense by JesseMcDonald · · Score: 1

      Of course, the software, having the recording, can "listen" to it as often as it wants. There is absolutely no "advantage" here for the human transcribers.

      The advantage is that the humans are being given much more time to process the recording. While the human transcriptionists are reviewing the recording multiple times, in real-time, the speech recognition software is producing immediate results.

      --
      "The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
  37. Tad misleading by Oligonicella · · Score: 1

    puts its accuracy on par with professional human transcribers who have advantages like the ability to listen to text several times

    As if the audio sails by the program and isn't stored in memory and parsed as many times as needed.

  38. Works form me by mnemotronic · · Score: 1

    I fuse micro sot noise recognition ball the time it words fall Leslie.

    --
    The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
  39. ROTFLMAO!!! by whitroth · · Score: 1

    It is? And who decided *that*?

    We've got it on our hybrid phones. At least half the time, the voice transcription "preview" resembles, randomly, Vogon poetry, or perhaps only "computer poetry" from 40 years ago. It rarely gets a name or title correct, and as for the message they're trying to leave, *maybe* 50% of the time it's close enough to guess what they meant without listening to the mp3.