Slashdot Mirror


Microsoft Speech Recognition Now As Accurate As Professional Transcribers (techcrunch.com)

An anonymous reader quotes TechCrunch: Microsoft announced today that its conversational speech recognition system has reached a 5.1% error rate, its lowest so far. This surpasses the 5.9% error rate reached last year by a group of researchers from Microsoft Artificial Intelligence and Research and puts its accuracy on par with professional human transcribers who have advantages like the ability to listen to text several times. Both studies transcribed recordings from the Switchboard corpus, a collection of about 2,400 telephone conversations that have been used by researchers to test speech recognition systems since the early 1990s. The new study was performed by a group of researchers at Microsoft AI and Research with the goal of achieving the same level of accuracy as a group of human transcribers who were able to listen to what they were transcribing several times, access its conversational context and work with other transcribers.
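
For context, error rates on Switchboard are conventionally reported as word error rate (WER): the word-level edit distance between the system's output and a reference transcript, divided by the number of reference words. The summary does not spell out the metric, so treat that as an assumption. A minimal Python sketch follows; the function name and sample sentences are illustrative and not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of four reference words.
print(word_error_rate("open my files now", "open buy files now"))
```

Under this definition, a 5.1% rate means about one word in twenty differs from the reference transcript.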

117 of 176 comments

  1. Laughable Hype by bwanagary · · Score: 5, Interesting

    On a daily basis in my work environment Microsoft technology is used to a) record voicemail and b) generate text from the speech.  Never, ever, have I received any converted voicemail that wasn't completely unintelligible gibberish.  Seriously.  This is utter nonsense.

    1. Re:Laughable Hype by jellomizer · · Score: 1

      That isn't the fault of speech recognition, but context recognition.
      If you had a command prompt and said "Open My Files", it would do the same thing.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    2. Re:Laughable Hype by avandesande · · Score: 4, Funny

      You should start talking with people who don't speak gibberish.

      --
      love is just extroverted narcissism
    3. Re:Laughable Hype by bobstreo · · Score: 3, Insightful

      You should start talking with people who don't speak gibberish.

      Yeah, but Mumbai is on the phone with us again...

    4. Re: Laughable Hype by Anonymous Coward · · Score: 2, Insightful

      We have an up-to-date Microsoft service doing this at my work. Accuracy is a running joke and I regularly forward people their transcriptions so we all get a good laugh. This might be lab quality recordings with limitations on language complexity used to cut down on errors. The error rate of a closed-set test isn't really a great indicator. Now a year-long comparison against several call centers in multiple industries would be quite compelling.

    5. Re:Laughable Hype by Opportunist · · Score: 1

      Then stop outsourcing to countries where this is the native language.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    6. Re:Laughable Hype by Luthair · · Score: 2

      No, context recognition would mean the correct word but wrong meaning. Buy and My are clearly distinct words with different pronunciation.

    7. Re:Laughable Hype by Luthair · · Score: 3, Insightful

      3) How much background noise? Are these from people calling from cell phones. Or a LAN line.

      Why does it matter? If it doesn't function in a standard operating environment then it isn't doing as claimed. What would you say to a watch maker who claimed their product was unscratchable but testing consisted of rubbing it with microfibre cloth?

    8. Re: Laughable Hype by Anonymous Coward · · Score: 1

      Are these from people calling from cell phones. Or a LAN line?

      ROFL

      A LAN line?? WTF is a LAN line?

    9. Re:Laughable Hype by pr0fessor · · Score: 3, Insightful

      3.... I've tried various voice recognition software over the years and can say they are getting much better but if there is any background noise forget it.

      I quit trying to use Siri because when I get in the car and ask Siri for directions, if my wife is with me, I get Siri saying "I couldn't find 102 why the fuck street don't you type in the address like a regular shut up person damn it."

    10. Re: Laughable Hype by Chaset · · Score: 2

      I just read that as an IP phone connected to the LAN. I have one of those at work. It is theoretically better audio quality than the analog internal phone system it replaced. So cell phone=really bad, LAN line=really good audio quality.

      --
      -- "This world is a comedy to those who think, a tragedy to those who feel."
    11. Re:Laughable Hype by skids · · Score: 1

      The missing part in this equation is the quality of the "human transcribers". I worked a few mturk transcription microjobs JOOC a decade or so back. Occasionally the job was to validate another person's transcription. It was rather awful. I don't blame them, though, because the pay is rather awful, too, especially for a job that pretty much monopolizes your attention.

    12. Re:Laughable Hype by rcharbon · · Score: 1

      If there was speech recognition that was 99.9% accurate for me if I were to stuff my leg down my throat first, get me a bone saw.

    13. Re:Laughable Hype by Anonymous Coward · · Score: 1

      Hey, couldn't your wife learn to shut up long enough for you to ask for directions? It would only be fair since you're breaking the male stereotype.

    14. Re:Laughable Hype by arth1 · · Score: 1

      If you had a command prompt and said "Open My Files", it would do the same thing

      My experience with voice recognition is that if I said "open my files", it would interpret that as "reboot now".

    15. Re: Laughable Hype by arth1 · · Score: 1

      Like if someone announces they are pregnant will it translate to zǎo shēng guì zǐ

      That's rather embarazado...

    16. Re:Laughable Hype by Darinbob · · Score: 1

      I keep some of those emails, the transcriptions are hilarious.

    17. Re: Laughable Hype by avandesande · · Score: 1

      I've been getting these voicemail transcriptions from Vonage for years, and although it screws up a lot I get the gist of what the person is saying which is enough. It's certainly better than having to call into a voicemail system.

      --
      love is just extroverted narcissism
    18. Re: Laughable Hype by thewolfkin · · Score: 1

      Are these from people calling from cell phones. Or a LAN line?

      ROFL

      A LAN line?? WTF is a LAN line?

      Or maybe it was a typographical error that was supposed to say LAND line. I often type words other than the ones I'm thinking of, and his fingers may have just automatically put LAN instead of land. Or maybe he's young enough to not remember that it's "landline", and after hearing so many people talk about it he assumed the term was lanline. Either way, hardly a ROFL moment.

      --
      Just another second banana
    19. Re:Laughable Hype by Threni · · Score: 1

      To be fair, they were comparing the quality of microsoft's translation with the sort of quality you get from the sort of Indian who phones you up at home when you're having your food to tell you that "your computer is having virus and internet is being attacked with hacker but for you sir because i work for microsoft and can do the needful i can remove hacker for only $699.99"

    20. Re:Laughable Hype by swillden · · Score: 1

      Or a LAN line.

      You mean "land line". The terminology predates IP telephony, and in any case the term clearly cannot apply to local area networks.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    21. Re:Laughable Hype by angel'o'sphere · · Score: 1

      Actually it would give an error: 'My' - file not found. 'Files' - file not found.

      Yes, macOS/OS X has an "open" command. And "open 'My Files'" would have worked just fine, supposing you had a folder or file called 'My Files'.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    22. Re:Laughable Hype by Luthair · · Score: 1

      You replied to the wrong person.... I assumed he meant VOIP personally.

    23. Re:Laughable Hype by swillden · · Score: 1

      I did reply to the wrong person. I'm surprised I did that. In any case, it would make no sense to call VOIP a "LAN line".

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    24. Re:Laughable Hype by Luthair · · Score: 1

      Why? If you have a SIP phone you plug your LAN into it /shrug.

  2. Errors are not Errors by idji · · Score: 5, Insightful

    When a human transcriptionist makes a mistake you can usually work out what they meant. When Speech-to-text (STT) makes a mistake it is often gibberish. So objectively it is "better" at transcribing, but subjectively much worse.

    1. Re:Errors are not Errors by AmiMoJo · · Score: 4, Interesting

      Not any more. One of the ways that they got the accuracy up so high is by giving the machine an understanding of English and common phrases, similar to what a human has. It's been used for input correction on smartphones for a while too, e.g. with the Google keyboard it can correct the previous word based on the next one you type if it realizes that they don't make sense together.
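
      A toy illustration of correcting a word from its neighbours: among acoustically similar candidates, pick the sequence whose adjacent word pairs are most common. The candidate lists and bigram counts below are invented for the example. Real systems use far larger statistical or neural language models, and nothing here reflects Microsoft's or Google's actual implementation.

```python
from itertools import product

# Invented bigram counts standing in for a language model.
BIGRAM_COUNTS = {
    ("open", "my"): 50,
    ("my", "files"): 40,
    ("open", "buy"): 1,
    ("buy", "files"): 2,
}


def best_sequence(candidates):
    """Return the candidate word sequence with the highest bigram score.

    `candidates` is a list of lists: for each position, the words the
    acoustic front end considered plausible.
    """
    def score(words):
        # Add-one smoothing so unseen pairs do not zero out the product.
        total = 1
        for pair in zip(words, words[1:]):
            total *= BIGRAM_COUNTS.get(pair, 0) + 1
        return total

    return max(product(*candidates), key=score)


print(best_sequence([["open"], ["buy", "my"], ["files"]]))
```

      With these made-up counts, "open my files" outscores "open buy files" because both of its word pairs are far more frequent.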

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    2. Re:Errors are not Errors by Anonymous Coward · · Score: 1

      But you will know that there is an error. With a professional person, they will make the whole sentence look just fine... and you will never realise there was an issue, which is much worse IMHO...

    3. Re:Errors are not Errors by jellomizer · · Score: 4, Informative

      Normally we have transcriptionists who are trained in a particular area to understand the context of the message. A legal transcriptionist requires different training than a medical transcriptionist.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    4. Re:Errors are not Errors by JasterBobaMereel · · Score: 1

      Unless it actually understands what is being said then it will always make mistakes that result in gibberish

      If they are saying that they have cracked this, then they have strong AI, and should be announcing it to the world's press ... (they haven't)

      They have added some syntax and grammar rules... just like everybody else ...

      --
      Puteulanus fenestra mortis
    5. Re:Errors are not Errors by AmiMoJo · · Score: 4, Interesting

      It's more than just syntax and grammar rules. For example, Google has been mining the web for that kind of knowledge. You can see it in Google Translate sometimes. It generates suggestions for your input, and sometimes screws up like thinking "alot" is a word. It also uses colloquialisms in its output, which again it gathered from analysis of the web and which doesn't fit standard grammar or syntax rules.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    6. Re:Errors are not Errors by K.+S.+Kyosuke · · Score: 3, Insightful

      Hey, it's going to cost $700 per minute but at least there will be no errors!

      So it's about three times cheaper than the lawyer that you'd need if you get sued for a bad transcription?

      --
      Ezekiel 23:20
    7. Re:Errors are not Errors by Billly+Gates · · Score: 1

      Just keep it recorded and have a human review it.

      This could cut costs greatly with this automation if it is true. Why pay 50 transcribers when you can pay 1 for a reduced wage since demand will now be lower and have the computer do the work for free?

    8. Re:Errors are not Errors by gnick · · Score: 4, Insightful

      A legal transcriptionist requires different training than a medical transcriptionist.

      And sometimes even that training falls short. Does anyone remember the explosion at WIPP when the tech transcribed "an organic kitty litter" instead of "inorganic kitty litter"?
      Kitty litter explosion.

      --
      He's getting rather old, but he's a good mouse.
    9. Re:Errors are not Errors by Dripdry · · Score: 1

      No amount of money can ameliorate sending your mother-in-law "Ask her why her penis caught in her dress" instead of "Ask her why her pin is stuck in her dress".

      None.

      Ever.

      --
      -
    10. Re:Errors are not Errors by Opportunist · · Score: 1

      Who is moving the goalposts? The goalpost is "write what I said", and that didn't move an inch.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    11. Re:Errors are not Errors by hord · · Score: 2

      The way the machine learning databases are built, it does understand what is being said. That's why it is so effective. This happens through the connections that are built inside the neural network along with the architecture of the network itself. They are now using context-sensitive data labeling to assign specific meaning to words that are generally ambiguous based on the text around these words. The neural net can learn over time which combinations of words are likely to fall within specific categories and use this as a basis for translation later on.

      There are several teams working on this and they have been publishing papers for a while now. The press is probably just picking up on it. Google's translation service is said to have increased in accuracy due to the same general principles. Facebook is looking into it for better social data aggregation. Various academic teams are doing it for prestige and to add to the field. Lots of good work here, actually, although there is still far, far to go.

    12. Re:Errors are not Errors by hord · · Score: 3, Interesting

      I'm not a statistician but it's possible that once you can prove that the neural network can produce answers at a success rate higher than humans you would be introducing error by allowing humans to review it. I'm not saying it shouldn't be done but this is one of the weird questions that people will have to ask on a case-by-case basis as these technologies are applied to real problems.

    13. Re:Errors are not Errors by 91degrees · · Score: 1

      It would be nice if there were some examples so we could compare for ourselves. If we're looking at occasionally picking the wrong homophone, it's a lot better than getting entire sentences mangled.

    14. Re:Errors are not Errors by SeattleLawGuy · · Score: 2

      Hey, it's going to cost $700 per minute but at least there will be no errors!

      So it's about three times cheaper than the lawyer that you'd need if you get sued for a bad transcription?

      This will eventually bring down the costs of lawsuits by making court reporters less common, but that may take a few decades.

      Not many lawyers are $700 per minute. Even $700 per hour is rare.

      And do you know how much we have to pay to go through law school and have our senses of humor surgically removed?

      --
      Real lawyers write in C++
    15. Re:Errors are not Errors by Chaset · · Score: 1

      A few years ago, a colleague of mine and I were working in Japan. He was writing up a request for a quote and ran it through Google Translate to check his Japanese, expecting to get back an English phrase that at least vaguely corresponded to what he wanted to convey. All I remember was that the output contained the phrase "stormy bedroom". I had no idea how that came from his original text. Anyways, I told him to forget using Google Translate.

      --
      -- "This world is a comedy to those who think, a tragedy to those who feel."
    16. Re:Errors are not Errors by arth1 · · Score: 1

      It's more than just syntax and grammar rules. For example, Google has been mining the web for that kind of knowledge. You can see it in Google Translate sometimes. It generates suggestions for your input, and sometimes screws up like thinking "alot" is a word. It also uses colloquialisms in its output, which again it gathered from analysis of the web and which doesn't fit standard grammar or syntax rules.

      Google Translate relies on community suggestions and validation. See https://translate.google.com/c...
      The problem is that not everyone who joins there is truly fluent in both languages, or all that literate.

    17. Re:Errors are not Errors by djinn6 · · Score: 3, Interesting

      The way the machine learning databases are built, it does understand what is being said.

      I think the word "understand" has a more general meaning than what you wrote later on. For it to understand what was being said, beyond making grammatical sense of the sentence, it needs to know the abstract concepts behind the words and be able to manipulate them.

      For example:

      Jeff is a software engineer, Kate is a software engineer, and Larry is also ...

      Can you finish the sentence?

      Most humans could do it with a high degree of accuracy. Some might even find the obvious answer so boring that they try for a more creative one. However, ML is still very far from that.

      Since it does not grasp the abstract concepts, its transcription is much more likely to lose meaning than a human transcriber. When talking about network technology for example, a human will not mis-transcribe "NAT" to "gnat", while a machine will.

    18. Re:Errors are not Errors by Anonymous Coward · · Score: 1

      I'd love to see that AI. Part of the job is not only identifying who said what, but slightly editing the content based on the individual. Witnesses are exactly verbatim, lawyers have some of the ums, uhs, and stuttering cleaned up a bit, "mm-hmm" can become "yes", etc. Judges speak the Queen's English at all times.

      They're not supposed to, but people speak over one another all the time too. Part of the transcriptionist's job is to be in court, order people to repeat things that get garbled by talking over each other, stand closer to microphones, etc. The transcriptionist, while in court, also has to get all the proper name spellings and match up voice to speaker. Let's see the AI do that.

      It would be easier to make an AI to replace the lawyers and judges than it would be to make one to replace the transcriptionist or court reporter.

    19. Re:Errors are not Errors by VeryFluffyBunny · · Score: 1

      Re: "grammar and syntax rules" -- Are you sure these words mean what you think they mean? And whose standard are you referring to?

      BTW, languages are complex probabilistic systems and so rules are unlikely to describe common usage with sufficient accuracy to be meaningful. Yes, there are large tomes of grammar rules for sticklers to memorise and pontificate about but they only describe the lesser used fringes of formal language and writing. The vast majority of language usage that remains cannot be described as "rule governed."

      --
      Debate is a form of harassment. Do not question my truth.
    20. Re:Errors are not Errors by thegarbz · · Score: 1

      Generally if you crowd source this you end up with a pretty good result. You don't need anywhere near "everyone" to make it work.

  3. Using it to post on slashdot by Harald+Paulsen · · Score: 4, Funny

    holyfield is these all of this was made worse by the fact that i had these birds skilled estimate uh... supplying itself what's your special prom to prevent fraud reform
    thoughtfulness julia roberts police comments entry drug connections predicting that nighttime beating

    --
    Harald
    1. Re:Using it to post on slashdot by Optic7 · · Score: 1

      Bastard! My head almost exploded from trying to hold back tears of laughter at work! 10 out of 5 funny, because it's exactly like the transcriptions I get from Google Voice.

  4. Bad experiences on this front by CustomSolvers2 · · Score: 4, Interesting

    Some months ago, I did some tests with speech recognition software and my conclusion was that it is still too unreliable. My intention was to develop an application allowing me to write moderately complex code by voice (creating files and folders, including proper indentation, recognising functions, variables and other basic elements, etc.; basically, allowing me to write/edit the main parts of a random algorithm in a certain language without touching the keyboard). I tested Microsoft's in-built functionality (and used one of Microsoft's .NET programming languages) and it wasn't even close to what a "5.9% error rate" seems to indicate (almost perfect?).

    In defence of the software, I have to say that my English accent isn't precisely excellent (some people say that it is "too thick" and other people just say "what?". LOL) and honestly I make very little effort to pronounce properly. But this is also the problem with speech recognition: it is mostly focused on a specific language/accent/intonation. I was doing my tests in an English Windows version and this was the language for the default speech recognition (and adding a different one wasn't precisely straightforward).

    I do perfectly understand the complexity associated with developing a reliable enough piece of software delivering what I was expecting; but this is precisely the reason why I looked for existing solutions rather than developing everything myself (which I do pretty often). In any case, my impression is that you still cannot expect good enough reliability from (Microsoft's) speech recognition software, much less when mixing languages/accents (a particularly problematic situation: including Spanish words when talking in English). I might give all this another shot next year though.

    --
    Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    1. Re:Bad experiences on this front by peragrin · · Score: 1

      The recognition system has a 5.9% error rate for the testers. For the rest of us it is far, far off. Human testers have a 5.9% error rate across a much larger selection of people.

      I can't use voice recognition to send a text without 3-5 attempts. And I don't have a hard accent.

      --
      i thought once I was found, but it was only a dream.
    2. Re:Bad experiences on this front by Baron_Yam · · Score: 3, Interesting

      5.9% means it still gets more than 1 in 20 things wrong. That's a LOT when you're feeding the information into a system that requires pretty much a 0% error rate.

      Second, there's a huge difference between standard language and specialist syntax. With programming, you're likely going to want a LOT of special formatting that you can type without thinking but it's cumbersome to communicate via speech in a way that won't confuse a speech recognition engine.

      And finally - so long as they don't have a related disability - a proficient typist can already type about as fast as they can form decent code in their head. With a bit of 'mousework' for selection and cut-and-paste I don't see speech ever becoming the superior entry method unless and until we have genuine AI that understands your intent rather than your words.

      It might be nice to use speech as a macro-invoker, though.
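
      To put a rough number on that: if each word were wrong independently with probability 0.059 (a simplifying assumption, since real errors tend to cluster and a word error rate also counts insertions and deletions), the chance that an n-word utterance comes through with no errors at all is (1 - 0.059)^n. A quick Python sketch, with a hypothetical function name:

```python
def clean_utterance_probability(word_error_rate: float, n_words: int) -> float:
    """Probability that all n_words are correct, assuming independent errors."""
    return (1.0 - word_error_rate) ** n_words


for n in (5, 10, 20):
    p = clean_utterance_probability(0.059, n)
    print(f"{n:2d} words: {p:.0%} chance of an error-free transcript")
```

      Under that assumption, a 20-word utterance comes out completely clean only about 30% of the time, which is why a figure that sounds small is still painful for input that has to be exact.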

    3. Re:Bad experiences on this front by jellomizer · · Score: 1

      Writing code by voice? Are you insane?
      Speech portrays ideas in a linear fashion. When coding, you are jumping up and down, filling in different parts of the problem at different times.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    4. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      a) This is about real human launguages, not programming languages

      ??!!! I will repeat it again by using as simple words as I can. This will be the last time I will do it (+ most likely last time I will be talking to you if you have been logged in). Imagine that I want to create the following code (C# or Java):

      void FunctionTest (int argumento)
      {
      }

      This is code, right? That code is formed by words (function/variable names). Rather than typing those words, I developed an application (I wrote code in a .NET programming language, because this is what you do to create an application, + relied on the in-built Microsoft speech-recognition engine) writing that code for me just from my voice. For example, rather than typing "void", I was saying "void" (+ my application was writing "void"). "Argumento" is a Spanish word, so when I was saying it that software recognised "argument" (the English equivalent). Have you now understood everything properly? If yes, please don't bother me anymore; if not, please, don't bother me anymore. Bye.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    5. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Thanks. It is kind of nice to know that I am not the (whole) problem :)

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    6. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      5.9% means it still gets more than 1 in 20 things wrong

      Thanks for the mathematical lesson, but I kind of knew that :). What I meant was that my overall experience was much, much worse than 1 in 20; it was almost 1 in 2. When using the English version for proper/in-dictionary English words, it performed kind of OK (1 in 5/10? when using simple words; much worse with complex words). But the biggest problem was non-existent/other-language words, for example "var1" or "thatfunction"; its performance on that front was horrible and this was what made me quit that development. It was plainly incapable of recognising random variable names.

      Second, there's a huge difference between standard language and specialist syntax. With programming, you're likely going to want a LOT of special formatting that you can type without thinking but it's cumbersome to communicate via speech in a way that won't confuse a speech recognition engine.

      No. This wasn't the problem. I developed it such that I was merely inputting single (pretty simple) words. The biggest problem (on top of my accent) was the aforementioned non-existing words. I was able to do pretty much everything I wanted without problems, except including/editing random words (variable names).

      And finally - so long as they don't have a related disability - a proficient typist can already type about as fast as they can form decent code in their head

      The final goal wasn't to write quicker code or to cover any inability, but to make my programming experience more comfortable (+ giving a shot at something I wasn't too experienced in; I do this kind of thing quite often). Plainly saying "open project X", "edit file Y", "change var1 to var2" with my hands in my pockets :)

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    7. Re:Bad experiences on this front by Anonymous Coward · · Score: 1, Insightful

      Words don't make a language, and C does not become English just by using some English words.
      Doing what you want is a completely different thing and would use a completely different algorithm, so at the very least it is rather off-topic to this article (mostly because things like phrases, grammar, context in general etc. don't apply, but are very important to creating good natural language recognition).
      You are being rather arrogant about it considering you very much didn't seem to understand the poster or why his criticism is valid.

    8. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Modesty apart, I am kind of good at developing data parsing/management algorithms completely from scratch. I was also developing that tool for my personal use, not for the general public/any situation. I worked on it for just a few days and, when quitting it (as commented above, because the underlying speech recognition failed a lot when trying to recognise variable names), I had a reasonably good approach in place.

      It was able to open the file I wanted, insert/edit specific parts in the right location (well... kind of, as far as it wasn't able to recognise the function/variable names), to change indentation and even to modify classifications (class/namespace). It was focused on a specific programming language and only at the file level. I was also planning to not care about complex programming-language features; just about scope, variables, methods, conditions, loops, etc.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    9. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Doing what you want is a completely different thing and would use a completely different algorithm, so at the very least it is rather off-topic to this article

      How is this true? I want my computer to write something from my speech and this is precisely what this is about. Imagine that I want to search for a specific brand with no meaning in English. This is also speech which is assumed to be recognised (at least, this was my approach). For example, Microsoft doesn't mean anything in English, but this is a quite commonly-used term.

      You are being rather arrogant about it considering you very much didn't seem to understand the poster or why his criticism is valid.

      I am not arrogant and none of my actions can reflect what I am not. I am plainly trying to be extremely clear (perhaps, aggressively clear) regarding my zero interest in dealing with people intending to have misunderstanding-, unmotivatedly-wasting-my-time and/or aggressive- (ironic, don't you think? LOL) prone attitudes. In any case and as explained above, I don't think that this criticism is valid. I clarified in my first comment that I do perfectly understand the complexity involved, but, as a final user of a speech-recognition piece of software, I don't care about that differentiation.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    10. Re:Bad experiences on this front by CustomSolvers2 · · Score: 2

      I was sharing my personal experience on this front, not implying that the outputs of this research have anything to do with current commercial accuracy. I personally found the high number of errors kind of surprising (I'm not too much into voice-based anything, but from what I see and read everywhere I was kind of expecting something different) and merely posted about that experience.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    11. Re:Bad experiences on this front by ranton · · Score: 2

      I can't use voice recognition to send a text without 3-5 attempts. And I don't have a hard accent.

      It is very odd that you have such a low success rate with voice recognition. At least 2/3 of my voice texts can be sent without editing, and most of the errors have to do with proper names. Are you sure you don't have an accent? My wife mumbles pretty bad when talking fast (so bad I don't like talking with her on the phone most of the time) but even she has a pretty easy job using voice to text now. It was pretty bad a few years ago but it really is amazing how much better it has become.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    12. Re:Bad experiences on this front by ziggystarsky · · Score: 2

      The reported error rate is for conversational English. This means that you cannot throw meaningless words at it. Modern speech recognition exploits grammatical and semantic structure. The stock recognizers can't do this for programming languages. You could train the model on a programming language, and certain constructs (like brackets, if-then-else) will see an improvement in recognition.
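
      One cheap workaround along these lines, short of retraining a model, is to post-process the recognizer's output against the identifiers that actually exist in the project. The sketch below uses Python's standard difflib module; the identifier list, the function name and the 0.8 cutoff are hypothetical, and this is not something a stock recognizer is claimed to do for you.

```python
import difflib

# Hypothetical identifiers harvested from the current source file.
KNOWN_IDENTIFIERS = ["argumento", "var1", "thatfunction", "FunctionTest"]


def snap_to_identifier(recognized, identifiers=KNOWN_IDENTIFIERS, cutoff=0.8):
    """Map a recognized word to the closest known identifier, if any.

    Returns the recognized word unchanged when nothing is similar enough.
    """
    matches = difflib.get_close_matches(recognized, identifiers, n=1, cutoff=cutoff)
    return matches[0] if matches else recognized


print(snap_to_identifier("argument"))  # closest known identifier is "argumento"
print(snap_to_identifier("void"))      # nothing close enough, left unchanged
```

      This only helps when the recognizer's guess is already spelled close to the intended name, so it would not rescue a word that was heard as something entirely different.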

    13. Re:Bad experiences on this front by skovnymfe · · Score: 1

      Do you have access to internal MS Research software? Cool, bro. Can you hook me up with some access too? Because you must've used the internal MS Research software to do your anecdotal testing some months ago, since you've got an opinion on how good it is at doing its job.

    14. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      This means that you cannot throw meaningless words at it

      This is precisely the reason why I stopped that development. It wasn't able to properly recognise a very important aspect of programming: random (variable) names.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    15. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Cool, bro. Can you hook me up with some access too?

      No. LOL.

      Because you must've used the internal MS Research software to do your anecdotal testing some months ago, since you've got an opinion on how good it is at doing its job.

      As already explained to another poster with an equivalent (mis-)understanding, I was plainly sharing a relevant recent experience on this front to help people not too used to all this (e.g., myself 1 year ago) to get an idea about the current commercial reality (= way off 5%). You don't consider it relevant? Excellent! Ignore it. But, please, don't invent meanings or intentions which don't exist.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    16. Re:Bad experiences on this front by parkinglot777 · · Score: 1

      a) This is about real human launguages [sic], not programming languages b) You wern't [sic] using the speech recognition software that this is talking about

      I would agree with point "a" but I would agree with point "b" only under certain conditions.

      If the software is built specifically for conversational speech and has built-in functionality that tries to auto-correct words to fit the context of a sentence, then I agree. However, if it simply transcribes speech, then it can be used for anything and need not be limited to conversation recognition.

    17. Re:Bad experiences on this front by SQLGuru · · Score: 1

      All of the Speech Recognition software that you commonly use is geared toward conversational language. You could create one that follows the language and grammar of code, but it would require different training. Consider the search suggestions you get when you type in the search bar.....that's how Speech Recognition works. Based on the previous words, it creates a list of likely next words and then determines which one matches the spoken words. When I type "void" into Google, it suggests to me the following:
      void
      void(0)
      void movie
      void definition

      None of those suggestions are "void function". And Google Suggestions aren't trained for more normal language like Speech Recognition would be because people are less likely to search using full sentences.
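      The "likely next words" idea above can be illustrated with a toy bigram model. This is a minimal sketch, not how any real recognizer is implemented: the two tiny corpora, the function names and the candidate words are all made up for illustration. It only shows why the same two candidate words get ranked differently depending on what text the model was trained on.

```python
from collections import defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = sentence.lower().split()
        for previous_word, next_word in zip(words, words[1:]):
            counts[previous_word][next_word] += 1
    return counts


def pick_next_word(counts, previous_word, candidates):
    """Pick the candidate seen most often after previous_word.

    `candidates` stands in for the words the acoustic side could not
    tell apart; ties go to the first candidate in the list.
    """
    followers = counts.get(previous_word, {})
    return max(candidates, key=lambda word: followers.get(word, 0))


# Hypothetical training text: one conversational corpus, one code-like corpus.
conversational = ["the void movie was scary", "we watched the void movie"]
code_like = ["insert void method", "public void method", "private void method"]

conversational_model = train_bigrams(conversational)
code_model = train_bigrams(code_like)

print(pick_next_word(conversational_model, "void", ["movie", "method"]))
print(pick_next_word(code_model, "void", ["movie", "method"]))
```

      With these made-up corpora the conversational model prefers "movie" after "void", while the code-trained one prefers "method".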

      What you are attempting to do is technologically possible, but you'd need to use the Speech API and create your own training (try this article and focus on the Grammar Building: http://www.c-sharpcorner.com/u...).

      Regardless, if you are using that particular Speech API, I don't think you are using the one in the article. I think the one the article is measuring is the one that would be found in Azure (https://azure.microsoft.com/en-us/services/cognitive-services/speech/).

    18. Re:Bad experiences on this front by GrumpySteen · · Score: 1

      With programming, you're likely going to want a LOT of special formatting that you can type without thinking but it's cumbersome to communicate via speech in a way that won't confuse a speech recognition engine.

      This story is about speech recognition being as good as transcription services. Programmers don't dictate their code verbally to be transcribed into text format by someone else, so that is a really weird thing to try to use as a counter argument.

    19. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      As explained to others, the problem I found cannot be solved via training: random variable names. Anything can be a variable name. I might not even have minded spelling it out, but that option wasn't available either (suggestion for those working on this front: why not include such an option?! It should be easy to implement and this kind of resource might be very helpful for weird-word scenarios or specific implementations like mine; or at least, a way to disable the language-dictionary searches and to focus on phonetic analysis).

      What you are attempting to do is technologically possible

      Sure it is possible, why would I have started to work on it otherwise? (I am an experienced programmer and, on top of everything, a sensible person with no interest in wasting my time for no reason). You (like others before you) are starting from a set of wrong assumptions regarding that specific development and my expectations: all I wanted was a sub-system accurately recognising certain individual words, nothing else. I was the one meant to take care of all the context, different meanings, actions, etc. completely on my own. But the in-built speech recognition engine didn't fulfil that expectation.

      I don't think you are using the one in the article

      I worked on this around January/February and used the in-built speech-recognition engine of Windows/.NET. So, I was certainly not using what is being discussed in this article; and I haven't insinuated otherwise at any point.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    20. Re:Bad experiences on this front by Baron_Yam · · Score: 1

      >Programmers don't dictate their code verbally to be transcribed into text format by someone else, so that is a really weird thing to try to use as a counter argument

      Yet my post was in response to someone attempting to program by dictation, so somehow it seems completely relevant.

    21. Re:Bad experiences on this front by arth1 · · Score: 1

      I can't use voice recognition to send a text without 3-5 attempts. And I don't have a hard accent.

      I can't get voice controlled phone systems to work.
      The main problem is that I have a deep voice, and these systems are built on the Pareto principle: cutting off the 20% with the deepest or highest voices is considered acceptable. I refuse to squeak to be understood.
      Some of the phone systems have hardcoded that if you say "human" or "operator", it will take you to a human operator. The problem is that it doesn't recognize those keywords either. After the aggressive high pass filter on the voice recognition systems, I probably sound like I'm speaking underwater.
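      To illustrate the premise above (and only the premise: the thread does not say what filtering any real phone system applies), here is a minimal sketch of a first-order high-pass filter. The 300 Hz cutoff, the 8000 Hz sample rate and the two test tones are assumed values chosen for illustration. It compares how much of a low tone and of a higher tone survives the filter.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order (RC-style) high-pass filter over a list of samples."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    filtered = [samples[0]]
    for i in range(1, len(samples)):
        filtered.append(alpha * (filtered[-1] + samples[i] - samples[i - 1]))
    return filtered


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gain_at(frequency_hz, cutoff_hz=300.0, sample_rate_hz=8000):
    """Ratio of output level to input level for a one-second sine tone."""
    tone = [
        math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(sample_rate_hz)
    ]
    return rms(high_pass(tone, cutoff_hz, sample_rate_hz)) / rms(tone)


print("gain at 100 Hz:", round(gain_at(100.0), 2))
print("gain at 1000 Hz:", round(gain_at(1000.0), 2))
```

      Under these assumptions the 100 Hz tone comes out much weaker than the 1000 Hz tone, which is the kind of loss a low-pitched voice would suffer if such a filter were in place.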

      An additional problem is that some languages do not correspond well to English, which the systems were designed for. Languages that depend heavily on inflection or have a great disparity between written and oral language are heavily penalized.

    22. Re:Bad experiences on this front by SQLGuru · · Score: 1

      Did you build your own grammar? And variables needn't be a bunch of cryptic gibberish -- use meaningful variable names so your code is more readable; so real English words.

    23. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Did you build your own grammar?

      I created that algorithm as a text parser/writer (+ communicating with the compiler) and relied on speech recognition only as an accessory. Why spend time on letting the algorithm understand a context when it isn't required (+ would have increased the input complexity a lot)? This situation isn't equivalent to understanding a full text in a conventional language (with context, double meaning, intention, etc.) where the user has no participation, but to a mouse-pointer+keyboard emulator.

      After starting, the application was kept in the background waiting for my commands. All the commands consisted of simple words and short sentences telling the application exactly what to do, without it needing to understand the previous/next order. For example: "open project" (small pause) "1", "open file" (small pause) "1", "scope" (small pause) "global", "go line" (small pause) "123", "insert method" (small pause) "method1" (small pause) "arg1" (small pause) "arg2", etc.
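      The command scheme described above can be sketched as a small parser over pause-separated utterances. This is only an illustration under assumptions: the poster's application was written in .NET and its code is not shown in the thread, so the `COMMANDS` table, the argument counts and the function name below are hypothetical.

```python
# Hypothetical command table: phrase -> number of arguments expected.
# None means "take every following utterance until the next command".
COMMANDS = {
    "open project": 1,
    "open file": 1,
    "scope": 1,
    "go line": 1,
    "insert method": None,
}


def parse_utterances(utterances):
    """Group pause-separated utterances into (command, arguments) pairs."""
    actions = []
    i = 0
    while i < len(utterances):
        command = utterances[i].lower()
        if command not in COMMANDS:
            raise ValueError(f"unknown command: {utterances[i]!r}")
        expected = COMMANDS[command]
        i += 1
        args = []
        while (
            i < len(utterances)
            and utterances[i].lower() not in COMMANDS
            and (expected is None or len(args) < expected)
        ):
            args.append(utterances[i])
            i += 1
        if expected is not None and len(args) != expected:
            raise ValueError(f"{command!r} expects {expected} argument(s)")
        actions.append((command, args))
    return actions


heard = ["open project", "1", "go line", "123",
         "insert method", "method1", "arg1", "arg2"]
for command, args in parse_utterances(heard):
    print(command, args)
```

      Each command is handled in isolation, so nothing here depends on the recognizer understanding context; it only has to return the individual words reliably.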

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    24. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      use meaningful variable names so your code is more readable

      I always write pretty meaningful English variable/method names in my code, but what is evident for a person is nonsense for a computer. For example, "variable1" (or worse, the more common "var1") is very difficult for a speech-recognition engine. And this was, as explained, precisely the main problem with that development and what made me stop it. Bear in mind that I was expecting this application to help me in my normal work, not to make it change the way in which I work (= redo all my variable-naming conventions).

      I don't want to be rude, but I think that I have already spent enough time explaining to you what, honestly, I think was pretty much implicit in my original post (at least, for someone with a reasonably good understanding of all this, precisely the only kind of audience who should be here commenting anyway). So, I hope that I have solved all your doubts and you don't need to ask me anything else.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    25. Re: Bad experiences on this front by CustomSolvers2 · · Score: 1

      moving more and more towards cheating via rules like grammar because processing sound is too difficult

      This seems like a spot-on summary of what I am starting to see is the situation in this sub-field. I was expecting tools able to recognise random words by focusing on the sound aspect though.

      And the problem is more difficult anyway. Try to dictate your program to another programmer: in my experience that never worked well.

      If you have a proper underlying data-structure/-understanding in place there shouldn't be any problem on this front; at least, not for me given that I developed this mostly for my personal use. I still had to think about good ways to pass reasonably long entities (e.g., a function with many arguments of different types), but I was quite happy with its performance during my preliminary tests. If random words (= variable names) had been recognised properly, I would have certainly completed a first version for me to use in my work.

      And to me "using simple words" sounded like a very strong arrogance red-flag, but whatever..

      You misunderstood it. It was just an aggressive resource to minimise the chances of starting a long set of misunderstandings to nowhere. Although perhaps I did misjudge the situation and relied on it without being required.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    26. Re:Bad experiences on this front by angel'o'sphere · · Score: 1

      Speech is 4 or 5 times slower than typing.
      So unless you can tell an IDE "look at package 'my.product.model' and 'my.product.entities', create a Factory based on ctor signatures for all 'entities' that implement interfaces from 'models' and return 'model' classes" voice input is pretty pointless. And I doubt an 'AI' will be able to do that soon, while my template based code generator does that instantly. But I start it with a mouse click (which is slower than a keyboard shortcut, obviously).

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    27. Re:Bad experiences on this front by angel'o'sphere · · Score: 1

      Coding you are jumping up and down filling different parts of the problem
      Erm, if you meant me with "you", then err, no!!
      I just write my code top down.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    28. Re:Bad experiences on this front by angel'o'sphere · · Score: 1

      The parent was just bad at natural language transcription into internal (mind) symbols and constructed a completely different meaning from your words than you intended.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    29. Re:Bad experiences on this front by angel'o'sphere · · Score: 1

      You are obviously not a programmer.

      'invoiceAddress', 'invoiceName', or other artificial words are used in programming.

      The speech recognition would interpret it as 'invoice address' and 'invoice name', hence the program would be 'broken'.

      Other examples are abbreviations, like fis (FileInputStream) for a variable name or fos (FileOutputStream). However I would assume a speech recognition software would be able to understand eF Eye eS.
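      One possible workaround for the 'invoice address' versus 'invoiceAddress' mismatch is to post-process whatever words the recognizer returns. The sketch below is an assumption for illustration, not something any poster in this thread describes implementing; the function name is hypothetical. It joins a list of recognized words into a single camelCase identifier.

```python
def to_identifier(words):
    """Join recognized words into one camelCase identifier."""
    cleaned = [word.strip() for word in words if word.strip()]
    if not cleaned:
        raise ValueError("no words to join")
    head, *tail = cleaned
    # Lower-case the first word, capitalise the first letter of the rest.
    return head.lower() + "".join(word[:1].upper() + word[1:] for word in tail)


print(to_identifier(["invoice", "address"]))
print(to_identifier(["var", "1"]))
print(to_identifier(["parse", "string", "from", "number"]))
```

      This does not make the recognizer any better at hearing the words; it only avoids needing the identifier itself to be a dictionary word.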

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    30. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      If you're still doing manual formatting/indentation in 2017, then you're doing it wrong, and you need to go back to school for a refresher. (Hint: Programs like indent have been around for decades.)

      If you take random words completely out of context and make up their meanings/my true intention, you are plainly having a discussion with yourself rather than with me. If you had read the remaining parts of that post you would have understood that I developed a small application to write code with my voice. For example, I say "insert void" (pause) "method1" and, in the corresponding file, it writes

      void method1()
      {
      }

      If I want to change the indentation (or the scope of the method is different and the default indentation too), I would use the corresponding command such that the application can perform the corresponding action (changing the indentation); FYI, that implementation is extremely straightforward (i.e., the difference between developing an application writing "{" or " {" is negligible). Do you see now the difference between using existing software to code (e.g., any IDE or editor since quite a few years ago) and developing a piece of software whose output is properly-formatted code; what my comment was about: the proper context which would have allowed you to have a conversation with me rather than with yourself?
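      The "insert void" (pause) "method1" example above amounts to filling in a text template at a chosen indentation. Here is a minimal sketch of that step alone, assuming a hypothetical `render_method` helper; the original application was written in .NET and its code is not shown in the thread.

```python
def render_method(return_type, name, args=(), indent_level=0, indent="    "):
    """Render an empty method skeleton at the requested indentation."""
    pad = indent * indent_level
    params = ", ".join(args)
    lines = [
        f"{pad}{return_type} {name}({params})",
        f"{pad}{{",
        f"{pad}}}",
    ]
    return "\n".join(lines)


print(render_method("void", "method1"))
print(render_method("void", "method1", ["arg1", "arg2"], indent_level=1))
```

      Changing the indentation is just a different `indent_level`, which matches the point that writing "{" or an indented "{" makes little difference to the generator.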

      Seriously, what is the problem of people with extremely-poor-understanding capabilities and their obsession of arbitrarily wasting my time with their nonsense? Aren't there better ways to spend time online than bothering me with their ridiculous expectations and concerns? Or why aren't they making a tiny effort to properly understand (even from their past huge amount of errors) such that they can have a slightly relevant contribution to someone else other than arbitrarily wasting time?

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    31. Re:Bad experiences on this front by SQLGuru · · Score: 1

      Actually, I am a programmer. And it seems like he never bothered to give the speech engine the grammar of the programming language (the Backus-Naur form syntax). Programming languages are very prescriptive in where things go -- unlike English where some words can vary in their location.

      With the BNF defined, it should be easier to determine that "variable goes here" and the software could look for previously identified variable tokens to assist in the interpretation --- that's basically how IntelliSense works. So, you might need to spell it the first time, but subsequent times, it should be able to identify variables based on a token table.
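      The token-table idea above can be sketched with fuzzy string matching from Python's standard library. This is an illustration under assumptions, not how IntelliSense or any speech engine works internally: the `TokenTable` class, the normalisation rule and the 0.8 cutoff are all hypothetical choices.

```python
import difflib


class TokenTable:
    """Remembers identifiers already seen and maps what was heard onto them."""

    def __init__(self):
        self._by_key = {}

    @staticmethod
    def _key(text):
        # Ignore case and spaces so "invoice address" can match "invoiceAddress".
        return "".join(text.lower().split())

    def add(self, identifier):
        self._by_key[self._key(identifier)] = identifier

    def resolve(self, heard, cutoff=0.8):
        """Return the closest known identifier, or None if nothing is close."""
        matches = difflib.get_close_matches(
            self._key(heard), list(self._by_key), n=1, cutoff=cutoff
        )
        return self._by_key[matches[0]] if matches else None


table = TokenTable()
table.add("invoiceAddress")  # e.g. spelled out the first time it is used
table.add("invoiceName")

print(table.resolve("invoice address"))
print(table.resolve("invoice adress"))
print(table.resolve("zebra"))
```

      A name still has to get into the table once (by spelling or typing), which is the limitation both posters keep circling around.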

      He mentions in a different post that he didn't bother with the grammar because it was just supposed to be command based (open project, save project type commands) --- which would have benefited from defining a grammar because that's EXACTLY WHAT IT'S FOR. Using the grammar to define a programming language is actually more complex, but certainly do-able. [In another post, he mentions he was trying to actually dictate code....so who knows what he was actually doing.]

      What it seems to me is that he was in over his head with a cool idea and an idea of how to start it. But speech recognition (even with the APIs available to us) is hard and still isn't as accurate as it needs to be to not be frustrating.

    32. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      [In another post, he mentions he was trying to actually dictate code....so who knows what he was actually doing.]

      What about reading the numerous explanations which I wrote for you (because you needed much more than all the other commentators here to understand extremely simple ideas) and/or asking me? I have already wasted a ton of time with you and you don't seem to get even the basic ideas right. I wouldn't have minded answering some TO-THE-POINT questions (already addressed in my other comments but which you weren't able to understand). In the future, please, try to avoid talking about me (actually lying, inasmuch as you transmitted an impression which has nothing to do with the reality as defined in the other comments) when I am not present.

      Although you don't seem to be able to understand even simple concepts (I wrote a step by step example of what the application is doing and you are still not getting it?!), here goes my last attempt at trying to go through that thick layer of poor understanding that seems to be around you, such that you can hopefully get 1 SINGLE IDEA RIGHT. Forget about grammar and all the rules which you apply to speech recognition in other contexts; forget about everything that involves the syntax of a programming language; just assume English grammar by default and focus on answering this single question: how are you expecting me to create a set of grammar rules (or improvements on the default ones) allowing my application to recognise the extremely variable reality of variable names, as defined by examples like: "var1", "functionNew", "varString", "parseStringFromNumber"; that is: names with evident-for-people English meaning, but not expressly recognised in any dictionary. I want all of them to be recognised as single words, not as sentences ("var1" as "var" & "1" is wrong). Please, enlighten me and everyone else here: how can I use the in-built .NET speech-recognition engine to understand this kind of terms?

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    33. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Sorry but I cannot believe that anyone (much less someone calling him/herself a programmer!!) isn't able to get such a simple idea! So here comes an additional explanation just in case: this wasn't about understanding a whole code as a conventional text, but about using speech-recognition (= me talking by using very simple and easy words) to trigger different actions (e.g., pointer moved to line X, Y text pasted in that part, bit Z removed, etc.). The only person thinking about creating a whole grammar to understand a programming language (?!) was you. I was only interested in having a reliable sub-system always understanding my instructions.

      The ONLY PROBLEM which I found with this approach (other than that it was working perfectly) was that variable names aren't recognised because of not being valid (English) words (-> that's why I said that I wouldn't have even minded to spell them out; just the variable names which weren't recognised otherwise!!). If you want to help/criticise, please, focus on the actual problem rather than on your imagination: in-built speech recognition engine not being able to account for variable names (= invalid but similar to valid English words merged together).

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    34. Re:Bad experiences on this front by CustomSolvers2 · · Score: 1

      Yep, I understood that, and that's exactly what I was criticizing you for doing. Yes, "some people" (hint: you) do have very poor understanding capabilities.

      So, you are basically repeating what I said by inverting the targets (me rather than you). Synchronising words and reality isn't too important, right? In your mind, you have won this conversation, because this is all it is about, isn't it? Going around saying random words (to anyone for any reason; the whole world is here waiting for you to come in for whatever reason and say whatever you feel like saying), with random meanings and randomly deciding what has been the output? Fascinating! LOL.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    35. Re: Bad experiences on this front by CustomSolvers2 · · Score: 1

      Thanks for the tip. I might test it next time (although relying on native .NET libraries is certainly an advantage when programming in .NET). In any case, it seems that my real problem might be almost unsolvable with current speech-recognition approaches. The accent was problematic, but the biggest deal was the impossibility of recognising variable names (e.g., "var1", "function2", etc. all of them expected to be recognised as single words). Apparently, most current speech-recognition approaches focus on existing words in the given language and try to match everything to them (in the best scenarios, "var1" recognised as "var" + "1"; and "function2" recognised as "function" and "2"). An alternative enabling the option of supporting pure phonetic recognition (or spelling) would be excellent (I might even try to do something on my own). Anyway, I will see whenever I decide to give all this a new shot, which is very unlikely to happen before next year.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
  5. "As Accurate As Professional Transcribers" by Anonymous Coward · · Score: 5, Funny

    "As Accurate As Professional Transcribers..."

    They left out "from Uzbekistan transcribing Navajo - underwater".

    Never trust anything Clippy says.

    1. Re:"As Accurate As Professional Transcribers" by skids · · Score: 1

      They left out "from Uzbekistan transcribing Navajo - underwater".

      and "...working on cell phones with auto-correct enabled"

  6. Curious what the results are on modern hardware by DraconPern · · Score: 1

    They should do tests using modern hardware. For example the speech recognition on iOS seems to be pretty good. If they can get this technology into windows 10 that would be awesome. Oh I dictated this using iOS.

    1. Re:Curious what the results are on modern hardware by religionofpeas · · Score: 1

      You talk in a typewriter font ?

    2. Re:Curious what the results are on modern hardware by qbast · · Score: 1

      I tried the same paragraph on iPad - flawless recognition. Then on Windows 10 - this is the result: "He shouldn't have been tested in what example dispute recall he can get distinctive into windows and that would be on a dichotic and"

  7. NSA by Dan+East · · Score: 1

    The NSA would love this. Keyword scanning of 95% of what's spoken in phone conversations (given enough processing power to transcribe them all).

    --
    Better known as 318230.
  8. Yeah by Dunbal · · Score: 1

    Just make sure you run it on an air gapped computer if you want your conversation to remain private.

    --
    Seven puppies were harmed during the making of this post.
  9. 5% error rate is acceptable? really? by Anonymous Coward · · Score: 1

    I worked as a professional transcriber in the legal profession, actually employed by the government. 95% accuracy would be 1 mistake in 20 words, an error almost every 2 lines. For the standard we had to type to, an error every 2 pages would be unacceptable. These transcripts are admissible evidence in court as an exception to hearsay rules and people's lives hang on the accuracy of them. The transcripts themselves are also literally the law of the land (I live in a common law jurisdiction, so my transcript is literally legally binding law and a printout of my transcript is admissible for that purpose as well). Imagine a 5% error rate in that.
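    The "1 mistake in 20 words" arithmetic above is what a word error rate (WER) expresses: substitutions, deletions and insertions divided by the number of words in the reference transcript. Below is a minimal sketch of that calculation using a word-level edit distance. It is not the scoring tool used in the study (the thread does not name one), and the 20-word sample is made up.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Standard edit-distance table, kept one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the hypothesis
                current[j - 1] + 1,      # extra word in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# Made-up 20-word reference with exactly one word transcribed wrongly.
reference = " ".join(f"word{n}" for n in range(20))
hypothesis = reference.replace("word7", "bird7")
print(word_error_rate(reference, hypothesis))
```

    Note that WER counts every wrong word equally, so it says nothing about whether an error changes the meaning of a sentence.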

    Also, judges always speak "The Queen's English". How is this algorithm going to translate what they really say into proper language suitable for a judicial order? I'd also love to see how it deals with technical Jargon; for example citations that are spoken all sorts of haphazard ways yet must be typed in a specific format.

    And this doesn't even factor in the thick accents many people use that are almost unrecognizable by the best humans, how is a computer going to deal with that?

  10. Comically inaccurate by Larry_Dillon · · Score: 1

    At work we have a cloud-based Outlook that transcribes voicemail to text. It's so comically inaccurate that we sometimes forward the results to the sender and we both get a good laugh.

    --
    Competition Good, Monopoly Bad.
  11. Microsoft Speech Recognition Now As Accurate - Say by WeBMartians · · Score: 3, Interesting

    If it can recognize "It's difficult to wreck a nice beach", I'll be thoroughly 'whelmed'.

  12. Re:Perfect by lobiusmoop · · Score: 1

    That's from over 10 years ago, which in computing terms is ancient history.

    --
    "I bless every day that I continue to live, for every day is pure profit."
  13. "Show me to buy milk at this opportunity."anyone? by itsme1234 · · Score: 1

    The lameness filter is lame.

  14. How does it do with... by judoguy · · Score: 1

    IgPay AtinLay?

    --
    Peace is easy to achieve, just surrender. Liberty is much harder to get/keep.
    1. Re:How does it do with... by PoopJuggler · · Score: 1

      "Dear aunt, let's set so double the killer delete select all."

  15. Re:and how much by Opportunist · · Score: 1

    We have arrived at the point where assuming that a company wants to invade your privacy is pretty much the default position.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  16. In which environment? by Opportunist · · Score: 2

    In a sound proof studio built for sound recording spoken by someone with speech training?

    Or in an environment with 30 people talking in the background, an air conditioner running, doors and drawers slamming, people laughing, feet and chairs shuffling across the floor, some photocopiers that got their last service before Bush left office whining for hours and a person speaking into the phone while at the same time talking to coworkers and you're expected to know which words belong to you and which ones are directed at someone else?

    Aka "open plan office".

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    1. Re:In which environment? by walterhpdx · · Score: 1

      THIS! Because airlines that use voice recognition technology deserve a special place in hell. Trying to go through an airline's automated system in a busy, *noisy* airport is nigh on impossible. You'd think they would have thought that through.

  17. Microsoft Speech Recognition Now As Accurate by Anonymous Coward · · Score: 1

    Microsoft Speech Recognition Now As Accurate As Professional Transcribers who are deaf and whose native language is Esperanto.

  18. Re:WAY TO MISS THE BLOODY POINT! by jabuzz · · Score: 1

    Given that transcription is not a highly paid area, and that a moderate typist can transcribe pretty much as fast as you talk, there is not a chance in hell you can fire 10 transcribers and hire two.

    However this is 2017, there is no need to have your transcription service in central London for example. Punt the audio file to somewhere else over the internet. It doesn't need to even leave the UK to be much cheaper than being in central London either.

    In fact this is perfect for homeworking to be honest. Especially given the pay rates and demographic profile of most transcriptionists. That is, the job is not exactly high pay, most of them are female and a high number give up work as childcare costs are too much once they have children. Take the commute out of the equation and bingo: a pool of skilled workers ready and waiting. Bit of flexible working to do the school run and job's a good 'un.

    The only specialist gear you need is a set of foot pedals and they cost under 100GBP for a USB set from the likes of Philips or Olympus. A full kit including software and headphones and pedals is under 200GBP.

  19. On the down side by fahrbot-bot · · Score: 2

    It still showed up at the South Park "Save Films from their Directors" club for the wrong reason when it heard, "Free Hat".

    (For those that aren't South Park followers...)

    Cartman writes "Free Hat" on the advertising poster in the belief that freebies are necessary to attract people. However, the crowd mistakenly thinks the rally is to free Hat McCullough, a convicted baby killer they believe was innocent.

    Now thinking that "Free Hat" would be a great name of one of those Windows App Store pirate streaming apps ...

    --
    It must have been something you assimilated. . . .
  20. Previous experience by Anonymous Coward · · Score: 1

    I was pleasantly surprised by the voice-message to email service my last employer had with Google.

    They sent you the voice message in an attachment with the translation in the email. If the translation didn't make sense, you could play the audio yourself.

    Only annoying thing was we still had to delete the VM off the phone manually afterward.

  21. The acid test by John+Jorsett · · Score: 1

    Will it transcribe, "Diffused the situation," or "Defused the situation"? Every single TV closed-caption I've ever seen, and I've taken special note since I first became aware of this, has gone with the former. And those presumably have been humans making that error.

  22. Hype, more hype, and maybe outright lies by Rick+Schumann · · Score: 2

    If you believe Microsoft without independent verification from an otherwise uninterested third-party who has no investment in the outcome, then you're a fool.

  23. 5% by MMC+Monster · · Score: 2

    One in 20 words is wrong?

    How can a human transcriptionist be that bad?

    --
    Help! I'm a slashdot refugee.
    1. Re:5% by gweihir · · Score: 1

      It is not. Sure, humans get a word wrong, but they will only very rarely mangle the meaning. Machine transcription, on the other hand, will often get meaning wrong and that is a serious problem.

      The only thing this shows is use of an unsuitable (in fact, utterly stupid) metric for marketing purposes.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  24. Re:WAY TO MISS THE BLOODY POINT! by Anonymous Coward · · Score: 1

    I don't know about other kinds of transcription, but Court transcription is very highly paid and I believe so is medical transcription. For civil and family court, I get on the order of $8-10/page (at 32 lines per page) and I can type 20-25 pages per hour. Plus a hefty expediting fee if they can't wait 2-4 weeks. Plus I get paid for my time in court. I had a co-worker who had a part time job doing movie closed captioning, but that paid a lot less than our day job.

    Your condescending attitude aside, this job requires making the recordings in court personally (part of the legislation that exempts our transcripts from hearsay laws -- how can I certify that this is what was really said in court if I wasn't there personally to hear it?). It is not a part time job for single moms to pick up a few extra bucks. And since courts often run late, parents have a huge problem doing this job at all -- how can you pick up your kid at 4 every day when at least once a month you're staying until 6?

    I make more than a lot of the lawyers do, at least the legal aid ones. I also have logged far more courtroom hours than most lawyers twice my age and could do their job a lot better than they could if I had a law degree. I know most of the seminal cases better than them, I've typed many reported decisions which I will obviously know better than anyone who read them, and I have a large library of unreported decisions which are still legally binding.

    But you go on thinking it's single moms pecking away making a few bucks an hour. Notice I'm on slashdot, I have a computer science undergraduate degree from a top tier school, but I make a hell of a lot more at this job 50 hours a week than I ever did slaving away programming 80 hours a week -- at a job that lasted 2 years before the company I worked for went bust.

    Oh, and on top of that, I get a defined benefits government pension. All this and I'll retire before I'm 60 with a government pension higher than most CS people make in their peak earning years.

    It's been possible to offshore to India for years now. At least one company divides the audio into 5 minute chunks and spreads it out to a large typing pool, so I can have a 5 hour audio file transcribed and returned to me in 30 minutes and at an amazingly low cost. The problem is the quality is so low the service is useless. They also don't format it.

    And fwiw, the software is free; the recording software is very expensive, but that's only on the equipment in the courtrooms. The software to play back their proprietary files is free. We do have a hefty annual fee to a professional standards organization though.

  25. Nonsense by gweihir · · Score: 1

    Human transcribers "have the advantage to be able to listen to the recording several times"? What utterly demented nonsense is that? Of course, the software, having the recording, can "listen" to it as often as it wants. There is absolutely no "advantage" here for the human transcribers.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Nonsense by JesseMcDonald · · Score: 1

      Of course, the software, having the recording, can "listen" to it as often as it wants. There is absolutely no "advantage" here for the human transcribers.

      The advantage is that the humans are being given much more time to process the recording. While the human transcriptionists are reviewing the recording multiple times, in real-time, the speech recognition software is producing immediate results.

      --
      "The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
  26. Tad misleading by Oligonicella · · Score: 1

    puts its accuracy on par with professional human transcribers who have advantages like the ability to listen to text several times

    As if the audio sails by the program and isn't stored in memory and parsed as many times as needed.

  27. Works form me by mnemotronic · · Score: 1

    I fuse micro sot noise recognition ball the time it words fall Leslie.

    --
    The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
  28. Re:WAY TO MISS THE BLOODY POINT! by jabuzz · · Score: 1

    Because court transcribers are less than 0.1% of people doing transcription, that's why. No idea what it's like in the USA, but in the UK the NHS does not pay at that level for medical transcription services, and top law firms don't either. I very much doubt the court transcribers get paid that much either. I will, however, ask my brother (aka a real life Judge) what they get paid when I see him next week. However, a quick google suggests 60 GBP per 5.5-hour day sitting, after which overtime kicks in, but rarely more than 7 hours, which seems about right to me. I tell you now, if it gets that late, you adjourn. I imagine those doing Hansard (that's Parliament's transcription service) get paid a lot more though.

    So in the UK someone doing transcription is going to be earning in the region of 15k-20k GBP outside London, and more inside.

    My suggestion was not to punt it to India but, instead of doing it in central London, to have it done in, say, Newcastle or Liverpool where, as property prices are not insane like they are in London, wages are lower.

    This was all possible back in 2000. Between my brother and me we had it all worked out, business plans and everything, then the dotcom bubble burst. Oh and I am not talking single mothers either. Back in 2000 my brother worked at a large UK law firm and it was a problem that once they had kids (and 99% of those doing the transcription were women), the cost of childcare made it uneconomic to return to work. We had a number of mothers lined up and eager to do the work.

    Oh and most transcription is done from a dictaphone, dude. Your court transcription is such a tiny, tiny fraction of the market that it's not worth talking about really, so get off your high horse.

  29. ROTFLMAO!!! by whitroth · · Score: 1

    It is? And who decided *that*?

    We've got it on our hybrid phones. At least half the time, the voice transcription "preview" resembles, randomly, Vogon poetry, or perhaps only "computer poetry" from 40 years ago. It rarely gets a name or title correct, and the message they're trying to leave, *maybe* 50% is close enough to guess what they meant, without listening to the mp3.