Slashdot Mirror


Open Source Transcription Software?

sshirley writes "I am beginning to do some interviews with family members and will do some audio journals for genealogy purposes. I would really love to be able to run the resulting MP3 or WAV files through some software and get a text file out. I know that software like this exists commercially. But does this exist in the open source world?"

221 comments

  1. CMU Sphinx by Singularity42 · · Score: 3, Informative

    Looks active.

    1. Re:CMU Sphinx by Narksos · · Score: 5, Informative

      What you want is dictation software. I just (last week) spent significant time looking into this.

      For open source you have two main options: CMU Sphinx and Julius/Julian. Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that: the source for the CMU Sphinx demos shows how to get input from a mic or WAV file (if you've got something other than PCM you'll just need to convert it) and set up the various engines.
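
      Converting to PCM is usually the first step of such a front-end. Below is a minimal sketch using only Python's standard `wave` module, assuming the engine wants 16-bit PCM on a single channel (Sphinx's common wideband acoustic models expect 16 kHz mono, but check the model you actually load). The function name `load_pcm_mono` is hypothetical; it does not resample and does not call any Sphinx API.

```python
import array
import io
import struct
import sys
import wave


def load_pcm_mono(wav_file):
    """Read a 16-bit PCM WAV file (path or binary file object).

    Returns (sample_rate, samples), where samples is an array of signed
    16-bit ints downmixed to mono by averaging channels. It does NOT
    resample: if the rate differs from what your acoustic model expects,
    resample with an external tool first.
    """
    with wave.open(wav_file, "rb") as w:
        if w.getcomptype() != "NONE":
            raise ValueError("compressed WAV data; convert it to PCM first")
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        channels = w.getnchannels()
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    if channels > 1:
        # Frames are interleaved (L R L R ...); average each frame into one sample.
        samples = array.array("h", (
            sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples), channels)
        ))
    return rate, samples


if __name__ == "__main__":
    # Build a tiny two-frame stereo WAV in memory to exercise the function.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(struct.pack("<4h", 100, 300, -200, -400))
    buf.seek(0)
    print(load_pcm_mono(buf))
```

      The resulting mono sample array is what you would then hand to whichever decoder you pick.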

      CMU Sphinx appears to be mainly for research purposes. You can run it in a few different modes: one with a fixed grammar (for command systems; Gnome's voice control uses Sphinx in this mode), and one (what you'd be looking for) that uses a weighted dictionary. I didn't train it to my voice (and you won't be able to train it for transcriptions), and I was getting fairly lousy recognition rates with my $20 Logitech USB Microphone. It might work better with a high-quality headset, but I imagine you won't both be wearing one.
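
      The difference between the two modes can be caricatured in a few lines. This is a hypothetical sketch, not how Sphinx works internally: a real grammar-based decoder restricts the search itself, whereas here a raw hypothesis string is simply snapped to the closest allowed phrase with `difflib.get_close_matches`. The command list and function names are made up for illustration.

```python
import difflib

# Hypothetical command set for a fixed-grammar ("command and control") setup.
COMMANDS = ["open file", "close window", "next page", "previous page"]


def grammar_mode(raw_hypothesis, commands=COMMANDS, cutoff=0.6):
    """Crude stand-in for grammar-constrained output: snap the raw
    hypothesis to the most similar allowed phrase, or return None if
    nothing is similar enough (a real decoder constrains the search itself)."""
    matches = difflib.get_close_matches(
        raw_hypothesis.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def dictation_mode(raw_hypothesis):
    """Dictation: any word sequence is allowed, so the hypothesis ranked
    best by the language model is passed through unchanged."""
    return raw_hypothesis


if __name__ == "__main__":
    for heard in ["close windo", "tell me a story"]:
        print(repr(heard), "->", grammar_mode(heard), "|", dictation_mode(heard))
```

      A grammar is forgiving inside its tiny phrase set and useless outside it, which is why dictation needs the weighted-dictionary mode instead.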

      Julius/Julian lacks a good acoustic model for English. VoxForge is working on one, but it isn't anywhere near complete.

      Here is a good article that sums up the current projects

    2. Re:CMU Sphinx by Anonymous Coward · · Score: 2, Informative

      Sphinx is what many companies use to get started with, but it's far too raw to be useful by itself. You need to update the HMM back-end extensively... and train it. Even so, your success rate is only 80%... meaning 1 in 5 words will still be wrong, even when spoken slowly.
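
      The usual way to put a number on this is word error rate (WER): substitutions, deletions and insertions divided by the number of words in the reference transcript. A minimal sketch as a word-level edit distance follows; note that 80% word accuracy corresponds to roughly 20% WER only if there are no insertions, since inserted words push WER higher.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion (ref word missed)
                         cur[j - 1] + 1,          # insertion (extra hyp word)
                         prev[j - 1] + (r != h))  # substitution, or match if equal
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

      Here one substitution over six reference words gives a WER of about 0.17.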

    3. Re:CMU Sphinx by notthepainter · · Score: 3, Insightful

      Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that

      Actually, it can be rather hard to do that. I was one of the founders of MacSpeech, and there is a surprisingly large set of details you have to deal with: punctuation, capitalization, etc. Of course, since you wouldn't be making a commercial product, much of the gloss need not be coded, but once you have the engine (the part that takes the audio source and converts it to text), you still have a large amount of work left over.
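
      To give a sense of that post-processing, here is a deliberately tiny, hypothetical sketch that turns lowercase engine output containing spoken punctuation names ("comma", "period", "question mark") into capitalized, punctuated text. The word table and function name are assumptions; a real product has to handle far more (numbers, dates, proper nouns, the word "period" used literally, and so on).

```python
# Hypothetical table of spoken punctuation names; keys are word tuples so
# multi-word names like "question mark" can be matched.
SPOKEN_PUNCT = {
    ("period",): ".",
    ("full", "stop"): ".",
    ("comma",): ",",
    ("question", "mark"): "?",
}


def format_dictation(raw):
    """Attach spoken punctuation to the preceding word, capitalize the
    first word of each sentence, and capitalize the pronoun 'i'."""
    words = raw.lower().split()
    out = []
    capitalize_next = True
    i = 0
    while i < len(words):
        # Try two-word punctuation names first, then one-word names.
        for n in (2, 1):
            key = tuple(words[i:i + n])
            if len(key) == n and key in SPOKEN_PUNCT:
                mark = SPOKEN_PUNCT[key]
                if out:
                    out[-1] += mark  # no space before punctuation
                else:
                    out.append(mark)
                capitalize_next = mark in ".?"
                i += n
                break
        else:
            # Not punctuation: an ordinary word.
            word = words[i]
            if word == "i":
                word = "I"
            elif capitalize_next:
                word = word.capitalize()
            capitalize_next = False
            out.append(word)
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(format_dictation("hello comma i am here period how are you question mark"))
```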

    4. Re:CMU Sphinx by inkyblue2 · · Score: 3, Interesting

      Sphinx by itself is a terrible answer to this problem, unfortunately. The code is free, but good luck finding an appropriate model. Worse, you'll need to train a speaker-dependent model to get any usable results, and this is a VERY non-trivial task with Sphinx tools in the state that they are. I spent several years getting paid to adapt Sphinx for commercial purposes and while it's great for some things, I can say with confidence that it is not the tool you're looking for.

      You know what works? Dragon. Hate to say it, but the commercial products here have a gigantic edge on the competition.

      That said, I'd love to see someone come up with an open source speaker-dependent model training system that's friendly enough for app developers (not speech researchers) to roll into projects. I think this is a big open door for contribution to the community. Sphinx isn't the best thing going, but it's certainly usable, and if a real product came into being I'm sure all the speech wonks would start coming out of the woodwork to improve the algorithms.

    5. Re:CMU Sphinx by Bacon+Bits · · Score: 5, Informative

      Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that

      "... unless you're not a programmer."

      Seriously, since when is "program it yourself" a solution to "are there any open source software packages that do what I want?"? The answer you're looking for is "no". That's the correct answer.

      Here, here's a nice car analogy since we're on Slashdot: when you need a car, do you buy a kit car, or do you buy one factory-built? This is like telling someone who wants a car to drive to work that they should simply buy a Chevy big-block engine and build the rest from scratch. Just because I need a car doesn't mean I must be an automotive engineer and metal fabricator. Similarly, just because I need dictation software doesn't make me a software architect or a linguist. Directing this person to program their own software is not answering the question.

      Cripes. People wonder where the "open source is only free if your time has no value" line came from.

      --
      The road to tyranny has always been paved with claims of necessity.
    6. Re:CMU Sphinx by slim · · Score: 1

      Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that

      "... unless you're not a programmer."

      In any discussion about Open Source, it's appropriate to mentally substitute the verb "program" with the phrase "program or pay someone to program".

      Economically, the two are equivalent.

      OSS doesn't just give you the freedom to hack at code yourself. It also gives you the freedom to hack at it by proxy.

    7. Re:CMU Sphinx by SwedishPenguin · · Score: 1

      If that question was posed on a website with a lot of people interested in cars, and interested in building cars, it's a perfectly valid response, just as the grandparent says one can build a front-end relatively easily on a website with a lot of people interested in programming. If the question was posed on a genealogy website, it obviously would not have been an appropriate response, but this is Slashdot...

    8. Re:CMU Sphinx by Tsu+Dho+Nimh · · Score: 1

      For open source you have two main options: CMU Sphinx and Julius/Julian. Both options are just back-ends, you'll have to write a front-end.

      So, the answer is "no". There is no OS software that is ready for the OP to install and use.

    9. Re:CMU Sphinx by murdocj · · Score: 1

      In any discussion about Open Source, it's appropriate to mentally substitute the verb "program" with the phrase "program or pay someone to program".

      In any discussion of software, it's appropriate to mentally substitute the verb "program" with the phrase "program or pay someone to program". If you're willing to pay, you can always get what you want. "Open Source" has no bearing on that.

      What's interesting here is that people talk a lot about how Open Source == freedom, not "free as in beer". But I'd be willing to bet most posters asking about "Open Source" solutions to problems are more concerned with the "no cost" than they are with "I can modify the source".

    10. Re:CMU Sphinx by Anonymous Coward · · Score: 0

      Your car analogy is misguided. Many "car" guys would much rather acquire a car that needs work because they can spend their spare time working on it, which they consider fun. If they also happen to save money, that is an added bonus. The same applies to many programmers: they enjoy engaging in something challenging and different. He didn't specify that he wanted a COMPLETE package; he was just wondering if there was an open source solution. For all you know this guy is a programmer by trade and maybe he wants a different, more challenging project to take on. That is what makes open source so great: you can join in on these projects and help develop a complete package. So the answer is yes, there is a solution, but it is not hand-fed to the guy.

    11. Re:CMU Sphinx by marcello_dl · · Score: 1

      The right suggestion then ends with "bay".
      Look for used software on ebay.
      (got ya huh?)

      --
      ---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
    12. Re:CMU Sphinx by orasio · · Score: 1

      Seriously, since when is "program it yourself" a solution to "are there any open source software packages that do what I want?"? The answer you're looking for is "no". That's the correct answer.

      Your answer is the response to "I want a ready to use dictation software, and I don't want to pay for it".
      If you find an open source backend, and payment is not your sole concern, you can get someone, maybe even the authors, to code the parts you need. You might even do it yourself, of course.

      The thing is that _you_ assumed that "open source" meant "free as in beer". Some of us didn't. For those of us, it was a good response.

    13. Re:CMU Sphinx by Anonymous Coward · · Score: 0

      Slashdotters who aren't programmers? Heathen savages!! "Directing this person to program their own software is not answering the question." - except that the hard part (the linguistics engine) is already done. Seriously, fun little rainy day project, and in a half-hour you've got a little front-end built.

      Seriously, since when is "program it yourself" a solution to "are there any open source software packages that do what I want?"? The answer you're looking for is "no". That's the correct answer.

      Let's think about this - you're surfing around on Slashdot and you ask a question. What are the odds that the answer will require zero lines of code from you? And if that makes you cry home to mommy then why the hell are you here to begin with?

    14. Re:CMU Sphinx by tehcyder · · Score: 1

      For open source you have two main options: CMU Sphinx and Julius/Julian. Both options are just back-ends, you'll have to write a front-end.

      That answer is factually correct, whilst being entirely unhelpful. You must be an actuary in real life.

      --
      To have a right to do a thing is not at all the same as to be right in doing it
    15. Re:CMU Sphinx by slim · · Score: 1

      What's interesting here is that people talk a lot about how Open Source == freedom, not "free as in beer". But I'd be willing to bet most posters asking about "Open Source" solutions to problems are more concerned with the "no cost" than they are with "I can modify the source".

      Which is why I like to subtly correct them :)

    16. Re:CMU Sphinx by Crudely_Indecent · · Score: 4, Insightful

      "... unless you're not a programmer."

      I am a programmer, but we're all sometimes out of our element.

      I found need for modifications to an open source application a few years ago. Rather than spend my time reading the source code to understand how the application worked, I decided to contact the developer. A few emails and a couple of days later, the project developer made the modifications for me and $500 for himself. The world then gained additional functionality in the open source application - everyone wins.

      Some people forget, this is how many open source applications survive.

      Your analogy is outlandish! If someone wants to drive a car to work, they buy a car. If they want a shark fin on the roof, they go to a custom body shop. If they want a killer stereo, they go to a stereo shop. If they want it to be pink and yellow like yours, they go to a paint and body shop. If they can do these things on their own, they'll do it. The difference being that if the car was open source, doing these things wouldn't void the warranty.

      "Open-source is free only if your time has no value." - Jamie Zawinski

      I offer an alternative viewpoint:

      Open source is free if you truly understand freedom.

      I'm free to use the application. I'm free to modify it. I'm also free to recognize my limitations and pay someone else to do these things for me.

      --
      "Lame" - Galaxar
    17. Re:CMU Sphinx by Anonymous Coward · · Score: 0

      obligatory: you must be new here.

    18. Re:CMU Sphinx by jne_oioioi · · Score: 1

      Similarly, just because I need dictation software doesn't make me a software architect or a linguist.

      but if you were a cunning linguist you could pick up the needed programming skills to do what you need.

    19. Re:CMU Sphinx by Bill,+Shooter+of+Bul · · Score: 2, Insightful

      Blechkt. That's how I feel about your post. This is a site for nerds. Nerds are often adept at doing nerdy things. Like writing software.

      Now, if your mom asked you, then yes, a reply of "You only need to write a front end to this speech engine" is indeed inappropriate.

      Your post, and the replies to it, really reflect more on how you view the general Slashdot audience than anything else.

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
    20. Re:CMU Sphinx by BitZtream · · Score: 1

      Shame you spent $500, since I'm betting you could have bought a commercial app with the feature already in it and more for less.

      I understand you're supporting OSS, but sometimes that's just stupid and a silly way to try and stick it to the man.

      I'm sure you'll now tell me how no other app would do it, or that it's well worth it to have the OSS app, even if you'll have to spend another $500 to maybe get another feature you want.

      Pretty much anyone will add custom features for you (Microsoft included) if you make it worth their effort. This isn't exactly impressive; you just overpaid in exchange for having the source... which you've already learned you don't want to modify. I'm sure everyone else appreciates the feature... maybe. Probably not, since it probably would have been added already if it was a popular request.

      You are most certainly free to make silly decisions.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    21. Re:CMU Sphinx by Crudely_Indecent · · Score: 2, Insightful

      The commercial app does exist, and it's a per-use app that is controlled by a dongle and subscription (hint, more than $500 - plus usage).

      Sticking it to the man has nothing to do with it, unless by "it" you mean money and by "the man" you mean my pocket.

      Of course, any commercial developer will gladly make a custom app for $, but I guarantee that it will be more than $500. The developer did have plans to add the functionality...eventually. My $500 made it happen right now.

      It was certainly silly of me to make over $50k using the newly modified software that I paid $500 for. That's only 9900% profit, so, you're absolutely right....I made a serious mistake.
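
      Taking "over $50k" as exactly $50,000, the 9900% figure checks out as profit relative to the $500 outlay (so it is a lower bound):

```latex
\text{profit} = \frac{\$50{,}000 - \$500}{\$500} \times 100\% = 99 \times 100\% = 9900\%
```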

  2. Dear aunt, by Anonymous Coward · · Score: 5, Insightful

    let's set so double the killer delete select all.

    Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

    1. Re:Dear aunt, by Kenoli · · Score: 4, Insightful

      A program capable of "making informed guesses based on context" seems perfectly plausible, though that's not part of speech recognition per se.

    2. Re:Dear aunt, by ooloogi · · Score: 1

      Meanwhile there is commercial software available that runs on a commercial operating system that does a pretty good job of it, using a whole lot of computing power to make the required informed guesses.

    3. Re:Dear aunt, by icebraining · · Score: 1

      And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

      Never? Not only is it possible, there are already some papers on prototypes of grammar-switching, context-based speech recognition engines.

    4. Re:Dear aunt, by Anonymous Coward · · Score: 0

      Google does a decent job at it in their YouTube and Voice products.

      Nothing open-source though. Hell, even the open-source voice recognition stuff that you have to train sucks donkey balls and that's a lot easier than free-form transcription.

    5. Re:Dear aunt, by Anonymous Coward · · Score: 0

      "dear aunt" explained:
      http://www.youtube.com/watch?v=tLa3Wac4O2A#t=25
      (sorry about video quality, back then youtube never looked better than your DVB-channels)

    6. Re:Dear aunt, by Anonymous Coward · · Score: 0

      yet

    7. Re:Dear aunt, by ThomConspicuous · · Score: 2, Insightful

      It's already being done with medical dictations that are also recorded and double-checked by transcriptionists. It speeds up the workflow immensely, even with the human verification in place.

      I even witnessed an East Indian doctor with a heavy accent dictate normally and have the software pick up everything stated. He was pleasantly surprised.

      It works.

    8. Re:Dear aunt, by fuzzyfuzzyfungus · · Score: 2, Informative

      Unless things have improved substantially since Dragon NaturallySpeaking 10, I'd be more inclined to describe the performance as "surprisingly adequate job of it, with training, and offers a vaguely cellphone-esque interface for choosing the correct word when it fucks up".

      It isn't comedically awful; and it likely beats typing with your stumps, or your eyelids, or whatever; but "pretty good" is being very generous.

      (Again, unless things have improved markedly since then) the software works best when used interactively, which allows it to suggest corrections, and you to make them, in real time. It also helps if it has been trained to your voice beforehand. Using it non-interactively, on a recording of somebody it hasn't been trained for, will produce results so error-filled that you might actually find manual transcription faster than manual editing (or, if you don't mind your family sounding like they've suffered head trauma or exposure to Dadaism, you can just store the recordings, make do with the text, and re-run the process in the future, when the software is better).

    9. Re:Dear aunt, by conchubhair · · Score: 3, Insightful

      The problem you are describing (continuous speech recognition) is not solved yet. Even the best state-of-the-art technology is not going to be perfect, and having two speakers will make it even less useful. If you really need the stuff transcribed, you can pay for online services to transcribe it (if they offer really good quality transcription, they are most likely using humans) or you can transcribe it yourself (you can buy software to help speed up the transcription process, including a foot pedal to pause/play the audio, e.g. http://www.nch.com.au/scribe/).

      My company does a lot of work in speech recognition, and we have tried most of the companies that offer transcription. Some of them even provide APIs so you can code something up. The best fully automatic, commercially available transcription I have seen is from Yap Inc. (http://yapme.com/). If the speaker doesn't have a crazy accent and speaks at a normal level and pace, you can get great results, but like all fully automatic transcription it can get things wrong. The benefit of Yap is that you can get back the confidence scores and alternates for each word, so if you had a dictionary of your own commonly used words, you could pick out a better transcription. You pay by the word for transcription (it is a small amount, but it will add up if you're doing hours of audio).

      If you're willing to wait, the technology is improving all the time, so you could archive the audio for now and return to have it transcribed in a few years. If you need this done now and want something you can actually read, then your cheapest option is to do it yourself, and maybe invest in some software to speed it all up. Unless you have a lot of time on your hands and access to a lot of transcribed audio to build the language models, using any software at home is not worth your while.
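
      The idea of combining per-word confidence scores and alternates with your own word list can be sketched as follows. The data shape (a list of `(word, confidence)` pairs per word slot), the `margin` threshold and the function name are all hypothetical; this is not Yap's actual response format.

```python
def pick_words(slots, custom_words, margin=0.2):
    """slots: one list of (candidate, confidence) pairs per spoken word
    (hypothetical format). Take the top-scoring candidate, unless a word
    from the user's own dictionary scored within `margin` of it."""
    chosen = []
    for alternates in slots:
        best_word, best_conf = max(alternates, key=lambda wc: wc[1])
        # Walk candidates from most to least confident.
        for word, conf in sorted(alternates, key=lambda wc: -wc[1]):
            if word.lower() in custom_words and best_conf - conf <= margin:
                best_word = word
                break
        chosen.append(best_word)
    return " ".join(chosen)


if __name__ == "__main__":
    slots = [[("grandma", 0.9)], [("was", 0.8)], [("born", 0.7)],
             [("in", 0.9)], [("york", 0.6), ("cork", 0.5)]]
    # A family-history word list might contain place names and surnames.
    print(pick_words(slots, {"cork"}))
```

      The margin keeps a personal dictionary from overriding the engine when the engine was confident about a different word.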

    10. Re:Dear aunt, by morgan_greywolf · · Score: 1

      Google does a decent job at it in their YouTube and Voice products.

      They also do a decent job on Android, which is open source. Zero training required. I wonder how easy it would be to rip the voice recognition out of Android source?

    11. Re:Dear aunt, by hawguy · · Score: 1

      I wouldn't say that Google Voice does a decent job; most of the voice mails left for me on Google Voice come out like this:

      Hey when you get this is a I'm via out what you know what it's Johnathon of the bad idea. So maybe we can meet that I don't know about that. And I do that, but what I thought I had, but that's also the Damon now, but I thought I had bought the house. I had to get out of the night okay and give me a call. bye bye.

      Except for the "bye bye" at the end, none of it is close enough to the actual message to be useful. The actual message said nothing about Johnathan, any meeting, Damon, or a house.

    12. Re:Dear aunt, by Anonymous Coward · · Score: 0

      Tell that to all of the companies worldwide who have no problems with their automated voice recognition systems.

      Just because you saw some video of a prerelease version of Vista's voice recognition fucking up doesn't mean that it can't be done. Many commercial solutions handle it well and even Vista's voice recognition works pretty damned well at this time.

    13. Re:Dear aunt, by xSauronx · · Score: 1

      I'd say set up Google Voice, dial the GV number and do your questions as voicemail into a speakerphone.

      You'll get a transcription in your email. Do a question at a time. Problem solved!

      --
      By and large, language is a tool for concealing the truth. -- George Carlin
    14. Re:Dear aunt, by Anonymous Coward · · Score: 0

      So if it can be trained reasonably well to your own voice, why can't you listen to the recording via headphones and parrot the words in your own voice? It may be faster than typing, even with the occasional correction.

      If others in your family are interested in helping with your project, this might be a good way to get them involved, too.

    15. Re:Dear aunt, by theheadlessrabbit · · Score: 2, Insightful

      let's set so double the killer delete select all.

      Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

      ...a computer program cannot do yet

      --
      -I only code in BASIC.-
    16. Re:Dear aunt, by painandgreed · · Score: 4, Informative

      Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

      Funny, considering my job is training doctors to use voice recognition to do all their reporting. Actually, it works fairly well. I also don't mean dictating something that goes to transcriptionists. The doctors dictate the report. The dictation is transcribed into text. They review it and sign off. We got rid of all our transcriptionists years ago. The time for a report to get done went from 24 hours with transcriptionists to 24 minutes with voice recognition. The number of errors was cut in half. The doctors' workload was also lessened, as they could check the final version while still dealing with the data rather than having to go back and review everything all over again a day or two later. Speech recognition was a problem seven years ago, but hardly at all in the last five or so. Yes, they have to go over their dictations and occasionally make some minor corrections. There's always background noise to worry about, and some people's accents are hard even for another person to get through, but for things that require quick turnaround and need to be verified by the person who is doing it, voice recognition already is the gold standard.

      PS: several of the doctors like it so much that they bought Dragon (pretty much everybody but Philips uses Dragon for their speech engine) for home and use it there for all their email and other writing.

    17. Re:Dear aunt, by Anonymous Coward · · Score: 0

      a computer program cannot do? please, you sound like a philosopher or worse yet, an arts student.

    18. Re:Dear aunt, by kagaku · · Score: 1

      When you consider the quality of audio input it receives I think it does a fairly decent job.

      --
      everyday is another shooter.
    19. Re:Dear aunt, by cgenman · · Score: 1

      There is a free iPhone dragon client, which sends the audio back to their servers for processing. There isn't any training, but there probably wouldn't be training on a family member's old tapes either.

      It's possible that Dragon might work for their needs, or at least be much easier to get equally bad data back as other solutions. Try the iPhone client and see.

    20. Re:Dear aunt, by Bluesman · · Score: 2, Informative

      by making informed guesses based on context, which a computer program cannot do.

      The Perl interpreter can.

      --
      If moderation could change anything, it would be illegal.
    21. Re:Dear aunt, by rgmoore · · Score: 1

      And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

      It's also worth pointing out that in practice people frequently fail to understand each other perfectly. In conversations, we routinely ask questions to ensure that we've correctly understood what the other person said. If you ever watch a TV newscast with the closed captions on, you can see that the people producing those captions routinely make glaring mistakes. High quality human produced transcripts can only be produced by double and triple checking the transcripts against the source recording to make sure they're correct. There's no reason to expect computer generated transcripts to be perfect, either.

      --
      There's no point in questioning authority if you aren't going to listen to the answers.

    22. Re:Dear aunt, by Anonymous Coward · · Score: 0

      Google knows what im going to search for before i do!

    23. Re:Dear aunt, by Flyerman · · Score: 2, Informative

      The parent's link is exactly what I set up on a client's machine. They purchased the headset and pedals but the software itself was free and worked wonderfully.

    24. Re:Dear aunt, by Anonymous Coward · · Score: 1, Insightful

      You're referring to software someone is -trained- to use for a specific purpose. That's not the same as a general-purpose voice recognition program.

    25. Re:Dear aunt, by BitZtream · · Score: 4, Interesting

      Ironically, I have a family member who runs a business doing transcription for doctors... because every time they try voice recognition software they get pissed off and go back to real people.

      Being a fan of Dragon Dictate myself, I know it's not that great, and I know it has a fit when you start throwing accents at it, training or not.

      I call bullshit on your claims of using Dragon for everything.

    26. Re:Dear aunt, by ooshna · · Score: 1

      Are you in Soviet Russia where Google searches you? Sorry had to do it.

    27. Re:Dear aunt, by binarybum · · Score: 2, Interesting

      Wow, shame on the anonymous troll that posted this and the moderators that must have been teleported from the early 90s. The high-end transcription packages are truly incredible. Yes, you need to spend some time training them to your speech patterns and accent, and yes, it makes a big difference if you use a quality microphone (not the one that's built into your laptop or iPhone) at a fixed distance. With a decent setup, transcription software can be really impressive at high speeds and with complicated vocabulary. Talk to a doctor in a large modern hospital: many are trusting these systems with their patients' medical record information, and these guys have high expectations when it comes to transcriptions because they are used to having very skilled ears listen to them mumbling jargon quickly.

      Having anything but a really good setup can be really frustrating, though. Maybe Slashdot tinkerers have dabbled and written these kinds of apps off. I do imagine that it wouldn't be worthwhile using anything but the top dictation apps if you want to avoid any serious post-editing.

    28. Re:Dear aunt, by Anonymous Coward · · Score: 0

      making informed guesses based on context, which a computer program cannot do.

      You mean like the branch prediction common to all modern pipelined CPUs that makes their high performance possible?
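
      For reference, the simplest textbook form of that guess is a 2-bit saturating counter per branch, sketched below. Real CPUs combine many such counters with branch-history tables and more elaborate schemes, so treat this only as an illustration of "guessing from past context".

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict 'not taken',
    states 2-3 predict 'taken'. One wrong outcome in a strong state
    only weakens the prediction instead of flipping it."""

    def __init__(self, state=2):
        self.state = state  # start in 'weakly taken'

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, predictor):
    """Fraction of branch outcomes the predictor guessed correctly."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits / len(outcomes)


if __name__ == "__main__":
    # A loop branch: taken 9 times, then falls through once, repeated 3 times.
    loop = ([True] * 9 + [False]) * 3
    print(accuracy(loop, TwoBitPredictor()))
```

      On that loop pattern it misses only the single fall-through per iteration.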

    29. Re:Dear aunt, by Anonymous Coward · · Score: 0

      let's set so double the killer delete select all.

      Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

      That is precisely what most voice recognition apps do. You may have heard of Dragon NaturallySpeaking, one of the more popular voice recognition apps. The problem with voice recognition, and why it can be so tricky, is the phonetics. A voice recognition program typically captures audio and compares it to its phonetic library, relying on that to get the sounds right; it then makes an educated guess at the word that is later adjusted based on the context of the sentence. The final output is often much closer than the original output based on phonetics alone. 'Training' the application is crucial because the pronunciation of different vowels and consonants, as well as their combinations, varies with accent and dialect. Running a 'training' program for voice recognition lets the app record the way you pronounce predefined words and adjust its phonetic samples accordingly, so it better fits your own dialect and speech patterns.

      The science is about as imperfect as speech synthesis: emotional undertones and tone of expression are just as difficult to recognize as they are to synthesize properly.

      Now you know, and knowing is half the battle-- G.I. JOE!!!
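
      The "educated guess adjusted by context" step described above is commonly done by combining an acoustic score with a language-model score. Here is a toy sketch under invented assumptions: two candidate transcriptions with made-up acoustic log-likelihoods, a hand-written bigram table with made-up probabilities, and a flat floor for unseen word pairs. None of the numbers come from a real system.

```python
import math

# Toy bigram log-probabilities; every number is invented for illustration.
BIGRAM_LOGPROB = {
    ("<s>", "recognize"): math.log(0.01),
    ("recognize", "speech"): math.log(0.3),
    ("speech", "</s>"): math.log(0.5),
    ("<s>", "wreck"): math.log(0.001),
    ("wreck", "a"): math.log(0.05),
    ("a", "nice"): math.log(0.02),
    ("nice", "beach"): math.log(0.1),
    ("beach", "</s>"): math.log(0.4),
}
UNSEEN = math.log(1e-6)  # crude floor for word pairs not in the table


def lm_score(words):
    """Bigram log-probability of a word sequence, with sentence markers."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN) for pair in zip(tokens, tokens[1:]))


def rescore(candidates, lm_weight=1.0):
    """candidates: list of (word list, acoustic log-likelihood).
    Return the word list with the best combined score."""
    return max(candidates, key=lambda c: c[1] + lm_weight * lm_score(c[0]))[0]


if __name__ == "__main__":
    cands = [("wreck a nice beach".split(), -10.0),
             ("recognize speech".split(), -10.5)]
    print(rescore(cands, lm_weight=0.0))  # acoustics only
    print(rescore(cands))                 # acoustics plus context
```

      With `lm_weight=0.0` only the acoustic score counts and the slightly favored "wreck a nice beach" wins; with the language model switched on, "recognize speech" wins because its word pairs are far more probable in the toy table.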

    30. Re:Dear aunt, by orangesquid · · Score: 1

      "automatic speech recognition just doesn't work."

      Shirt it does! Autumn attics peach wreck ignition maybe far from I deal but I trusted enough verdict hating email. I don't heavy enough time to free view the output any. How

      --
      --TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
    31. Re:Dear aunt, by Mr.+Pibb · · Score: 2, Interesting

      I call bullshit on your bullshit.

      I do occasional work for a Worker's Comp doc who has been working with Dragon for over 10 years. He swears by it.
      The work is an hour-long interview, and hours of paperwork. He dictates the report into a MiniDisc recorder while reviewing his notes and then plays the recording back into the computer, watching for errors (few) and reviewing. I've also set up several other docs in the same field with Dragon, and they're quite pleased with it as well.

      At first, he had to buy the latest HW and audio cards to get the best accuracy, but now runs Dragon virtualized on a 1st-gen MacBook without a problem. Dragon FTW!

    32. Re:Dear aunt, by Verteiron · · Score: 1

      I believe that Android phones send the recordings off to Google, where they are analyzed and the text sent back to the phone.

      --
      End of lesson. You may press the button.
    33. Re:Dear aunt, by morgan_greywolf · · Score: 1

      Could be. If that's the case, I'll bet there's a Google App Engine API for that.

    34. Re:Dear aunt, by mwvdlee · · Score: 1

      I think a PC application is cheaper than an iPhone + subscription + app.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    35. Re:Dear aunt, by hairyfeet · · Score: 1

      Have you tried Windows 7 Speech recognition yet? I haven't got to play with it myself but I have some friends and family that have and they love it. And since it comes with the OS you already have it if you have Windows 7.

      I'd love to hear from you or anybody else that has used it, because if it works decently well I may start keeping USB headsets for the older folks. It would sure be easier for the hunt and peck typers if they could just talk their emails and docs.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    36. Re:Dear aunt, by micheas · · Score: 2, Interesting

      I can see medical transcription being the best application of transcription software.

      The vocabulary is largely devoid of slang.

      You have long, specialized terms that sound like very few other words.

      The vocabulary is probably fairly small as most doctors have a fairly specialized practice, so internists don't deal with the same areas as podiatrists, reducing the words that are used.

      The repetition is probably fairly high, allowing for training to be more effective than speech on random topics.

      In conclusion, for what the original poster wants, voice recognition software is probably not viable, but if you have a medical practice, and are not a general practitioner, you may well find that voice recognition software is usable.

    37. Re:Dear aunt, by msclrhd · · Score: 2, Interesting

      Your post highlights a key difference between written and spoken words -- we tend to contract words ("have a" to "hav.uh") and flow one word into another ("said John": the d at the end of said and the d in the dZ sound merge, so the d at the end of said is dropped, giving "sE dZ0n").

      Some people drop certain letters at the beginning and end of words -- "'e said 'what 'ave you been doin' today?'". This also makes it more complicated to transcribe. Not to mention regional dialect variations and strong accents.

      Then you have phrases like "four candles" / "fork 'andles" and "night train" / "night rain" (http://en.wikipedia.org/wiki/Homophones) -- a lot of The Two Ronnies' humour stemmed from wordplay that takes advantage of the difference between written and spoken language and how the audience interprets it (see the Hieroglyph sketch for another classic example). 'Allo 'Allo! did a similar thing as well.

    38. Re:Dear aunt, by Anonymous Coward · · Score: 0

      Winshill.

    39. Re:Dear aunt, by Anonymous Coward · · Score: 0

      The English language has the property of becoming easier when it is used for technical matters.

      The vocabulary used by a medic, or an IT person, when speaking strictly of technical matters narrows down a lot. It's much easier to set up a dictionary weighted on those words.

      It's also much easier to guess based on context, since the general context is known: if you're talking about, for example, a heart problem (for a medic) or a network problem (for IT), the set of possible outcomes for the phrase is much narrower.

      Many languages do not get easier when they get technical (most languages deriving from Latin have this characteristic), and the vocabulary tends to be bigger when technical (I'm sure this happens in Italian, because the technical words are often English, French, or Latin, the last very common in law, and definitely not used in common speech).

      Try reading Tolkien or Melville to that speech recognition software and let's see what happens.

    40. Re:Dear aunt, by k.a.f. · · Score: 1

      A program capable of "making informed guesses based on context" seems perfectly plausible, though that's not part of speech recognition per se.

      If you believe that, you don't know much about speech recognition.

      Seriously, the language model in modern dictation systems is THE most important part. The computer gets much more relevant information from a-priori probabilities of words and sounds than from recognizing them directly, because most of the time the sounds that our brain thinks it hears are objectively not there at all. Read up on NLP some time; it is a totally fascinating (if somewhat depressing) field of research.
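      The split this comment describes, between what the audio says and what is a-priori likely, is usually written as the textbook noisy-channel decision rule. This is the general formulation, not any particular product's internals: W is a candidate word sequence, A the acoustic observations, P(A | W) the acoustic model, and P(W) the language model.

```latex
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\, P(W),
\qquad
P(W) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})
```

      P(A) is the same for every candidate, so it drops out of the maximization; the bigram product on the right is one common way of approximating P(W), assumed here for illustration.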

    41. Re:Dear aunt, by slim · · Score: 1

      Could it be that people with certain accents have success with Dragon, while others do not?

      I've found with some products (and people!), low-end products like Nintendo Brain Training and Google, that my instinct is to try to speak more clearly. To me, that means moving closer to British RP.

      What actually works is to put on a mock American accent.

      See also, ordering a Bud in Texas. You have to ask for a "bird" and then they understand ;)

    42. Re:Dear aunt, by SwedishPenguin · · Score: 1

      Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

      Anonymous coward, meet statistics...

    43. Re:Dear aunt, by amentajo · · Score: 1

      the only reason humans can [insert thing here] is by [insert other thing here], which a computer program cannot do.

      Plenty of other people have already responded to this, but I feel it bears mentioning that statements like this are pretty unfair to the efforts of researchers who are constantly trying to better understand the way the brain works and model it appropriately.

      Unless you think that humans are infused with supernatural material, a computer program can absolutely "do" anything that a human brain can, given enough time and effort. At the very least, emulating the chemical processes that occur in the brain can allow a computer program to hear human speech and derive "meaning" from it, in some context that can be converted to textual data. Don't overestimate physiology, and don't underestimate the potential of computation.

      Beyond that, artificial intelligence is a possibility. Who needs a full brain emulator when you can just reverse-engineer the parts that do thinking?

      When you say "a computer program cannot [x]", some people take that as a challenge to make a computer program or find an algorithm that does "x".

    44. Re:Dear aunt, by juasko · · Score: 0

      Don't be so sure about that... Slashdot recently linked to IBM's new project with a computer playing Jeopardy.

      here is the link again: http://www.research.ibm.com/deepqa/

    45. Re:Dear aunt, by Anonymous Coward · · Score: 0

      Actually the system works quite well in the radiology setting. I am a radiology registrar and we have been using the integrated reporting system (GE) for some time.

      You dictate the study, review it, make any corrections, validate and the report is immediately available for review. Very helpful for the emergency doctors to get quick access to reports for trauma / after hours CT.

      The system of using typists, then waiting for the report, reviewing it and validating is long outdated (but still used in some private practices). It slows us down having to review our reports maybe a day or so later. Plus the requesting doctors often have to come to us for a verbal report. Quite time consuming for everyone and not good for patient care.

      Anyway, I am all for digital dictation. Sure it isn't perfect, but there are many side benefits in my line of work that make performing my own corrections at the time worth it.

    46. Re:Dear aunt, by bami · · Score: 1

      Dear aunt, let's set so double the killer delete select all.

      I'm afraid the voice recognition in 7 is still rubbish, you need to train it first with some stupid wizard, and then it only sorta works on commands.
      Transcribing is still horrible and I only get some reasonable results if I speak to it in a full-on british accent. I'm not even FROM the UK!

    47. Re:Dear aunt, by Anonymous Coward · · Score: 0

      A lot of transcription companies actually send medical recordings through speech recognition and then correct it, sending it back to the doctors as "transcribed".

      For specific use (i.e. medical field), speech recognition works extremely well, as painandgreed says.

      For home use, you will have to do quite a lot of work to get Sphinx to work for you. If you can collect written documents from the people you're recording (diaries, school essays, etc.), you might improve your chances.
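      Collecting written documents amounts to adapting the language model to the family's own vocabulary (names, places, favorite phrases). A minimal sketch, assuming the documents are available as plain-text strings: it counts words and adjacent word pairs and turns them into maximum-likelihood bigram probabilities. The function names are hypothetical, and converting counts like these into a model file that Sphinx can load is a separate step that is not shown.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes ("don't")."""
    return re.findall(r"[a-z']+", text.lower())

def bigram_model(documents):
    """Count words and word pairs in plain-text documents; return (counts, prob)."""
    unigrams, bigrams = Counter(), Counter()
    for doc in documents:
        words = ["<s>"] + tokenize(doc) + ["</s>"]   # mark document start and end
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))

    def prob(prev, word):
        # Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev).
        # Unseen pairs get 0.0; a usable model would apply smoothing here.
        return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

    return unigrams, prob

# Usage with two made-up diary sentences.
unigrams, prob = bigram_model(["Grandpa Otto farmed near Fargo.", "Otto married Ilse in Fargo."])
print(unigrams["otto"], prob("otto", "married"))  # 2 0.5
```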

      Oh, and avoid MP3. The compression is fine for music, but too much information is lost for speech recognition. Use CELP if you can, or PCM (WAV), which is best.
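      Before feeding a file to a recognizer, it is easy to check that it really is uncompressed PCM. A minimal sketch using Python's standard `wave` module, which reads PCM WAV files and raises `wave.Error` for other encodings; the 16 kHz, 16-bit, mono target is an assumption (a common setting for Sphinx-style acoustic models), so adjust it to whatever your model expects. The helper name is hypothetical.

```python
import io
import wave

def check_wav(source, rate=16000, channels=1, sample_width=2):
    """Return a list of problems with a WAV file; an empty list means it matches.

    `source` is a filename or a binary file object. The defaults (16 kHz, mono,
    16-bit) are an assumed target; change them to match your acoustic model.
    """
    try:
        with wave.open(source, "rb") as w:
            problems = []
            if w.getframerate() != rate:
                problems.append(f"sample rate {w.getframerate()} Hz, expected {rate}")
            if w.getnchannels() != channels:
                problems.append(f"{w.getnchannels()} channels, expected {channels}")
            if w.getsampwidth() != sample_width:
                problems.append(f"{8 * w.getsampwidth()}-bit samples, expected {8 * sample_width}")
            return problems
    except (wave.Error, EOFError) as exc:
        # Non-PCM encodings (e.g. MP3 data in a WAV wrapper), non-RIFF files, truncated headers.
        return [f"not a readable PCM WAV file: {exc}"]

# Usage: build a 44.1 kHz stereo file in memory, as a CD rip might be, and check it.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 2 * 100)  # 100 frames of stereo 16-bit silence
buf.seek(0)
print(check_wav(buf))  # ['sample rate 44100 Hz, expected 16000', '2 channels, expected 1']
```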

    48. Re:Dear aunt, by Anonymous Coward · · Score: 0

      I work for Nuance Communications, and we have revenues of nearly $400,000,000.00 from speech recognition technology. So trust me, it works great. Odds are good that your doctor is dictating into something that uses either Dragon for front-end dictation (where they edit the output) or a Nuance/Dictaphone dictation system that uses back-end dictation (where a transcriptionist edits the speech-recognized results).

      Anyway, Dragon is not open source, but it can be configured to transcribe WAV files fed to it.

    49. Re:Dear aunt, by Attila+Dimedici · · Score: 1

      Actually, my experience is that for a single individual, over time, Dragon Naturally does become very good. The problem is that that is a result of it learning what that specific person is saying. It is not terribly good if you keep switching people.

      --
      The truth is that all men having power ought to be mistrusted. James Madison
    50. Re:Dear aunt, by Anonymous Coward · · Score: 0

      Bullshit. I use it all the time to dictate Word documents. It's not perfect; I usually see one or two errors per average 5-6 sentence paragraph, but nothing that cannot be fixed with a proofread, knowing the context. It is far better out of the box than Dragon, which I gave up using when I moved up to Windows 7.

    51. Re:Dear aunt, by Anonymous Coward · · Score: 1, Funny

      wow. two opposing responses modded to +5. this is tough.

      (chews nails)

      who is right?

      (nervously taps fingers on desk)

      And the winner is Slashdot ID 692029!!!!

      It was very close, but 692029 had the other guy beat by 556.

      Phew. I'm glad we have a pecking order around here to break the draw.

    52. Re:Dear aunt, by CronoCloud · · Score: 1

      Could it be that people with certain accents have success with Dragon, while others do not?

      Could be, and perhaps tone/pitch as well. My late mother, who had very bad rheumatoid arthritis, couldn't type effectively, and long letters or e-mails were hard for her. So I bought her a copy of Dragon (I think it was version 8 Preferred; it was a few years ago). Anyway, it simply didn't work for her. She couldn't even get past the first training sentences. For me, on the other hand, it worked right off.

      For my mother, Word's built in speech recognition worked better, but she found it annoying so preferred to peck slowly (and take breaks) or dictate to someone.

       

    53. Re:Dear aunt, by Anonymous Coward · · Score: 0

      It is a specialized, grammar-restricted, context-specific engine they use for medical dictation, which is why it costs $$$ and works so well. You run into trouble when you try to remove the context restrictions from speech recognition software. That greatly reduces accuracy, and much more training time is needed to compensate before it is usable for anything.
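      The benefit of a restricted vocabulary can be illustrated with a toy: when the recognizer only has to choose among a few domain words, even a sloppy guess can be snapped to the right one. A minimal sketch using the standard library's `difflib.get_close_matches` as a stand-in; it compares spellings, not sounds, so it is only an analogy for what a real engine's restricted grammar does, and the three-word vocabulary is hypothetical.

```python
import difflib

# Hypothetical restricted vocabulary for a narrow medical domain.
DOMAIN_WORDS = ["doxycycline", "dextromethorphan", "radiograph"]

def snap_to_domain(guess, vocabulary=DOMAIN_WORDS, cutoff=0.6):
    """Return the closest in-vocabulary word to `guess`, or None if nothing is close enough."""
    matches = difflib.get_close_matches(guess.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(snap_to_domain("doxy cyclean"))  # doxycycline
print(snap_to_domain("weather"))       # None
```

      With an open vocabulary of tens of thousands of words, the same sloppy guess would have many plausible neighbours, which is the accuracy loss the comment describes.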

    54. Re:Dear aunt, by mrchaotica · · Score: 1

      The worst thing about Google Voice transcription is that there's no interface for correcting it.

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    55. Re:Dear aunt, by Anonymous Coward · · Score: 0

      It's all about training the user. Speech recognition isn't good enough to adapt to the way we speak in normal conversation. Learning to get accurate results from speech recognition is a skill that requires practice. For some the payoff is great. Others don't have the patience to stay with it. And as you point out, accents can still be a problem.

    56. Re:Dear aunt, by Simetrical · · Score: 1

      Your post highlights a key difference between written and spoken words -- we tend to contract words ("have a" to "hav.uh")

      That's not a contraction, is it? The words are just pronounced one after the other, with have stressed and a unstressed. A contraction is when some sounds are actually omitted, like it's for it is.

      --
      MediaWiki developer, Total War Center sysadmin
    57. Re:Dear aunt, by misosoup7 · · Score: 1

      let's set so double the killer delete select all.

      Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

      So true. It doesn't work very well; you'll end up spending the time to fix it. However, if you type extremely slowly, it might be a good idea to "transcribe" the text and then edit. Google Voice now tries to transcribe the messages you leave on it. Needless to say, it only gets you partly there, and you'll have to edit the result. Trust me, what it comes up with can be pretty disturbing. Still, if you can get some sort of software to play your MP3/WAV files to Google Voice, you might get it to transcribe them for you.

    58. Re:Dear aunt, by Captain+Damnit · · Score: 2, Informative

      13 years ago, when I entered the medical transcription industry, the fellow who sold us our dictation system told me that he was a dead man walking: voice recognition was going to KILL the transcription industry, and he almost felt guilty selling us the system. When we mentioned we had looked at Dragon, he practically cried. 13 years later, that salesman is now deceased, and the transcription industry is larger than ever. Voice recognition in transcription is like Linux on the desktop: every year, articles pop up saying that THIS year will be the year medical transcription dies at the merciless hands of voice recognition.

      For a guy in an industry that Netcraft has confirmed is deader than FreeBSD, I'm doing pretty well.

      I now own a medical services company that does transcription, so my opinion is certainly biased here, but I fail to see the economic logic in turning a physician, who makes between $120-250K per year, into a clerical worker editing his own files. Especially when said clerical worker can be seated in India. Time is money, and the time of physicians and surgeons is one of the most expensive line items on your medical bill. Even with transcription prices as they are today, tacking 20 minutes of extra editing time onto a doctor's already long work day means that I can do it cheaper with manual labor. Voice recognition just means that I need one MT and a voice recognizer instead of one transcriptionist and a QA person.

      Internally, we use a batch speech recognizer based on Sphinx, as the Dragon source is too expensive to license in the volumes that we do. As one of the earlier posters said, the code is the easy part; it's generating the speech corpus that's the really expensive part. Developing that was easily a seven-figure outlay in labor, which is why you don't see any usable medical speech corpora available for free* on the Internet. You'd think with all the federal money being thrown at making medical records electronic that they could spare a few million to develop an open-source speech corpus, but that would make too much sense.

      As long as physicians and surgeons are better paid than the rest of us, someone will be doing transcription.

      --

      * If you know of one, post a link...believe me, we've looked.

    59. Re:Dear aunt, by Eggplant62 · · Score: 1

      I transcribe for a living and have used all the products referenced in this discussion in some capacity, whether through work or through my own independent exploration and study. I find that NaturallySpeaking is difficult to train. I find that the Scribe transcription player is an excellent tool on both Linux and Windows for controlling audio playback with a foot pedal. I can transcribe effectively on pretty much any word processor.
      I'm stuck in proprietary hell, though, when it comes to what I do for my employer. I work on a Windows-based proprietary platform built on a very well developed speech recognition engine. What's holding back development in this area in the free software world is patents. Most of the big outfits doing work in the voice recognition and transcription areas are patenting everything they can, à la our friends at Microsoft and IBM, if the methods haven't already been patented by someone else. I'd love to see an open source solution that could do what the software at work does, but that's going to be a long time coming.

    60. Re:Dear aunt, by synthespian · · Score: 1

      Should hire shorthand specialists. They're faster than transcribers. They capture the speech in shorthand and then transcribe it. I know this sounds like it would take longer, because you insert an intermediary step, but the fact is that transcribers are fast-typing from audio, and that is a stop-and-go process. Shorthand specialists just fast-type from shorthand notes, so they don't stop all the time, because shorthand transcription is a start-to-finish process. And, lo and behold, international competitions (yes, they have those) show excellent accuracy. Also, when people hire shorthand specialists for events, they get results faster. The transcriber must first wait for the whole event to end, i.e., have the audio file, whilst the shorthand specialist begins capturing speech the moment the event begins.
      I know this because I'm sort of a shorthand freak myself, so I'm speaking from experience (and that of my teachers).

      --
      Main difference between the BSD license and the GPL license: one is from California and the other is from Massachusetts
    61. Re:Dear aunt, by synthespian · · Score: 1

      Not to call bullshit on your calling bullshit, but... as I mentioned in another post, I'm a fan of shorthand writing (it's of tremendous use to me in conferences). My shorthand teacher is a courtroom shorthand specialist. He regularly attends sessions with debates. They don't use speech recognition software.

      First, because of microphone issues (a person will not always remember to speak directly into the microphone). Second, in sessions with over 12 magistrates or judges debating (NB: this is not the US), there are licensing issues (a license for every head, right?). Third, the training part... the defendant's lawyer, for example... must he train the software too? It's not feasible to demand that of lawyers (to begin with, it costs them time, and you know how much they charge). Also, will the software handle the long Latin citations (the whole thick book of Roman law citations)? It had better... in all three ways of pronouncing Latin (reconstructed, historical, Vatican), or else it's a fake. Finally, in criminal courts, besides the issue of training the software to understand the uncooperative criminal, there's an issue regarding how criminals speak: often from a lower social class, they won't speak like the judge or the lawyers; also, they'll use a very different vocabulary (street lingo, gang slang, whatever you want to call it). These, of course, will always change, and will also be different from city to city, etc. I don't think Dragon has the manpower to keep up with the streets and continuously update their Markov chain/database with the new slang. Does anyone?

      The place where speech recognition fits well is when the shorthand specialist transcribes the notes.

      Otherwise, it's a stupid proposition that your PC can replace a functioning human cortex. Not yet. Speech recognition only works in fairly limited domains (such as medical terminology). This is like automatic translation by Google, etc. It's this view that only things processed by computers are "tech". Writing is "tech". Auxiliary language design (e.g., "eo", or Esperanto) is also "tech". But in come the computer nerds, who think they can solve every fucking problem with their programming shite, without ever taking the time to learn what people came up with eons ago (and then build on it, e.g., "eo" as an intermediary step, mapping language to language). But stick around another 50 years. NP-hard problems be damned. Humans are just stupid, anyway. What good is an expert? Sex, physical exercises, armies, and doctors will be replaceable in the new dystopia, and so will your brain. Let's give you a problem, then we'll sell you the solution and make the economy go 'round and 'round.

      --
      Main difference between the BSD license and the GPL license: one is from California and the other is from Massachusetts
    62. Re:Dear aunt, by synthespian · · Score: 1

      You, sir, are right on the money.

      And these are deep problems that only the gullible would ignore - perhaps buying many licenses from one of these speech recognition firms.

      We live in times when journalists on TV will have the final word on what's feasible or not.

      --
      Main difference between the BSD license and the GPL license: one is from California and the other is from Massachusetts
    63. Re:Dear aunt, by Felgerkarb · · Score: 1
      I think people bashing commercial voice recognition software either don't use it or are trolling.

      As someone who uses dictation software at work every day, all day, and has used several different packages (all of which, I believe, use Dragon), I can tell you it works. Perfectly? No. Very well? Yes. Well enough? Definitely.

      If I were so inclined, I would have no problem using it at home personally, but I don't really have a pressing need. A good friend who has some repetitive stress disorder issues does use it at home, and, again, it works well enough.

      As an aside, I would add given the years of time Dragon has been developing and perfecting technology, I would be pleasantly surprised, but very surprised, if there was an effective non-commercial solution out there.

    64. Re:Dear aunt, by Corwn+of+Amber · · Score: 1

      Well they'd better host the projects in China where no one cares for the rest of the world's IP laws.

      What's wrong with you people? Technical problem: we can't code stuff because it's all patented to hell, boo hoo. Technical solution: host the code wherever those laws don't apply. Let people who find it in the rest of the world use it without advertising that they're doing so...

      Your code will end up - stolen - in Chinese devices, of course. Those devices will get sold in the States, because you Americans buy 80% of all electronic gizmos in the world. Then there will be some scandal here, for GPL violation or such, then IBM will try to sue the manufacturers of the devices, which won't work of course, then it will get interesting when they try to injunct Wal-Mart to stop selling the stuff.

      --
      Making laws based on opinions that stem up from false informations leads to witch hunts.
    65. Re:Dear aunt, by Corwn+of+Amber · · Score: 1

      What, the most expensive line item on the bill?

      No Way. The really expensive thing in Medical is HARDWARE. How many doctor years does one CAT machine cost? An fMRI device?

      --
      Making laws based on opinions that stem up from false informations leads to witch hunts.
    66. Re:Dear aunt, by BitZtream · · Score: 1

      You miss my point. My point is not that Dragon sucks. It's the best thing on the market, hands down. It works way better than just about any alternative, including some that use its own engine as well!

      My point is that medical transcription generally doesn't use your everyday dictionary. It's a bunch of reports from people who learned English as a second or third language, using medical terms that are long and complex and generally pronounced and spelled as their Latin names, not English. Most people who do transcription for doctors have to get used to hearing a particular doctor to be really proficient at transcription, but they can replay it enough times to get it right. They also put up with the doctors who do their reports sitting at home, with the TV blasting and the little kids running around screaming their lungs out, all of which you hear on the report.

      Go ahead, try talking to Dragon about Doxycycline or dextromethorphan, and see how well that works out for you. THAT is what these transcriptions are made of: long medical terms describing the procedures.

      Also keep in mind, the transcriptionists listen to and type this stuff in using all sorts of shortcut macros, at 2x speed or more, because they have good domain-specific knowledge. They also know enough to catch obvious mistakes in the reports and flag to the doctor that there MAY be an error, so the report gets reviewed.

      You aren't transcribing an everyday email in an ideal setting; it's a medical report in what is, in most cases, anything but an ideal situation.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    67. Re:Dear aunt, by Felgerkarb · · Score: 1

      Go ahead, try talking to Dragon about Doxycycline or dextromethorphan, see how well that works out for you. THAT is what these transcriptions are made of, long medical terms describing the procedures.

      I guess I should have been clearer... I use a Dragon based dictation software every day in a medical setting, to dictate medical reports, letters, notes, consultations, etc.

  3. Sphinx by SaXisT4LiF · · Score: 5, Informative

    Carnegie Mellon has an open source speech recognition project you might want to look into. Sphinx

    --
    Fight or flight its all the same
    Live to die another day

    --Ryan
    1. Re:Sphinx by Anonymous Coward · · Score: 0

      Yep. Sphinx, while FAR from perfect, is the only open source option out there. Also, you can create your own voice dictionaries for it.

  4. Unfortunately... by dmneoblade · · Score: 3, Interesting

    I spent several months searching for something like this. Open-source voice recognition is in really infant stages, and there does not seem to be much interest in improving the few things we have.

    --
    Warning, knife is sharp. Please keep out of children.
    1. Re:Unfortunately... by Anonymous Coward · · Score: 0

      The second Canonical decides it needs to be done we'll go from 0 to 60 in 12 months.

    2. Re:Unfortunately... by Bigjeff5 · · Score: 1

      Open-source voice recognition is in really infant stages

      It's a very old infant, too. :/

      Your best bet for speech to text is to use Google's speech recognition services; they are impressively accurate (though still nowhere near perfect).

      If it's going to be cutting out a lot of time, it may be worth buying a commercial product.

      --
      Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
    3. Re:Unfortunately... by dargaud · · Score: 1

      I spent several months searching for something like this. Open-source voice recognition is in really infant stages, and there does not seem to be much interest in improving the few things we have.

      Funny, I had the exact same thought 25 years ago when playing with speech recognition software on the Apple ][... I don't know if things have improved much since, as I use 3 languages daily, there's just no software that can handle my accents and language changes.

      --
      Non-Linux Penguins ?
    4. Re:Unfortunately... by Corwn+of+Amber · · Score: 1

      Go try to use an Ubuntu out of the box.

      Now go do the exact same thing on MacOS.

      One is a desktop Unix done right, the other is a thin layer of gloss over a ton of incompatible software.

      When I see a distro as polished and finalized and unified as MacOSX, I'll consider Linux on the desktop.

      --
      Making laws based on opinions that stem up from false informations leads to witch hunts.
  5. I've wondered about this too by itamblyn · · Score: 2, Interesting

    It seems like there should be some way to "hack" the audio transcription that Google offers through Google Voice or YouTube. Unfortunately, I haven't found a way to upload a file. With YouTube, if you make a fake movie, it gives an error that it can't be transcribed. Getting Google Voice to work would require some sort of phone interface, I suppose...

    1. Re:I've wondered about this too by Enuratique · · Score: 4, Informative

      Google relies on Twilio for their audio transcription.

      --
      A black hole is where God divided by 0
    2. Re:I've wondered about this too by itamblyn · · Score: 1

      Source?

    3. Re:I've wondered about this too by Anonymous Coward · · Score: 0
  6. Best Idea by Anonymous Coward · · Score: 3, Informative

    just upload it to youtube, its genius google transcription technology will make everything sense out of it.

    1. Re:Best Idea by Culture20 · · Score: 0

      genius google transcription technology

      That's not a transcription technology, that's the comments. People tend to repeat the phrases they like. For example:
      genius google transcription technology. ROFLMAO
      Why'd you steal this video? The original has over 2,000,000 views. You stole this.
      I like when he said "upload it to youtube" ;) <3 (it looks like a heart!)

    2. Re:Best Idea by Anonymous Coward · · Score: 0

      Score:3, Informative. Wow. Some /. mods have everything sense out of them.

  7. Having dealt with this... by Skuld-Chan · · Score: 1

    I'm interested in an open version of a transcription app (I run a lab with a lot of this software/equipment), but this is a very vertical market; until recently there wasn't any standard interface for the foot pedal (newer ones are USB HID devices now).

    I had to throw away a bunch of Sony serial devices because they only worked with one app that I can't make work on newer versions of Windows.

  8. youtube by Anonymous Coward · · Score: 1, Informative

    Upload to YouTube and let it create closed captions. The results won't be perfect, but they will be better than most software.

  9. simon and julius by Anonymous Coward · · Score: 0

    I've tried Simon and Julius, but couldn't get past the learning curve to do actual transcription. I will say that it looks like both could be better at recognizing "just your own voice" once you get far enough past the learning curve to train them. The commercial software is good at recognizing everybody's voice, which isn't that helpful for transcription.

  10. XTrans by ceraphis · · Score: 2, Interesting

    Why don't you give XTrans a shot: XTrans

    1. Re:XTrans by __aasqbs9791 · · Score: 1

      I'm interested in this for meetings, and after checking out the XTrans user manual it sounded really interesting, but I can't run it, and I can't find any useful help on their pages about this error on my Ubuntu system:
      /dev/dsp: Device or resource busy
      terminate called after throwing an instance of 'QWave2::AudioDeviceError'
      Aborted

      I'm not even sure where to look next since a quick google search didn't turn up a useful fix.

    2. Re:XTrans by ceraphis · · Score: 1

      The downloads page says it requires QWave for waveform display and playback. Could that be the problem?

    3. Re:XTrans by Kev+Vance · · Score: 3, Informative

      Ubuntu uses PulseAudio on top of the ALSA audio subsystem, but that error message indicates XTrans is trying to use the OSS audio subsystem instead. To work around this, try the Pulse OSS wrapper or temporarily suspend Pulse. From the command line: "padsp xtrans" or "pasuspender xtrans".
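
      If you end up launching an OSS-only program like this from a script, a small Python sketch can build either workaround command. The helper name `wrapped_command` is hypothetical. It assumes padsp takes the program directly and that pasuspender wants a `--` before the program, as its usage line shows; check the man pages on your release.

```python
def wrapped_command(program, args=(), mode="padsp"):
    """Return an argv list that runs an OSS-only `program` alongside PulseAudio.

    mode="padsp":       route the program's /dev/dsp access through PulseAudio's OSS wrapper.
    mode="pasuspender": suspend PulseAudio while the program runs so it can open the device itself.
    """
    if mode == "padsp":
        return ["padsp", program, *args]
    if mode == "pasuspender":
        # Assumption: "--" separates pasuspender's own options from the program to launch.
        return ["pasuspender", "--", program, *args]
    raise ValueError(f"unknown mode: {mode!r}")


# Usage (needs padsp/pasuspender and xtrans installed):
#   import subprocess
#   subprocess.run(wrapped_command("xtrans"), check=True)
```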

      --
      F0 07 C7 C8
    4. Re:XTrans by Anonymous Coward · · Score: 0

      Try looking here: http://wiki.debian.org/SoundFAQ/

    5. Re:XTrans by Anonymous Coward · · Score: 0

      This. This is Lunix

    6. Re:XTrans by __aasqbs9791 · · Score: 1

      Thanks, "padsp xtrans" seems to be working.

  11. Not open source, but hackable = SAPI in Windows by Enuratique · · Score: 1

    Have you looked into the speech APIs baked into Vista and Windows 7? If you're familiar with .NET coding, version 4 of the framework provides easy-to-use hooks into the speech API. The only problem is that it is designed to be used with fairly specific (programmer-supplied) grammars/lexicons. It does come with a general speech recognizer, but you'll get some interesting results without training it first. http://msdn.microsoft.com/en-us/magazine/cc163663.aspx Another downside is that it only supports WAV files natively, but that can be addressed with some rolling-your-own goodness.

    --
    A black hole is where God divided by 0
    1. Re:Not open source, but hackable = SAPI in Windows by Anonymous Coward · · Score: 0

      It is trivial to take an MP3 file and prepend a WAV header without modifying the data. This should solve your issue.

  12. Mechanical Turk by joeharrison · · Score: 0

    You should look into using Amazon Mechanical Turk.

    See this: Cheap, Easy Audio Transcription with Mechanical Turk

  13. No by waldoj · · Score: 1

    I've put a bunch of time into this for a project of my own. The short answer is, no, I have found no such program. I've experimented with a few older programs, but they're useless. Sorry.

  14. Google voice does transcriptions by Anonymous Coward · · Score: 0

    You could record it, then call yourself and play it through the phone.

    1. Re:Google voice does transcriptions by ddillman · · Score: 1

      I'm sure the resulting high-quality audio signal will help Google Voice do an even better job than usual...

      --
      Little girls, like butterflies, need no excuse. -- L. Long
  15. But Windows Speech Recognition... by Monkeedude1212 · · Score: 4, Informative

    Most Windows Vista or Win7 machines come with a built in transcribing feature, that you can enable in the control panel (Win7, under ease of access, Speech recognition).

    However - the only way it works properly is if you train it to understand you personally. You load your profile, and it'll run you through a whole bunch of test sentences. The FULL test takes about 20 minutes, I think (it's been a while since I've used it), and it actually works quite well. There is a cut-off point at about two and a half minutes if you want to stop and try it out. It can even make the computer keyboard- and mouse-free if you want. When you open a browser it highlights everything on the web page that's clickable and assigns it a number, and you simply say "Click 7" and it hits the reply button for you. Then you talk when the text box has focus and it'll transcribe every word you say.

    I did this for my girlfriend's paper once: I read it aloud (you have to say things like "comma", "end paragraph", etc.) and put it into a Word document. Out of a 15-page single-spaced essay, it got 3 sentences wrong, and that's only because I was mentioning some of the more obscure Greek names (she's a history major). It managed to get full sentences regarding Octavia and her fondness for libraries without error, which I thought was odd since that's not a name you hear every day.

    Anyways - if he wants to do this, he should record the test phrases (there will be a lot, though) and have each of his interviewees read the test sentences, so he can then relay those through the computer and train it for each person.

    All in all - he may still run across a few errors, but it's not nearly as bad as, say, Google Voice Mail, which tries to figure out what you're saying without having any previous knowledge of how that person speaks. Windows Speech Recognition will handle what he's after, though.

    1. Re:But Windows Speech Recognition... by Monkeedude1212 · · Score: 1

      Forgot to mention: I'm not entirely sure he needed an "Open Source" Solution as much as he needed a "cost effective" solution though - he makes no mention of altering any code. So I mean, Windows Speech Recognition is not exactly Open Source.

    2. Re:But Windows Speech Recognition... by markdavis · · Score: 1

      >So I mean, Windows Speech Recognition is not exactly Open Source.

      It's not exactly multi-platform either. He might be using Linux, for example (like so many of us do). Really, the original post left off a lot of potentially useful (narrowing) info.

    3. Re:But Windows Speech Recognition... by unix1 · · Score: 1

      All in all - he may still run across a few errors, but it's not nearly as bad as say Google Voice Mail, which tries to figure out what you're saying without having any previous knowledge on how that person speaks.

      Each Google voicemail transcription has an option for the user to mark whether the transcription was accurate or not. I wouldn't be surprised if they were tying that into a caller-specific profile. So, if you leave a message for friend A and he marks the transcription good, that data may be used not just when you call person A, but when you call everyone else in the future. In fact, it wouldn't make any sense otherwise. Now, how much actual "learning" their back-end algorithms do, I cannot tell you.

    4. Re:But Windows Speech Recognition... by LordLucless · · Score: 1

      It probably helps that Greek is transliterated when rendered with the English alphabet - which means most of the funky names you were saying were spelled phonetically, and thus easy for a recognition engine to pick up - even easier than a lot of regular English words.

      --
      Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
    5. Re:But Windows Speech Recognition... by Anonymous Coward · · Score: 0

      I tried to use the Windows Speech Recognition function on Windows 7, but unfortunately it only seems to accept microphone input. It would need to take audio files as input to be useful for the OP.

      Is there any way to feed it an audio file? Even if it's just a hack like some virtual microphone interface that can take an mp3 file then output it to the OS?

    6. Re:But Windows Speech Recognition... by siriuskase · · Score: 1

      That's my experience. I don't want to talk to my computer; I want to talk to a voice recorder, then transfer the files to the computer for transcription. For someone who types faster than she talks, the Nuance/Dragon-type stuff is useless, but I'd love to create voice files while driving or have my answering machine messages transcribed, because I can read faster than I can listen. Who cares if a few of the words are misspelled? I could still listen to the sound files and clean up the text, kind of like OCR.

      --
      If you must moderate, please moderate as irrelevant, not something bad, because I'm sure someone will find this interest
    7. Re:But Windows Speech Recognition... by Anonymous Coward · · Score: 0

      If you're willing to do a very simple bit of .NET development, you can use System.Speech from the Windows SDK. Basically, SetInputToWaveFile() is the API you want to use. Note you'll probably need to convert to .wav (ffmpeg works) to do this.
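
      A minimal sketch of that conversion step, assuming ffmpeg is on the PATH. The helper name `ffmpeg_to_pcm_wav` is hypothetical, and mono 16 kHz 16-bit PCM is an assumed target that many recognizers accept; check what your engine actually expects.

```python
import shlex


def ffmpeg_to_pcm_wav(src, dst, sample_rate=16000):
    """Return an ffmpeg argv list that converts `src` (MP3 or anything ffmpeg reads)
    into mono, 16-bit little-endian PCM WAV at `sample_rate` Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample (16 kHz is an assumed target)
        "-acodec", "pcm_s16le",   # uncompressed 16-bit PCM
        dst,
    ]


# Print the shell-quoted command; pass the list to subprocess.run(..., check=True) to execute it.
print(shlex.join(ffmpeg_to_pcm_wav("interview.mp3", "interview.wav")))
```

      The resulting .wav file is what you would hand to SetInputToWaveFile().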

    8. Re:But Windows Speech Recognition... by Anonymous Coward · · Score: 0

      Neither is open source, necessarily. If he wanted a Linux product in particular he probably would have mentioned it, rather than specifying open source and having to port it to his platform of choice.

  16. USB foot control by wguy00 · · Score: 2, Informative

    Buy a USB foot control (check out Infinity or fortherecord), and download the free player from fortherecord.com. You can stop, start, rewind and fast-forward without having to take your eyes off the screen or leave your word processing app.

    1. Re:USB foot control by IANAAC · · Score: 1
      I can't tell from their web page, but does their software allow you to speed up/slow down the recording without any distortion of speech?

      That's one reason I like Express Scribe.

  17. Open source no. by jnnnnn · · Score: 1

    Here's a list. In my experience, only Dragon is worth trying, with the following caveats:

    • It helps to spend ten minutes training it for each voice
    • It will still only get 99% accuracy
    • You need a high quality (low noise) recording with a good microphone

    On the plus side, correction is easy -- read the document, and select words that look wrong to hear what they sounded like.

    Most of the other programs are aimed at very small vocabularies (around 100 words) for accessibility applications (controlling a computer).

    1. Re:Open source no. by markdavis · · Score: 1

      Dragon is not open source. It is not even multi-platform.

    2. Re:Open source no. by ducomputergeek · · Score: 1

      But the link does show open source solutions on the list. The OP is just stating that, in his experience, the only solution he has found that works is Dragon, and relating his experience with Dragon.

      --
      "The problem with socialism is eventually you run out of other people's money" - Thatcher.
    3. Re:Open source no. by maxume · · Score: 1

      Yeah, the subject of their post was 'Open source no', so they may have been up to speed there.

      --
      Nerd rage is the funniest rage.
  18. I looked, but still do it manually by ciaran_o_riordan · · Score: 4, Informative

    I've worked on loooads of transcripts. I did most of these:

    * http://wiki.fsfe.org/Transcripts

    The best technique I've found is to have mplayer play the audio at 60% of normal speed with a text editor (emacs is my preference) in another window, then flick between them with Alt-Tab and hit Space to start and pause mplayer.

    1. Re:I looked, but still do it manually by mutube · · Score: 2, Informative

      I'd agree. I did some part-time work transcribing audio a while back for extra pennies. One thing I would add is that instead of using Alt-Tab to switch applications and then hitting space to start/stop I found it was less frustrating to set up global keys for the purpose (I was using KDE at the time, I expect most desktops offer this).

      I assigned F12 to skip back 5 seconds and F9 to pause/restart. Using those (esp F12) it was relatively easy to keep up to speed with what was being said without switching away from the editor.

    2. Re:I looked, but still do it manually by nbauman · · Score: 2, Informative

      I've done loads of transcripts too.

      The best software I found was the Olympus DSS Player 2002, which came bundled with the expensive Olympus digital recorder (the cheap ones came with bare-bones software). It was like the old mechanical tape transcribing machines, except much better, with adjustable back pedal, 50% slow speed, 200% fast speed, fast forward, fast back, etc. The newest version is probably better.

      The problem was that it was optimized for the proprietary Olympus *.DSS format, although you could use *.WAV with some limitations on features.

      Sony and the other digital recorders also had playback software; I haven't checked them out but they're probably equivalent.

      NCH Scribe (free) could have been a clone of the Olympus player; NCH didn't work as well when I tried it, although later versions may work better.

      These programs can be overkill, mplayer at 60% speed sounds like it would work well.

    3. Re:I looked, but still do it manually by Anonymous Coward · · Score: 0

      I'm a Colemak typist. I hook a USB keyboard up to my laptop and put the multimedia play/pause button under my big toe and type as fast as I can all the way through it.

      I can't stand doing things stupidly, and if Dragon Naturally Speaking was in the Ubuntu Software Center, I'd probably buy it.

    4. Re:I looked, but still do it manually by Corwn+of+Amber · · Score: 1

      Oh go install qemu and a virtual windows to use warez like we all do.

      What, you need links?

      --
      Making laws based on opinions that stem up from false informations leads to witch hunts.
  19. Hmm by Anonymous Coward · · Score: 0

    "I am beginning to do some interviews with family members and will do some audio journals for genealogy purposes. I would really love to be able to run the resulting MP3 or WAV files through some software a get a text file out. I know that software like this exists commercially. But does this exist in the open source world?"

    I think you already have the software, and are testing it on this ask Slashdot question. Well played.

  20. If you don't find anything... by afabbro · · Score: 2, Interesting

    ...you could always use RentACoder (er, Vworker.com now) and hire someone for pennies to do it.

    --
    Advice: on VPS providers
    1. Re:If you don't find anything... by Anonymous Coward · · Score: 0

      Or Mechanical Turk...

      Mostly, ASR won't work here because it's an unlimited domain. It can work very well when it only needs to distinguish a fraction of the 8,000 or so common(ish) words in the English language.

    2. Re:If you don't find anything... by Anonymous Coward · · Score: 0

      Seconded. This is one of those tasks where it's really easier to pay a human being to do the work, sites like rentacoder, elance, odesk, guru.com and others would be a good place to look.

  21. My God Man, Just Buy The Damn Shif!! by Anonymous Coward · · Score: 0

    What are you, a fucking hobo?

  22. HTK by Anonymous Coward · · Score: 0

    You can have a look here:
    http://htk.eng.cam.ac.uk/

    I've used it in the past. It's a bit hard to use, but the results are decent.

    What you have to realize is that you will need to have _very_ clean recordings, or else the recognition rate will suffer greatly.

  23. the command line by ciaran_o_riordan · · Score: 4, Informative

    To play an audio file at 60% normal speed:

    mplayer -af scaletempo=scale=0.6 the_file.ogg

    And then to check the transcript, change the 0.6 to 1.5 (or 2.0 for someone like Richard Stallman who speaks slowly and clearly).
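
    For scripting the two passes, here is a small Python sketch that builds the same mplayer command. The helper name `mplayer_command` is hypothetical; the `-af scaletempo=scale=...` option is the one used above.

```python
def mplayer_command(path, speed=0.6):
    """Return an mplayer argv list that plays `path` at `speed` times normal tempo.
    The scaletempo audio filter changes tempo without shifting pitch."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return ["mplayer", "-af", f"scaletempo=scale={speed}", path]


# Slow pass for typing, faster pass for checking the transcript.
for label, speed in [("transcribe", 0.6), ("check", 1.5)]:
    print(label, " ".join(mplayer_command("the_file.ogg", speed)))
```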

    1. Re:the command line by nbauman · · Score: 1

      Got any tricks to get it to back-pedal 3 seconds?

    2. Re:the command line by bytestorm · · Score: 1

      The left arrow seeks back 10 seconds by default (right arrow forward 10). This page from the manual explains how to set the seek step to whatever you want: http://www.mplayerhq.hu/DOCS/HTML/en/control.html
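
      As an illustration for the 3-second back-pedal asked about above, here is a hypothetical Python sketch that writes a custom input.conf. It assumes mplayer's one-binding-per-line "KEY command args" format and that `seek -3` means a relative 3-second seek; start mplayer with `-input conf=/absolute/path/input.conf` to use it.

```python
from pathlib import Path


def input_conf_lines(back=3, forward=3):
    """Key bindings for a transcription-friendly mplayer input.conf.
    Assumed format: one "KEY command [args]" per line, where "seek N"
    seeks N seconds relative to the current position."""
    return [
        f"LEFT seek -{back}",      # back-pedal
        f"RIGHT seek +{forward}",  # skip ahead
        "SPACE pause",             # toggle pause
    ]


def write_input_conf(path, back=3, forward=3):
    """Write the bindings to `path` and return the absolute path for -input conf=..."""
    target = Path(path).expanduser().resolve()
    target.write_text("\n".join(input_conf_lines(back, forward)) + "\n")
    return target
```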

    3. Re:the command line by worf_mo · · Score: 1

      A while back I was looking for software to play back audio at different speeds and pitches, and I ran into Play it slowly. It can play every file GStreamer does (even video), and its interface lets you change the options while playing a file. It also lets you loop over certain parts of a file, which can be handy if the interviewee didn't speak very clearly. It is available under the GPLv3.

    4. Re:the command line by Anonymous Coward · · Score: 0

      MPlayer can change it at runtime, too (bound to the [ and ] keys by default, I think). Just remember to use -af scaletempo to avoid pitch changes.

  24. Transcription by ddillman · · Score: 1

    I wish you luck in your quest, as I'm also working on genealogy and would like to be able to do this as well. I'd be interested in hearing if you find something that works acceptably well for this purpose. In my experience (from IBM ViaVoice in the OS/2 v3 days to Dragon Naturally Speaking 10), the state of the art just isn't ready for general use. Even after training, I always got enough errors to discourage use. And I type relatively quickly, so it was just more effective for me to do it manually.

    --
    Little girls, like butterflies, need no excuse. -- L. Long
  25. No. by Alex+Belits · · Score: 1

    I would really love to be able to run the resulting MP3 or WAV files through some software a get a text file out. I know that software like this exists commercially.

    No. Automated arbitrary speech recognition is an unsolved problem -- all voice recognition systems require the speaker to make an effort to pronounce words clearly, or else they make so many mistakes that fixing them takes more effort than writing the text manually.

    It would make more sense to write transcription assistance software -- an equivalent of the foot-pedal tape player commonly used for this purpose, but with the ability to play and repeat short sequences of words or phrases, adjust speed, etc.

    --
    Contrary to the popular belief, there indeed is no God.
    1. Re:No. by Anonymous Coward · · Score: 0

      Please ignore. I accidentally moderated your post as redundant, when I meant insightful. I'm replying so that my moderation is removed.

  26. XTrans from the LDC! by Anonymous Coward · · Score: 0

    Try XTrans from the Linguistic Data Consortium. It's GPL and specifically designed for doing speech transcription. Ask nicely for support, please; the main developer is quite busy.

  27. Got kids? by Kral_Blbec · · Score: 4, Insightful

    Pay them a buck per page and they learn some family history along the way. Problem solved.

    1. Re:Got kids? by Luckyo · · Score: 4, Interesting

      This is one of those cases where the journey matters as much as, if not more than, the destination :)

    2. Re:Got kids? by tehcyder · · Score: 2, Funny

      Pay them a buck per page and they learn some family history along the way. Problem solved.

      Mummy, why does aunt Bess call grandma a "syphilitic rum-and-cock-addled whore"?
      Daddy, why was great grandpa Ben "shot at dawn for cowardice in the face of the enemy"?
      Mummy and daddy, how come I was born only four months after you married?

      --
      To have a right to do a thing is not at all the same as to be right in doing it
  28. Google Voice? by Facegarden · · Score: 1

    Google has been working on speech-to-text for years, and they've got Google Voice to where it transcribes your messages to text. It works great with the Android client, and they have a web page. But even with Google's experience and money, it's not very accurate. It might be better than most of what you'll find, though, and it's free.

    You could probably rig up Google Voice to where each thing you want to transcribe gets recorded as a "message" to you.

    That said, here's a voicemail I got recently:

    "Hey Jeff, Nate what you can still haven't been able talk to you in. X-rite is and see if you've been found. If off seems like just. I don't know if the E Z the phone software. This is not available 4. Slash number. I wanna malfunction or give us a call back to you now."

    So it's not perfect... One funny thing is that my name isn't Jeff or Nate, and neither was the caller's.
    -Taylor

    --
    Worldwide Military budgets: $2100 billion. Worldwide Space Exploration budgets: $38 billion. Really, world? Really?
  29. High quality recordings now, transcription later by itamblyn · · Score: 2, Informative

    I think the most important thing to keep in mind for a project like this is that you should do everything you can to ensure a high-quality recording. Don't worry about transcription at this point - just focus on getting content. When algorithms (and computers) have improved in 5-10 years' time, you can do the transcription. It might even be useful to record the sessions with a video camera. Maybe the speech recognition tech of the future will use lip-reading in addition to the approaches used now.

  30. Foot pedals by Anonymous Coward · · Score: 0

    If you are transcribing manually, you really want to consider using something with foot pedals, so you can control the playback with that instead of switching between typing and playback software all the time.
    http://www.nch.com.au/scribe/pedals.html

  31. Easy answer by JiffyPop · · Score: 1

    Just make a call to your favorite terrorist-harboring nation, add in some carefully chosen phrases, and then file a FOIA request for them.

  32. Foot Pedal and Express Scribe best option by Adattisi · · Score: 4, Informative

    I've been a transcriptionist for over 5 years, and unless you want to retype most of it yourself anyway, don't offer pennies on a site like guru/vworker/elance. A decent transcriptionist is going to charge at least $45-50 per AUDIO hour (not per hour it takes) if it's a good, clear recording with a single speaker. If there were a really great product out there, I'd be out of a job. If you want to do it on the cheap, get an inexpensive USB Infinity foot pedal (on eBay), as mentioned before, and Express Scribe, a free download, to play back and rewind the audio. Those are what I use. Good luck!

  33. Re:On the other hand... by vrmlguy · · Score: 2, Interesting

    I just slice everything up into 60-second segments and let Google Voice transcribe it for me. Sure, some naysayers might point out that it's slower than transcribing it all manually, but they don't get that I'm getting Google to do the work for me!

    --
    Nothing for 6-digit uids?
  34. Won't Work by EEPROMS · · Score: 1

    Where I work we have tons of recordings from engineering committees, and we tried lots of free and commercial programs, but at the end of the day, due to the vagueness of the English language, the best solution was to "hire a human". So that's what we did: we found a few people in India who were happy to transcribe our recordings for a fraction of the cost of hiring someone to fix the stuff-ups from the speech-to-text software (also, good speech-to-text software costs a fortune and takes ages to train, especially when most of the engineers' sentences are full of acronyms). So save time, help those with less money, and hire someone; it's not like we have a global shortage of people.

  35. Your choices are basically humans or the Dragon by mdecerbo · · Score: 1

    There are interesting speech recognition products for other applications, but for this task Dragon and IBM ViaVoice, both sold by ScanSoft, are pretty much the only software choices until someone qualified gets an NSF grant to beef up Sphinx.

    I can second the recommendation of the LDC's XTrans if you're going to do this yourself.

    If you want someone else to do it, there are a lot of podcasters who want transcripts, and a bunch of transcription services have sprung up to address the market. They've already implemented a lot of the quality-control mechanisms you'd have to address in order to get good results from something like the Mechanical Turk.

    The Wall Street Journal ran a side-by-side comparison back in 2008 and recommended castingwords.com, but another provider may very well be better by now. Shop around.

  36. Try speakwrite.com by Anonymous Coward · · Score: 0

    Get it back in 3 hours

  37. "Transcriber" is the tool you want by harmonise · · Score: 1

    Transcriber is the tool that you are looking for. It plays the file and you type and annotate. It's in the Ubuntu repositories so I assume it's in Debian's as well.

    --
    Cory Doctorow talking about cloud computing makes as much sense as George W Bush talking about electrical engineering.
    1. Re:"Transcriber" is the tool you want by ndmccab · · Score: 1

      I used to use this. It's quite good.

  38. Google Voice... by RobertM1968 · · Score: 1

    I hear Google has a great tool for this that they use for Google Voice...

    Or... transcribed...

    I'm here googoo, hi a grape too fur this that day fuse far google boys...

  39. Coding Horror article by lulalala · · Score: 2, Informative

    Coding Horror recently posted an article about the current voice recognition technology.

    http://www.codinghorror.com/blog/2010/06/whatever-happened-to-voice-recognition.html

    One poem got transcribed, and its title came out like this:

    "a poem by Mike Bliss --> a poem by like myth"

    The rest of the poem is equally funny. So basically you better transcribe it manually.

    1. Re:Coding Horror article by blue+trane · · Score: 1
  40. Wrong question. by BitZtream · · Score: 1

    Your question was phrased wrong.

    Just ask for what you mean: you want free software, not so much OSS. It's not like you're going to go editing and fixing bugs in the speech algorithm, so the openness here is really just a guise to get something for free.

    You'll find plenty of no-cost ways to transcribe, but OSS options fall short.

    The reality is, you'll save yourself a lot of effort if you just type it yourself. It'll be faster and far more accurate.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  41. google voice by pixelite · · Score: 1

    Why don't you just use Google Voice to transcribe? Google Voice has a feature to transcribe your voicemails. I'm not sure how long each message can be, but maybe you can automate it somehow?

    --
    >>Sig under construction
  42. I wrote a little application for that... by Anonymous Coward · · Score: 1, Interesting

    I looked into automatic transcription software too. I think the consensus is that none of it works well unless it is trained, and trying to "train" software with regular recordings of conversations is not likely to work.

    I wrote my own little application so that I could type the text in myself. It works with WinAmp, so it's tied to Windows (Sorry! Time constraints...). From my web page:

    http://csclub.uwaterloo.ca/~jg3macka/GabbleFarb/index.html

    What it is:

    GabbleFarb is basically a glorified Notepad application that works with WinAmp (a free audio and video player). A number of hotkey combinations control WinAmp from inside GabbleFarb. As a transcriber, this allows you to easily pause, rewind, fast-forward and control volume levels without leaving the editor. Additionally, while a video or audio file is playing in WinAmp, pressing ENTER makes GabbleFarb begin the next line with a timestamp of the current playing time. Within the editor, you can then double-click on a line of text in your transcript and GabbleFarb will automatically tell WinAmp to start playback at that point in the file.
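
    The timestamp-and-jump idea is simple to sketch. This Python example assumes a hypothetical "[hh:mm:ss] text" line format (GabbleFarb's actual format isn't described here): `stamp` writes the current playback position at the start of a line, and `seek_target` reads it back so a player could be told where to resume.

```python
import re

# Hypothetical line format: "[hh:mm:ss] text"
STAMP = re.compile(r"^\[(\d+):(\d{2}):(\d{2})\]\s?(.*)$")


def stamp(seconds, text=""):
    """Prefix `text` with the playback position, truncated to whole seconds."""
    s = int(seconds)
    return f"[{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}] {text}".rstrip()


def seek_target(line):
    """Return the position (in seconds) stamped at the start of `line`, or None."""
    m = STAMP.match(line)
    if m is None:
        return None
    hours, minutes, secs = (int(g) for g in m.group(1, 2, 3))
    return hours * 3600 + minutes * 60 + secs


line = stamp(3725.4, "Grandma: We moved in 1952.")
print(line)               # [01:02:05] Grandma: We moved in 1952.
print(seek_target(line))  # 3725
```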

  43. Accuracy by Anonymous Coward · · Score: 0

    The choice between automatic and manual depends on what accuracy you want. Automatic can reach 75% to 80%. The best way to use automatic transcription would be to train your PC's speech recognition, play the file with headphones, and repeat it out loud yourself. Again, there is a lot of contextual information that cannot be transcribed accurately by a computer, so you'll have to manually edit these files if you want to take it to 100%.
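
    Accuracy figures like 75-80% are usually quoted as roughly 1 minus the word error rate (WER). To measure it on your own recordings, you can hand-transcribe a short sample and compare. A minimal Python sketch using a word-level edit distance follows; it is the standard WER definition, not something taken from any particular product.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    via a word-level Levenshtein distance. Comparison is case-insensitive."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: reference word missing from output
                cur[j - 1] + 1,          # insertion: extra word in output
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate("a poem by Mike Bliss", "a poem by like myth")
print(f"WER {wer:.0%}, accuracy {1 - wer:.0%}")  # WER 40%, accuracy 60%
```

    With the Coding Horror example from this thread, "a poem by Mike Bliss" versus "a poem by like myth" is two substitutions out of five words.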

    You can also transcribe it manually yourself. If your typing speed is around 80 wpm, an hour of audio will take around 4 hours to do. Have a look at NCH Express Scribe. It's free play/stop software that is almost the de facto standard in the transcription industry.

    You can also use various transcription services which are out there. A professional transcription service will charge you around $1 to $2 per audio minute. Freelancers will charge around half of that, but with freelancers you cannot guarantee the quality.
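
    The 4-hour figure is easy to sanity-check. This Python sketch assumes a conversational speaking rate of 150 words per minute (an assumption, not from the post) and computes only the raw typing time; the gap up to about 4 hours is pausing, rewinding and proofreading.

```python
def typing_minutes(audio_minutes, typing_wpm=80, speech_wpm=150):
    """Minutes needed just to type every spoken word: a lower bound that
    ignores pausing, rewinding and proofreading.
    speech_wpm=150 is an assumed conversational speaking rate."""
    return audio_minutes * speech_wpm / typing_wpm


raw = typing_minutes(60)  # 9,000 words at 80 wpm
print(f"{raw} min of pure typing; 4 hours is {240 / raw:.1f}x that")
```

    Under these assumptions an hour of audio is about 112.5 minutes of pure typing, so 4 hours means roughly twice that once the stopping and checking are included.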

    Shameless Plug: We provide a transcription service for $0.75 per minute of audio. http://callgraph.biz
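
    Per-minute and per-audio-hour prices are easy to mix up, so a tiny conversion helps. The $0.75-per-minute rate above works out to $45 per audio hour, in line with the $45-50 per audio hour a transcriptionist quotes elsewhere in this thread. The helper names are hypothetical.

```python
def cost_per_audio_hour(rate_per_minute):
    """Convert a per-audio-minute transcription rate into the cost of one audio hour."""
    return rate_per_minute * 60


def total_cost(rate_per_minute, audio_minutes):
    """Cost of transcribing `audio_minutes` of recording at a flat per-minute rate."""
    return rate_per_minute * audio_minutes


print(cost_per_audio_hour(0.75))  # 45.0
print(total_cost(0.75, 90))       # 67.5, e.g. a hypothetical 90-minute interview
```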

  44. A-AI by Anonymous Coward · · Score: 0

    Artificial Artificial Intelligence. IOW, farm it out to pieceworkers on the net for pennies via Amazon's Mechanical Turk.

  45. Re:On the other hand... by quickOnTheUptake · · Score: 1

    Informative?
    Attention slashdotters, There is at least one retard on the loose. He may be calling himself and playing tapes into the phone. If you encounter him do not engage him as he is armed with modpoints and may use them erratically.

    --
    Mod points: Guaranteed to remove your sense of humor.
    Side effects may include gullibility and temporary retardation
  46. State of Speech Reco by poor_boi · · Score: 3, Informative

    It's been my job to work with speech recognition technology for the last 10 years. I've worked with speaker-independent grammar-based recognizers like Nuance Recognizer. I've worked with speaker-dependent training-based recognizers like Dragon Naturally Speaking. I've used open source recognizers like Sphinx. I've even dabbled with writing my own basic recognition engine. I can tell you with confidence: with the current state of commercial/open-source technology, you will not be able to get satisfactory results transcribing two speakers in the same recording. Accurate machine transcription requires training and a single speaker.

    I have heard people claim that speech recognition is a dead technology because it has stopped improving at appreciable speeds. While improvements have slowed down drastically, I do not believe speech recognition is dead by any means. We've really been making the same steady progress since the inception of speech recognition -- but previously we were riding the wave of geometric (sometimes exponential) growth in CPU clock rate. Now that the free lunch is gone, recognition algorithms need to be parallelized to once again ride improvements in CPU design.

    1. Re:State of Speech Reco by ablincolnsbeard · · Score: 1

      well said!

    2. Re:State of Speech Reco by TheTurtlesMoves · · Score: 1

      How does Sphinx stack up to the rest?

      --
      The Grey Goo disaster happened 3 billion years ago. This rock is covered in self replicating machines!
  47. Use Google Voice by got2liv4him · · Score: 1

    Record what they are saying into a voicemail on Google Voice...

    --
    King of kings and Lord of lords
  48. Human transcription: Cheaper than you'd guess. by spinkham · · Score: 1

    The only good transcription software still runs on wetware.

    Luckily, humans are cheap and easily available.
    CastingWords is one of the cheapest ways to get humans to transcribe your content.

    http://castingwords.com/

    If you'd like to save a few bucks by cutting out the middleman, see an even cheaper way here:

    http://waxy.org/2008/09/audio_transcription_with_mechanical_turk/

    --
    Blessed are the pessimists, for they have made backups.
  49. craigslist by cynyr · · Score: 1

    Post an ad on Craigslist saying you'll pay $20 per hour of recording to have it typed out. Pizza provided as well. BYOB. I bet some college kid takes you up on it.

    --
    All of the above was encrypted with a Quad ROT-13 method. Unauthorized decryption is in violation of the DMCA.
  50. Looking at it the wrong way by Anonymous Coward · · Score: 0

    Get the lazy bastards to type it out in the first place.
    All that screwing around when you could just be typing it. I see doctors do it too... blah... blah... blah... shut your trap... here's a keyboard, imbecile.

  51. The bottom line? by Anonymous Coward · · Score: 0

    After reading the replies, it appears that open source doesn't have anything worthwhile in this area either.

    What the fuck does that bunch of 2nd rate fanbois have to offer?

  52. Doing it yourself... by Cruciform · · Score: 2, Interesting

    When I did some medical transcription a couple of years ago it was up to me to do it myself, and I didn't find anything open source at the time.
    So I loaded up Amarok, configured global hotkeys to pause and to jump forward and backward in the audio file in five-second steps, and then loaded up a word processor.
    Sure, it's not automatic, but it helped me get the job done.

    It took me 3 to 4 hours to transcribe each spoken hour of a group of strangers. When the subjects had familiar speech patterns, or it was a single individual, I found progress was much faster.
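
    For anyone writing their own player, the hotkey workflow above (pause, jump five seconds back or forward) boils down to a little position bookkeeping. This is a minimal, hypothetical Python sketch: the class `TranscriptionPlayer` and its method names are made up for illustration and are not Amarok's API, and the optional `backstep` auto-rewind on resume is an assumed extra that the comment does not mention.

```python
from dataclasses import dataclass


@dataclass
class TranscriptionPlayer:
    """Hypothetical model of hotkey-driven playback (not Amarok's actual API)."""

    duration: float          # length of the recording, in seconds
    position: float = 0.0    # current playback position, in seconds
    playing: bool = False
    skip_step: float = 5.0   # size of each forward/backward jump, in seconds
    backstep: float = 0.0    # assumed optional auto-rewind applied on resume

    def _clamp(self, t: float) -> float:
        # Keep the position inside [0, duration].
        return max(0.0, min(self.duration, t))

    def toggle_pause(self) -> None:
        if self.playing:
            self.playing = False
        else:
            # Rewind slightly on resume so the last few words are heard again.
            self.position = self._clamp(self.position - self.backstep)
            self.playing = True

    def skip(self, direction: int) -> None:
        # direction is +1 for a forward jump, -1 for a backward jump.
        self.position = self._clamp(self.position + direction * self.skip_step)

    def advance(self, seconds: float) -> None:
        # Simulate the passage of time while audio is playing.
        if self.playing:
            self.position = self._clamp(self.position + seconds)
```

    For example, with `backstep=2.0`, playing for 12 seconds, pausing, and jumping back once leaves the position at 7.0; resuming then starts at 5.0.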

  53. BSA by Anonymous Coward · · Score: 0

    Considering that Slashdot is now running ads for the BSA, I figure we all need to go buy overpriced crap.

    Man, the other day the BSA -- they tried to kiss me, man. Then they turned around and tried to fuck me up the ass.

  54. Should everything be free? by Anonymous Coward · · Score: 0

    Why does everyone expect there to be a free software alternative for everything imaginable? Usually when you want something (other than digital goods) you have to actually pay money for it.

    1. Re:Should everything be free? by Corwn+of+Amber · · Score: 1

      Why pay for things that have no replication/distribution costs? Ideas are free. Production costs for software? Up-front, then zero.

      Yes, all software should be free. Adapting it to special needs should cost money.

      Do you know that commercial software makes up 10% of software development? The rest is in-house: maintaining, rewriting, and expanding specialized in-house software, and adapting open source to the real world.

      --
      Making laws based on opinions that stem up from false informations leads to witch hunts.
  55. Pay Amazon Turk to "crowdsource" it by Monkier · · Score: 1

    Here's someone who has already done it:
    http://waxy.org/2008/09/audio_transcription_with_mechanical_turk/

    Split up the audio into 5-minute pieces.
    Set up a template on Amazon Mechanical Turk for 'workers' to grab the 5-minute MP3 files, and pay them $2 for each file transcribed.
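
    The chopping step can be scripted. A minimal sketch, assuming the recording has already been converted to WAV (Python's standard `wave` module cannot read MP3; a tool such as SoX or ffmpeg can do the conversion). The function names `split_wav` and `estimate_cost` are hypothetical, and the $2-per-piece price is simply the figure from the comment.

```python
import math
import wave


def split_wav(path: str, chunk_seconds: float = 300.0, prefix: str = "chunk") -> list[str]:
    """Split a WAV file into consecutive pieces of at most chunk_seconds each."""
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(round(chunk_seconds * src.getframerate()))
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # empty bytes means end of file
                break
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy format settings; the frame count in the header is
                # corrected automatically once the data has been written.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths


def estimate_cost(total_seconds: float, chunk_seconds: float = 300.0,
                  price_per_chunk: float = 2.0) -> float:
    # Every started chunk is paid for in full.
    return math.ceil(total_seconds / chunk_seconds) * price_per_chunk
```

    At $2 per 5-minute piece, one hour of audio is 12 pieces, so `estimate_cost(3600)` gives 24.0 dollars.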

    More info in the comments. http://www.audiobookcutter.com/ is capable of chopping up the file at the silences for you.
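
    Cutting at silences rather than at exact 5-minute marks avoids splitting a word between two workers. The sketch below is one simple way to do that, not how audiobookcutter.com works: it assumes 16-bit mono PCM WAV input, measures the RMS loudness of short windows, and snaps each split to the quietest window within `search_seconds` of each multiple of `target_seconds`. The function names and default values are illustrative assumptions.

```python
import array
import math
import sys
import wave


def window_rms(samples, start, length):
    # Root-mean-square amplitude of samples[start:start + length].
    window = samples[start:start + length]
    if not window:
        return 0.0
    return math.sqrt(sum(s * s for s in window) / len(window))


def silence_split_points(path, target_seconds=300.0, search_seconds=20.0,
                         window_seconds=0.25):
    """Return split times (seconds) snapped to the quietest window near each target."""
    with wave.open(path, "rb") as src:
        if src.getsampwidth() != 2 or src.getnchannels() != 1:
            raise ValueError("this sketch expects 16-bit mono PCM")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    win = max(1, round(window_seconds * rate))
    total = len(samples)
    points = []
    target = target_seconds
    while target * rate < total:
        lo = max(0, int((target - search_seconds) * rate))
        hi = min(total - win, int((target + search_seconds) * rate))
        if hi < lo:
            break
        # Candidate windows start every `win` samples; pick the quietest.
        best = min(range(lo, hi + 1, win), key=lambda s: window_rms(samples, s, win))
        points.append((best + win // 2) / rate)
        target += target_seconds
    return points
```

    It loads the whole file into memory, which is fine for interview-length recordings but not tuned for speed.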

  56. Re:On the other hand... by Verteiron · · Score: 4, Funny

    I'm sure you roar get ding fan plastic results from goo gull boys, two eye find it variably hell full.

    --
    End of lesson. You may press the button.
  57. No chance by yes+it+is · · Score: 1

    As others have said, you cannot get accurate speech recognition for multiple speakers. Even with the best-of-breed closed source software (Dragon), you also need good control over microphone quality and placement, and the technique in this instance is to shadow the speakers: play the recording through headphones and repeat it into the microphone yourself. If you're happy using Emacs, transcript.el will remove some of the pain points of transcribing. It works out as cost/time effective: I reckon it takes transcription time from 5-8x the length of the recording down to something like 2-3x. But at this point in time, you're not going to find a satisfactory open source solution to machine transcription, either shadowed or from live tapes.

  58. Transcription Software Written in VB6 okay? by Anonymous Coward · · Score: 0

    Several years ago, I wrote a general-purpose media player in VB6. Would you like the source for that?
    http://www.vsubhash.com/article.asp?id=15&info=Subhash_VCDPlayer#open_source_development

    It has a special transcription mode.
    http://www.vsubhash.com/article.asp?id=15&info=Subhash_VCDPlayer#transcription

  59. PRAAT by jpkunst · · Score: 1

    Transcribe manually, using a transcription program like PRAAT.

  60. Windows Media Player With Transcription Hot Keys by Grippen · · Score: 1

    Some years ago, I wrote a general-purpose media player in VB6. It is a shell around Windows Media Player. http://www.vsubhash.com/article.asp?id=15&info=Subhash_VCDPlayer#transcription As I was working as a transcriptionist at that time, I added a special transcription mode to the player. Thanks to hot keys, you will not need a foot pedal to control the playback. The player window also stays on top of all other windows. On the site, I had offered to give the source code for OSS development but did not offer it as a download. I will go home today and upload the VB6 source code as a free download.

  61. pocketsphinx by chandanadesilva · · Score: 1

    Suggest you look at pocketsphinx, it is a front end on Sphinx, and includes Sphinx. http://cmusphinx.sourceforge.net/2010/03/pocketsphinx-0-6-release/

  62. Transana by paugq · · Score: 2, Interesting

    It's not what you are asking for, but it sure will help you: Transana

  63. vi by xmorg · · Score: 2, Funny

    Open up vi, press i (or a), and press play on the audio device.
    Type out whatever you hear.

    Problem solved. :wq

  64. MOD PARENT DOWN. by Karganeth · · Score: 1

    Seriously, transcribe it manually... automatic speech recognition just doesn't work.

    That view is only ever held by people who have not used speech-to-text software in a very long time. Speech recognition software today works INCREDIBLY well. I would say it is over 95% accurate (and I have a very cheap microphone). Here's a video of it in action: http://www.youtube.com/watch?v=bsohqUgjqK0&feature=related It frustrates me that people assume speech recognition hasn't progressed since they last used it, and therefore that the opinion they formed years ago is still valid. Speech recognition works INCREDIBLY well; for me it's more accurate than typing.
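
    Accuracy figures like "over 95%" are usually derived from the word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into a correct reference transcript, divided by the number of reference words. Accuracy is then commonly quoted as 1 - WER. A minimal Python sketch of that calculation using a word-level edit distance (the function name is hypothetical):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein dynamic programming over words. At the start of row i,
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

    For example, misrecognizing one word of a seven-word sentence gives a WER of 1/7, about 14%. WER can exceed 100% when the output contains many inserted words.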

  65. Jeff Atwood has an interesting article by Anonymous Coward · · Score: 0

    You might want to read up on Jeff Atwood's post: http://www.codinghorror.com/blog/2010/06/whatever-happened-to-voice-recognition.html

  66. Pointers for good NLP info? by Anonymous Coward · · Score: 0

    I've Googled around a bit, but I only seem to find shallow babble. Could you point me in the right direction about the good stuff on NLP?

    1. Re:Pointers for good NLP info? by Corwn+of+Amber · · Score: 1

      NLP is a crackpot pseudo-science field. Forget that.

      --
      Making laws based on opinions that stem up from false informations leads to witch hunts.
  67. Court reporters ... by ElmoGonzo · · Score: 1

    If there's anyone who should be concerned with creating an accurate text record of spoken words, it would be a court reporter. The ones I know tend to use a belt-and-suspenders approach: they keep a recorder running to capture the audio while they stenographically record the words as they hear them. The written transcript starts from the steno and gets proofread while listening to the tape. You would be surprised at the number of places where what the reporter heard doesn't match what the proofreader hears on the tape. That being the case, even if you were to run the audio through something like Dragon NaturallySpeaking, you would still need to verify what is in the text.
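
    The same check-against-the-tape idea applies to a machine transcript: produce a second version (for example, a human pass or a second recognizer run) and listen again only where the two disagree. A minimal sketch using Python's standard `difflib.SequenceMatcher` on word lists; the function name is hypothetical, and it only flags differences, it cannot tell which version is right.

```python
import difflib


def transcript_disagreements(first: str, second: str) -> list[tuple[str, str]]:
    """List the word spans where two transcripts of the same audio differ."""
    a = first.split()
    b = second.split()
    # autojunk=False so frequent words like "the" are never ignored.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            spans.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return spans
```

    For example, comparing "the witness said she saw the car" with "the witness said he saw a car" returns `[("she", "he"), ("the", "a")]`, the two spots worth replaying.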

  68. The Minimum Wage Solution by Anonymous Coward · · Score: 0

    A while back I was looking to get some dictation notes on a novel transcribed, and the best option I found (after playing with Dragon NaturallySpeaking and a few others) was to simply pay some broke college student a nominal fee per audio hour to transcribe them. It's not professional level, I'm sure, and I had to go back and do some paragraph formatting.

    But the bottom line is that for getting the words reliably on the screen, the human brain is still the best solution out there.

  69. Trick: re-speak it yourself by oergiR · · Score: 2, Interesting
    I'm doing my PhD on speech recognition. I think (and hope!) it's neither dead nor fully developed. Currently, changes of environment screw speech recognisers up. Different speakers, background noise... A trick that I heard has been used for subtitling television broadcasts is to have someone re-speak the words (which is not that hard). You could play the audio recordings on your headphones while repeating them into a microphone. If you're in a quiet room and the recogniser is trained on your voice, that may get you most of the way. You'll still want to correct transcriptions manually.

    I don't know of any good trained open-source speech recognisers. There are open-source back-ends like Sphinx or HTK (which I sort of work on), but you need massive transcribed training corpora to train a speech recogniser. This is expensive, which I guess is why open-source speech recognition hasn't taken off. In the speech recognition group at my university, most people use Linux, and I don't think anyone actually uses a speech recogniser in their daily work.

  70. Re:On the other hand... by mangaskahn · · Score: 1
    There's something quite Zen about that post.

    eye find it variably hell full.

    Indeed.

    --
    Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect.--Linus Torvalds
  71. Dragon NaturallySpeaking by Futurepower(R) · · Score: 1

    He means Dragon NaturallySpeaking. It is claimed that it is "Up to 99% Accurate". "Up to" means "0% to".

    Even if Dragon NaturallySpeaking is 99% accurate, that last 1% is a problem to correct. The software will never make a mistake in spelling. However, it will sometimes substitute similar words that change the meaning of what you intended to say, sometimes in subtle ways.

    Dragon NaturallySpeaking has improved a lot since version 7. I don't know whether there were improvements in the recognition engine since version 8.

    Sometimes Version 10 Standard is sold at Fry's with rebates that make the total cost $25. However, only the Preferred and more expensive versions allow you to dictate into a handheld recorder for later transcription.

  72. I wrote an app to help transcribe audio. by Anonymous Coward · · Score: 0

    Check out this application I wrote 10 years ago to transcribe some family history audio. It saved me many hours of time.

    http://www.wiedenhof.nl/ul/dictplay.htm

  73. Dragon technology is in fact multi-platform by llamafirst · · Score: 1

    Dragon is not open source. It is not even multi-platform.

    What? Their technology is on multiple platforms, as is trivially confirmed with Google in seconds using queries like: dragon speech mac

    WINDOWS: http://www.nuance.com/naturallyspeaking/products/editions/default.asp

    MAC: http://www.nuance.com/naturallyspeaking/products/macintosh/for-the-mac.asp

    iPhone/iPad: time-limited note recording, but impressive accuracy: http://www.dragonmobileapps.com/

    Phone, via a regular phone call: http://jott.com/

    1. Re:Dragon technology is in fact multi-platform by llamafirst · · Score: 1

      Also...

      Blackberry: http://appworld.blackberry.com/webstore/content/8108

    2. Re:Dragon technology is in fact multi-platform by markdavis · · Score: 1

      Well, I did check out the Wikipedia page mentioned and it was MS-Windows only, but I didn't research much further than that. Sorry, I stand corrected... Dragon is not open source, it *is* multiplatform, and it is not available for Linux.

  74. Perfect Solution by Muad'Dave · · Score: 1

    1) Make a call to a random barber shop in Iran or Afghanistan.
    2) Say "Al Qaeda", "terrorist", and "spy" very clearly.
    3) Play the tapes of your family members' stories.
    4) Get a copy of the transcripts from your lawyer at your espionage trial.

    5) profit?

    --
    Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
  75. Amazon Mechanical Turk? by Bodhammer · · Score: 1

    This might be cheap for actual human transcription: https://www.mturk.com/mturk/welcome

    --
    "I say we take off, nuke the site from orbit. It's the only way to be sure."
  76. Re:On the other hand... by makoto149 · · Score: 1

    I believe it's an exact literal translation from the Chinese for the phrase, "If at first you don't succeed, try try again." frickin' Google Translator...

  77. Free Solutions by crazyl3gs · · Score: 1

    NCH Express Scribe is freely available to the public for commercial and home use. I know many transcription companies and education systems use it. It works with most foot pedals as well. A good cheap foot pedal is the vPedal, available from several online retailers. Transcription software is pretty easy to write. I have written several transcription programs in the past with ease. For those of you who will ask, I do not work for NCH or any associated companies.

  78. Google seems to have it down. by slashfoxi · · Score: 1

    My Android phone transcribes search terms from voice input. I wonder if there's a way to use Google's voice transcription servers. No idea how, but it seems like Google has already done the hard work.

  79. Don't know answer, but my .02 - DRAGON, & Win7 by DRAGONWEEZEL · · Score: 1

    Dragon NaturallySpeaking! Windows 7 has dictation built in, and I believe the Mac does too. All transcription/dictation software will require training for anything close to 98% accuracy.

    I'm not sure about the OSS options, but Dragon is now pretty cheap, at just $80. You can't beat that with a stick for the quality you get. Win7 has it built in, so try it out if you have a box with it. I was actually pleasantly surprised.

    The fastest, cheapest option, however, would be to pay a college kid to transcribe it. You can do that for $8 per hour, and if you edit the waves (chopping silence, etc.), you can get someone to pretty much break down an hour of speech per man-hour. That's especially true if you can rig up a way to speed up the waves a bit; people can talk fast, but usually don't talk as fast as a good transcriptionist can type. To pick up the words of someone mumbling, the transcriptionist will have to back up, listen, and try to work out the word, which slows things down. If you can clarify those words, let them know (e.g. at 3:15 Uncle Jimmy says CHICKENFLUFFER, not CHICKENpuffer).
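
    Chopping long silences before handing the audio over can also be automated. A minimal sketch, assuming 16-bit mono PCM WAV input: it measures the RMS loudness of 0.1-second windows and keeps at most `max_silence_seconds` of each quiet stretch. The function name and the `threshold` value of 500 are illustrative assumptions that would need tuning for real recordings. Speeding the audio up is a separate problem, since naive resampling raises the pitch; a time-stretching filter such as ffmpeg's `atempo` is the usual route.

```python
import array
import math
import sys
import wave


def shorten_silences(src_path, dst_path, threshold=500.0, window_seconds=0.1,
                     max_silence_seconds=0.5):
    """Copy a 16-bit mono WAV, cutting each quiet run down to max_silence_seconds."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2 or src.getnchannels() != 1:
            raise ValueError("this sketch expects 16-bit mono PCM")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    win = max(1, round(window_seconds * rate))
    # Number of consecutive quiet windows to keep in each silent run.
    keep_quiet = max(1, round(max_silence_seconds / window_seconds))
    out = array.array("h")
    quiet_run = 0
    for start in range(0, len(samples), win):
        window = samples[start:start + win]
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        if rms < threshold:
            quiet_run += 1
            if quiet_run > keep_quiet:
                continue  # drop this window: the pause is already long enough
        else:
            quiet_run = 0
        out.extend(window)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(out.tobytes())
    return len(out) / rate  # length of the shortened recording, in seconds
```

    The return value shows how much dead air was removed: a 5-second clip with a 3-second pause in the middle comes back as 2.5 seconds.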

    The thing is, by the time you get done with this article, reading everyones solutions, etc... you will have been able to transcribe much of that data yourself. (depending on family size I guess...)

    --
    How much is your data worth? Back it up now.
  80. coach outlet by coachoutlet · · Score: 1