Slashdot Mirror


Recorded Speech to Text Software?

shfted! asks: "Recently, I've been given the task of transcribing several dozen audio tapes of interviews to typed text, that is, listening for 10 seconds, writing what was said, and repeating. At around 4 hours per hour-long tape, I would like to automate the process somehow. Recording the tape into the computer is no problem, but I need software that favors accurate speech recognition over fast recognition -- several hours per tape is not an issue (I have access to several machines running 24/7). I will still have to go over the computer's work to correct any mistakes. A free solution for Linux would be best; non-free and Windows solutions are okay, but a working solution is the highest priority. Can anyone point me in the right direction(s)?"

5 of 66 comments

  1. Lo tek is the way to go in this instance by Txiasaeia · · Score: 4, Interesting
    Several hours per tape is acceptable? Well, if you can do one tape in four hours, then two people can do one tape in two hours. In other words, hire a college student at minimum wage for a contract position (i.e., until the tapes are transcribed) and go to it.

    It's cost-effective, as fast as you need it to be, and, best of all, more accurate than any software solution to date. Most software packages are still at only about 90% accuracy, so that's still 24 minutes per four-hour tape that you'll need to correct, and you'll probably still have to listen to the whole thing again to verify the accuracy of any software program.

    --
    Condemnant quod non intellegunt.
    1. Re:Lo tek is the way to go in this instance by shfted! · · Score: 3, Interesting

      Actually, I am a college student hired to transcribe these tapes at $40 CAN a tape, which at 4 hours a tape is just a little above minimum wage where I live. I want to make more than minimum wage, thus my desire to automate things somewhat :) Again, my intent was to have the machine do the first pass, then I could listen and correct errors as I went. Why? I can type continuously at about 70 wpm, but people speak at around 150 to 200 wpm. However, if I have a 90% accurate copy, that means I only need to type 15 to 20 wpm to keep up, correcting on a single pass, thus reducing my time per tape to the duration of the tape.
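
      A rough sketch of that arithmetic, for anyone who wants to plug in their own numbers. The function name is made up for illustration, and it assumes every wrong word just has to be retyped once; time spent spotting the errors is ignored.

```python
def required_typing_wpm(speech_wpm: float, accuracy: float) -> float:
    """Words per minute that must be retyped to keep up with the speaker.

    Assumption: a fraction (1 - accuracy) of the spoken words is wrong in
    the machine draft, and each wrong word is retyped exactly once.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return speech_wpm * (1.0 - accuracy)


# Speaking rates from the comment above, with a 90% accurate first pass.
for speech_wpm in (150, 200):
    needed = required_typing_wpm(speech_wpm, 0.90)
    print(f"{speech_wpm} wpm speech -> about {needed:.0f} wpm of corrections")
```

      With no machine draft at all (accuracy 0), the same function returns the full 150 to 200 wpm, which is why a 70 wpm typist has to keep stopping the tape.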

      --
      He who laughs last is stuck in a time dilation bubble.
  2. Existing software by skinfitz · · Score: 2, Interesting

    What about simply plugging the tape into a system running Dragon NaturallySpeaking or IBM ViaVoice?

    From the Dragon page:
    True Continuous Speech - Speak to your computer naturally and at a normal pace--without pausing between words. Your spoken words swiftly appear on your computer screen.

    1. Re:Existing software by lambent · · Score: 2, Interesting

      The problem is that you have to train the software for your voice before you can obtain any high degree of accuracy. Not possible with pre-recorded speech.

      Hell, the new voice-activated voice-mail menus that have been popping up when I dial customer service sometimes force me to say my phone number aloud. And even to do that accurately, I have to speak very slowly and quite loudly. More often, I just press random buttons until I get dumped to a live operator. (Try it, it works!)

  3. Re:Not really, the technology isn't there by cognibrain · · Score: 2, Interesting

    To do it in general for random speakers though?

    The state of the art for arbitrary news broadcasts is about a 20% word error rate. While this isn't good enough for the poster's needs, it turns out to be almost good enough for indexing.
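
    For reference, word error rate is usually computed as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch, assuming plain whitespace tokenization; the function name and the example sentences are made up for illustration and are not from any particular toolkit:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))
```

    Because insertions count as errors, the rate can exceed 100% when the recognizer emits many extra words.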

    Wonder when we'll start seeing Google return audio and video along with text documents? There's a research project demo of this happening here.