Slashdot Mirror


Recorded Speech to Text Software?

shfted! asks: "Recently, I've been given the task of transcribing several dozen audio tapes of interviews to typed word, that is, listening for 10 seconds, writing what was said, and repeating. At around 4 hours per hour-long tape, I would like to automate the process somehow. Recording the tape into the computer is no problem, but I need some software that favors accuracy over speed in its speech recognition -- several hours per tape is not an issue (I have access to several machines running 24/7). I will still have to go over the computer's work to correct any mistakes. A free solution for Linux would be best; non-free and Windows solutions are okay, but a working solution is the highest priority. Can anyone point me in the right direction(s)?"

8 of 66 comments

  1. Simple Suggestion by Anonymous Coward · · Score: 2, Insightful

    Given that half-decent speech recognition is still struggling, might I suggest:

    1) Give your neighbour's kid $10 to transcribe the tape one afternoon
    2) ...
    3) Text!

  2. Not really, the technology isn't there by bluGill · · Score: 3, Insightful

    The technology to do this isn't really there. If the machine can learn how you speak, it can do it. If you limit yourself to just a few words (1000 perhaps?) it is easier. To do it in general for random speakers though?

    The problem is people are too varied. I have trouble understanding people from the "deep south". The accent is too thick for my ears. I'm sure they have the same problem with my accent.

    That isn't to say don't try it, but don't get your hopes up. Voice recognition is hard, and isn't done well. Just be glad you only have a few to do; my sister's full-time job is typing things like that. (Most of it is less interesting, as she describes it.)

  3. You must be joking by Radical+Rad · · Score: 3, Insightful

    Just do the tapes. It will take longer to screw with software setup and cleanup than to just do it. But if you either buy or rig up a foot switch to play/rewind the tape I think it would help. Also I am assuming you are a touch typist. If not then get someone who is to do this job for you.

    1. Re:You must be joking by splattertrousers · · Score: 4, Insightful
      If not then get someone who is to do this job for you.

      Court reporters do this kind of thing for a living and some (all?) are contract workers. They can do it in real time and would probably be quite happy to be able to do it all at home rather than in a deposition room or court room. Oh, and their accuracy would be a lot higher than if you did it yourself without checking or if you hired a student to do it.

      Though a tech solution would be cool...

  4. Re:Lo tek is the way to go in this instance by bluGill · · Score: 2, Insightful

    Tapes can be copied on off time. If they are standard audio cassette tapes, then they are not more than 45 minutes per side anyway, so you are looking at several tapes anyway.

    Even assuming the worst case, 1 tape that is 4 hours long, you can feed the output of the player into the input of a computer, do an ogg (mp3) rip on the stream, and then fast forward to different places. There will be issues merging the copies, but it is still much less time per person than one person doing the entire thing. (It is more work overall, if that matters.)
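
    A rough sketch of the splitting idea above, in Python. The function name, the number of transcribers, and the 10-second overlap are all assumptions made for illustration; the overlap is just one possible way to ease the merging issue, not something the setup above prescribes.

```python
def split_points(total_seconds, workers, overlap_seconds=10):
    """Return one (start, end) offset pair in seconds per transcriber.

    Hypothetical helper: neighbouring segments share `overlap_seconds`
    of audio on each side so the typed copies can be lined up when merging.
    """
    if workers < 1:
        raise ValueError("need at least one transcriber")
    base = total_seconds / workers
    segments = []
    for i in range(workers):
        # The first segment starts at 0; later ones start a little early.
        start = max(0, i * base - overlap_seconds)
        # Every segment runs a little late, capped at the end of the recording.
        end = min(total_seconds, (i + 1) * base + overlap_seconds)
        segments.append((round(start), round(end)))
    return segments


# Worst case from the comment: one 4-hour recording, here split four ways.
for start, end in split_points(4 * 3600, workers=4):
    print(f"seek to {start} s, transcribe until {end} s")
```

    Each person fast forwards to their own start offset in the ripped file and stops at the end offset.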

  5. Re:Lo tek is the way to go in this instance by Anonymous Coward · · Score: 1, Insightful

    Tapes can be copied on off time.

    They can, but then you're just adding to the total time -- now the students still have to listen to the tapes, and someone has to copy them. Better to just give the students two separate tapes in the first place.

    It was something of a joke, of course.

  6. Hire a professional by rueger · · Score: 4, Insightful

    If your hours of tape are something that has to be transcribed accurately, don't waste your time trying to do it with a computer.

    A person who does transcription for a living will do it faster, probably cheaper, and will be able to handle all of the quirks of human speech that will gum up the works of a voice to text program.

    There are still places where a machine cannot match the quality of a real live person.

  7. Re:Lo tek is the way to go in this instance by Directrix1 · · Score: 2, Insightful

    The tapes are 1 hour long. You didn't even need to read the article to see that. Four hours is how long it takes to use the start/stop method of transcription. Slowing the tape down to 70% speed and never starting or stopping would take 1 hour and 26 minutes.
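
    The arithmetic behind that figure, as a small Python sketch. The function names are made up for illustration, and it assumes the typist really can keep up at the reduced speed without pausing.

```python
def playback_minutes(tape_minutes, speed):
    """Wall-clock minutes needed to play a tape at a fraction of normal speed."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return tape_minutes / speed


def format_hm(minutes):
    """Format a duration in minutes as hours and minutes, rounded to the minute."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours} h {mins:02d} min"


slowed = playback_minutes(60, 0.7)  # 60 / 0.7 is roughly 85.7 minutes
print(format_hm(slowed))            # 1 h 26 min
print(f"{240 / slowed:.1f}x faster than the 4-hour start/stop method")
```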

    --
    Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF