Recorded Speech to Text Software?
shfted! asks: "Recently, I've been given the task of transcribing several dozen audio tapes of interviews to typed word, that is, listening for 10 seconds, write what was said, repeat. At around 4 hours per hour long tape, I would like to automate the process somehow. Recording the tape into the computer is no problem, but I need some software that will do the speech recognition accurately more than quickly -- several hours per tape is not an issue (I have access to several machines running 24/7). I will still have to go over the computer's work to correct any mistakes. A free solution for Linux would be best, non-free and Windows solutions are okay, but a working solution is highest priority. Can anyone point me in the right direction(s)?"
It's cost-effective, as fast as you need it to be, and, best of all, more accurate than any software solution to date. Most software packages are still at only about 90% accuracy, so that's still 24 minutes per four-hour tape that you'll need to correct, and you'll probably still have to listen to the whole thing again to verify the accuracy of any software program.
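For anyone who wants to run that arithmetic against their own tapes, here is a rough sketch. It assumes, as the estimate above does, that recognition errors are spread evenly across the recording, so the fraction of wrong words maps directly onto a fraction of tape time. The function name is made up for illustration, and the 90% figure is just the one quoted above, not a measurement.

```python
def minutes_needing_correction(tape_minutes: float, accuracy: float) -> float:
    """Estimate how many minutes of a recording will contain transcript errors.

    Assumption: errors are evenly spread, so (1 - accuracy) of the tape
    time needs correcting. Real errors tend to cluster, so treat this as
    a ballpark figure only.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return tape_minutes * (1.0 - accuracy)


# Four hours of tape (240 minutes) at 90% accuracy.
print(round(minutes_needing_correction(240, 0.90), 1))
```

This counts only the time spent fixing mistakes. Listening to the whole recording again to find them comes on top of that.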
Condemnant quod non intellegunt.
The technology to do this isn't really there. If the machine can learn how you speak, it can do it. If you limit yourself to just a few words (1000 perhaps?) it is easier. To do it in general for random speakers though?
The problem is people are too varied. I have trouble understanding people from the "deep south". The accent is too thick for my ears. I'm sure they have the same problem with my accent.
That isn't to say don't try it, but don't get your hopes up. Voice recognition is hard, and isn't done well. Just be glad you only have a few to do; my sister's full-time job is typing things like that (most of it less interesting, as she describes it).
Just do the tapes. It will take longer to screw with software setup and cleanup than to just do it. That said, I think buying or rigging up a foot switch to play and rewind the tape would help. I am also assuming you are a touch typist. If not, get someone who is to do this job for you.
Years ago, I improved my own typing speed and accuracy by transcribing phone conversations with friends. It just takes some practice.
Of course, if you are listening to this guy, you can disregard my advice.
Give Sphinx a try. It's pretty accurate, especially Sphinx-3. I've used v2 before for a live test, and it worked great, even with different voices.
If your hours of tape are something that has to be transcribed accurately, don't waste your time trying to do it with a computer.
A person who does transcription for a living will do it faster, probably cheaper, and will be able to handle all of the quirks of human speech that will gum up the works of a voice to text program.
There are still places where a machine cannot match the quality of a real live person.
Three Squirrels