Open Source Transcription Software?

Posted by kdawson on Tuesday July 20, 2010 @10:49AM from the what-he-said dept.

sshirley writes "I am beginning to do some interviews with family members and will do some audio journals for genealogy purposes. I would really love to be able to run the resulting MP3 or WAV files through some software and get a text file out. I know that software like this exists commercially. But does this exist in the open source world?"

4 of 221 comments

Unfortunately... by dmneoblade · 2010-07-20 10:55 · Score: 3, Interesting

I spent several months searching for something like this. Open-source voice recognition is still in its infancy, and there does not seem to be much interest in improving the few things we have.

--
Warning, knife is sharp. Please keep out of children.
Re:Got kids? by Luckyo · 2010-07-20 12:27 · Score: 4, Interesting

This is one of the cases where the journey matters as much as, if not more than, the destination :)
Re:Dear aunt, by BitZtream · 2010-07-20 14:16 · Score: 4, Interesting

Ironically, I have a family member who runs a business doing transcription for doctors ... because every time they try voice recognition software they get pissed off and go back to real people.
Being a fan of Dragon Dictate myself, I know it's not that great and I know it has a fit when you start throwing accents at it, training or not.
I call bullshit on your claims of using Dragon for everything.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:CMU Sphinx by inkyblue2 · 2010-07-20 15:17 · Score: 3, Interesting

Sphinx by itself is a terrible answer to this problem, unfortunately. The code is free, but good luck finding an appropriate model. Worse, you'll need to train a speaker-dependent model to get any usable results, and this is a VERY non-trivial task with the Sphinx tools in their current state. I spent several years getting paid to adapt Sphinx for commercial purposes, and while it's great for some things, I can say with confidence that it is not the tool you're looking for.
You know what works? Dragon. Hate to say it, but the commercial products here have a gigantic edge on the competition.
That said, I'd love to see someone come up with an open source speaker-dependent model training system that's friendly enough for app developers (not speech researchers) to roll into projects. I think this is a big open door for contribution to the community. Sphinx isn't the best thing going, but it's certainly usable, and if a real product came into being I'm sure all the speech wonks would start coming out of the woodwork to improve the algorithms.