Open Source Transcription Software?

Posted by kdawson on Tuesday July 20, 2010 @10:49AM from the what-he-said dept.

sshirley writes "I am beginning to do some interviews with family members and will do some audio journals for genealogy purposes. I would really love to be able to run the resulting MP3 or WAV files through some software and get a text file out. I know that software like this exists commercially. But does this exist in the open source world?"

16 of 221 comments

Dear aunt, by Anonymous Coward · 2010-07-20 10:53 · Score: 5, Insightful

let's set so double the killer delete select all.
Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.
1. Re:Dear aunt, by Kenoli · 2010-07-20 11:07 · Score: 4, Insightful
  
  A program capable of "making informed guesses based on context" seems perfectly plausible, though that's not part of speech recognition per se.
2. Re:Dear aunt, by painandgreed · 2010-07-20 12:04 · Score: 4, Informative
  
  Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.
  Funny, considering my job is training doctors to use voice recognition to do all their reporting. Actually, it works fairly well. I also don't mean dictating something that goes to transcriptionists. The doctors dictate the report. The dictation is transcribed into text. They review it and sign off. We got rid of all our transcriptionists years ago. The time for a report to get done went from 24 hours with transcriptionists to 24 minutes with voice recognition. The number of errors was cut in half. The doctors' workload was also lessened, as they could check the final version while still dealing with the data rather than having to go back and review everything all over again a day or two later. Speech recognition was a problem seven years ago, but hardly at all in the last five or so. Yes, they have to go over their dictations and occasionally make some minor corrections. There's always background noise to worry about and some people's accents are hard even for another person to get through, but for things that require quick turnaround and need to be verified by the person who is doing it, voice recognition already is the gold standard.
  PS: several of the doctors like it so well they bought Dragon (pretty much everybody but Philips uses Dragon for their speech engine) for home and use it there for all their email and other writing.
3. Re:Dear aunt, by BitZtream · 2010-07-20 14:16 · Score: 4, Interesting
  
  Ironically, I have a family member who runs a business doing transcription for doctors ... because every time they try voice recognition software they get pissed off and go back to real people.
  Being a fan of Dragon Dictate myself, I know it's not that great and I know it has a fit when you start throwing accents at it, training or not.
  I call bullshit on your claims of using Dragon for everything.
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Sphinx by SaXisT4LiF · 2010-07-20 10:54 · Score: 5, Informative

Carnegie Mellon has an open source speech recognition project you might want to look into. Sphinx

--
Fight or flight its all the same
Live to die another day

--Ryan
But Windows Speech Recognition... by Monkeedude1212 · 2010-07-20 11:11 · Score: 4, Informative

Most Windows Vista or Win7 machines come with a built-in transcribing feature that you can enable in the control panel (Win7, under Ease of Access, Speech Recognition).
However, the only way it works properly is if you train it to understand you personally. You load your profile, and it'll run you through a whole bunch of test sentences. The FULL test takes about 20 minutes, I think (it's been a while since I've used it), and it actually works quite well. There is a cut-off point at about two and a half minutes if you want to stop and try it out. It can actually make the machine keyboard- and mouse-free if you want. When you open a browser it highlights everything on the web page that's clickable and assigns it a number, and you simply say "Click 7" and it hits the reply button for you. Then you talk when the textbox has focus and it'll transcribe every word you say.
I did this for my girlfriend's paper once. I read it aloud (you have to mention things like comma, end paragraph, etc.) and put it into a Word document. Out of a 15-page single-spaced essay, it got 3 sentences wrong, and that's only because I was mentioning some of the more obscure Greek names (she's a history major). It managed to get full sentences regarding Octavia and her fondness of libraries without error, which I thought was odd since that's not a name you hear every day.
Anyway, if he wants to do this, he should record the test phrases (there will be a lot, though) and have each of his interviewees read the test sentences, so he can then relay those through the computer and train it for each person.
All in all, he may still run across a few errors, but it's not nearly as bad as, say, Google Voice Mail, which tries to figure out what you're saying without having any previous knowledge of how that person speaks. Windows Speech Recognition is something that will handle what he's after, though.
Re:I've wondered about this too by Enuratique · 2010-07-20 11:16 · Score: 4, Informative

Google relies on Twilio for their audio transcription.

--
A black hole is where God divided by 0
I looked, but still do it manually by ciaran_o_riordan · 2010-07-20 11:25 · Score: 4, Informative

I've worked on loooads of transcripts. I did most of these:
* http://wiki.fsfe.org/Transcripts
The best technique I've found is to have mplayer play the audio at 60% normal speed and have a text editor (emacs is my preference) in another window, flick between them with alt-TAB and hit Space to start and pause mplayer.

--
Expert in software patents or patent law? Contribute to the ESP wiki!
Re:CMU Sphinx by Narksos · 2010-07-20 11:31 · Score: 5, Informative

What you want is dictation software. I just (last week) spent significant time looking into this.

For open source you have two main options: CMU Sphinx and Julius/Julian. Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that: the source for the CMU Sphinx demos shows how to get input from a mic/wav file (if you've got something other than PCM you'll just need to convert it) and set up various engines.
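
As a rough illustration of the "convert it to PCM first" step mentioned above, here is a minimal Python sketch. It assumes the recognizer wants 16 kHz, mono, 16-bit PCM WAV (a common input format for Sphinx acoustic models, but check the model you actually use) and that the `ffmpeg` command-line tool is available for the conversion. The function names are made up for this example; nothing here comes from the Sphinx demos themselves.

```python
import wave

# Assumed target format: 16 kHz, mono, 16-bit PCM.
# This is an assumption; check your acoustic model's documentation.
TARGET_RATE = 16000
TARGET_CHANNELS = 1
TARGET_SAMPLE_WIDTH = 2  # bytes per sample (16-bit)


def is_recognizer_ready(path):
    """Return True if `path` is a PCM WAV file already in the assumed target format."""
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getframerate() == TARGET_RATE
                and wav.getnchannels() == TARGET_CHANNELS
                and wav.getsampwidth() == TARGET_SAMPLE_WIDTH
            )
    except (wave.Error, EOFError):
        # Not a WAV file the standard library can parse (e.g. an MP3).
        return False


def ffmpeg_convert_command(src, dst):
    """Build (but do not run) an ffmpeg command that converts `src` to the target format."""
    return [
        "ffmpeg", "-i", src,
        "-ar", str(TARGET_RATE),       # resample to 16 kHz
        "-ac", str(TARGET_CHANNELS),   # downmix to mono
        "-acodec", "pcm_s16le",        # 16-bit little-endian PCM
        dst,
    ]
```

If `is_recognizer_ready("interview.wav")` returns False, the list from `ffmpeg_convert_command("interview.mp3", "interview.wav")` can be handed to `subprocess.run` to produce a file the back-end can read.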

CMU Sphinx appears to be mainly for research purposes. You can run it in a few different modes: one with a fixed grammar (for command systems; GNOME's voice control uses Sphinx in this mode), and one (what you'd be looking for) with a weighted dictionary. I didn't train it to my voice (and you won't be able to train it for transcriptions) and I was getting fairly lousy recognition rates with my $20 Logitech USB microphone. It might work better with a high-quality headset, but I imagine you won't both be wearing one.

Julius/Julian lacks a good acoustic model for English. VoxForge is working on one, but it isn't anywhere near complete.

Here is a good article that sums up the current projects
the command line by ciaran_o_riordan · 2010-07-20 11:37 · Score: 4, Informative

To play an audio file at 60% normal speed:
mplayer -af scaletempo=scale=0.6 the_file.ogg
And then to check the transcript, change the 0.6 to 1.5 (or 2.0 for someone like Richard Stallman who speaks slowly and clearly).
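
The two-pass routine above (slow for typing, fast for checking) can be wrapped in a small helper. This is only a sketch in Python: it builds the same `mplayer -af scaletempo=scale=...` command shown above and estimates how long each pass takes. The function names and the 45-minute recording are made up for illustration.

```python
def mplayer_command(audio_file, speed):
    """Build the mplayer command line for playing `audio_file` at `speed` times normal tempo."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return ["mplayer", "-af", f"scaletempo=scale={speed}", audio_file]


def playback_minutes(recording_minutes, speed):
    """Wall-clock minutes needed to play a recording once at the given speed."""
    return recording_minutes / speed


if __name__ == "__main__":
    # Hypothetical 45-minute interview: typing pass at 0.6x, checking pass at 1.5x.
    print(mplayer_command("the_file.ogg", 0.6))
    print(playback_minutes(45, 0.6))
    print(playback_minutes(45, 1.5))
```

The playback estimate is a lower bound on the typing pass, since it ignores every pause and rewind.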

--
Expert in software patents or patent law? Contribute to the ESP wiki!
Got kids? by Kral_Blbec · 2010-07-20 11:52 · Score: 4, Insightful

Pay them a buck per page and they learn some family history along the way. Problem solved.
1. Re:Got kids? by Luckyo · 2010-07-20 12:27 · Score: 4, Interesting
  
  This is one of the cases where the journey matters as much as, if not more than, the destination :)
Foot Pedal and Express Scribe best option by Adattisi · 2010-07-20 12:02 · Score: 4, Informative

I've been a transcriptionist for over 5 years, and unless you want to have to retype most of it yourself anyway, don't offer pennies on a site like guru/vworker/elance. A decent transcriptionist is going to charge at least $45-50 per AUDIO hour (not per hour it takes) if it's a good, clear recording & a single speaker. If there was a really great product out there, I'd be out of a job. If you want to do it on the cheap, get an inexpensive USB Infinity foot pedal (on eBay) as mentioned before; Express Scribe is a free download to play back & rewind the audio. Both are what I use. Good luck!
Re:On the other hand... by Verteiron · 2010-07-20 15:38 · Score: 4, Funny

I'm sure you roar get ding fan plastic results from goo gull boys, two eye find it variably hell full.

--
End of lesson. You may press the button.
Re:CMU Sphinx by Bacon+Bits · 2010-07-20 19:02 · Score: 5, Informative

Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that
"... unless you're not a programmer."
Seriously, since when is "program it yourself" a solution to "are there any open source software packages that do what I want?"? The answer you're looking for is "no". That's the correct answer.
Here, here's a nice car analogy since we're on Slashdot: when you need a car do you buy a kit car, or do you buy one factory built? This is like telling someone who wants a car to drive to work that they should simply buy a Chevy big-block engine and build the rest from scratch. Just because I need a car doesn't mean I must be an automotive engineer and metal fabricator. Similarly, just because I need dictation software doesn't make me a software architect or a linguist. Directing this person to program their own software is not answering the question.
Cripes. People wonder where the "open source is only free if your time has no value" line came from.

--
The road to tyranny has always been paved with claims of necessity.
Re:CMU Sphinx by Crudely_Indecent · 2010-07-21 01:28 · Score: 4, Insightful

"... unless you're not a programmer."
I am a programmer, but we're all sometimes out of our element.
I found a need for modifications to an open source application a few years ago. Rather than spend my time reading the source code to understand how the application worked, I decided to contact the developer. A few emails and a couple of days later, the project developer made the modifications for me and $500 for himself. The world then gained additional functionality in the open source application - everyone wins.
Some people forget, this is how many open source applications survive.
Your analogy is outlandish! If someone wants to drive a car to work, they buy a car. If they want a shark fin on the roof, they go to a custom body shop. If they want a killer stereo, they go to a stereo shop. If they want it to be pink and yellow like yours, they go to a paint and body shop. If they can do these things on their own, they'll do it. The difference being that if the car was open source, doing these things wouldn't void the warranty.
"Open-source is free only if your time has no value." - Jamie Zawinski
I offer an alternative viewpoint:
Open source is free if you truly understand freedom.
I'm free to use the application. I'm free to modify it. I'm also free to recognize my limitations and pay someone else to do these things for me.

--

"Lame" - Galaxar