Open Source Transcription Software?
sshirley writes "I am beginning to do some interviews with family members and will do some audio journals for genealogy purposes. I would really love to be able to run the resulting MP3 or WAV files through some software a get a text file out. I know that software like this exists commercially. But does this exist in the open source world?"
Looks active.
let's set so double the killer delete select all.
Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.
Carnegie Mellon has an open source speech recognition project you might want to look into. Sphinx
Fight or flight its all the same
Live to die another day
--Ryan
I spent several month searching for something like this. Open-source voice recognition is in really infant stages, and there does not seem to be much interested in improving the few things we have.
Warning, knife is sharp. Please keep out of children.
just upload it to youtube, its genius google transcription technology will make everything sense out of it.
Most Windows Vista or Win7 machines come with a built in transcribing feature, that you can enable in the control panel (Win7, under ease of access, Speech recognition).
However - the only way it works properly is if you train it to understand you personally. You load your profile, and it'll run you through a whole bunch of test sentences. The FULL test takes you about 20 minutes I think (It's been a while since I've used it) - and actually works quite well. There is a cut off point at about 2 and a half minutes if you want to stop and try it out. It actually makes it keyboard and mouseless if you want. When you open a browser it highlights everything on the web page thats clickable and assigns it a number, and you simply say "Click 7" and it hits the reply button for you. Then you talk when the textbox has focus and it'll transcribe every word you say.
I did this for my girlfriend's paper once, I read it aloud (you have to mention things like comma, end paragraph, etc) and put it into a Word document. Out of a 15 page single spaced Essay - it got 3 sentences wrong - and that's only because I was mentioning some of the more Obscure greek names (she's a history major). It managed to get full sentences regarding Octavia and her fondness of libraries without error, which I thought was odd since thats not a name you hear every day.
Anyways - if he wants to do this, he should record the test phrases (there will be a lot though) and have each of his interviewees read the test sentences so he can then relay those through the computer and train the computer for each person.
All in all - he may still run across a few errors, but its not nearly as bad as say Google Voice Mail, which tries to figure out what you're saying without having any previous knowledge on how that person speaks. Windows Speech Recognition is something that will handle what he's after though.
Google relies on Twilio for their audio transcription.
A black hole is where God divided by 0
I've worked on loooads of transcripts. I did most of these:
* http://wiki.fsfe.org/Transcripts
The best technique I've found is to have mplayer play the audio at 60% normal speed and have a text editor (emacs is my preference) in another window, flick between them with alt-TAB and hit Space to start and pause mplayer.
Expert in software patents or patent law? Contribute to the ESP wiki!
To play an audio file at 60% normal speed:
mplayer -af scaletempo=scale=0.6 the_file.ogg
And then to check the transcript, change the 0.6 to 1.5 (or 2.0 for someone like Richard Stallman who speaks slowly and clearly).
Expert in software patents or patent law? Contribute to the ESP wiki!
Pay them a buck per page and they learn some family history along the way. Problem solved.
I've been a transcriptionist for over 5 years, and unless you want to have to retype most of it yourself anyway, don't offer pennies on a site like guru/vworker/elance. A decent transcriptionist is going to charge at least $45-50 per AUDIO hour (not hours it takes) if it's a good, clear recording & a single speaker. If there was a really great product out there, I'd be out of a job. If you want to do it on the cheap, get an inexpensive USB Infinity foot pedal (on ebay) as mentioned before & Express Scribe is a free download to playback & rewind the audio. Both are what I use. Good luck!
Ubuntu uses PulseAudio on the ALSA audio subsystem, but that error message indicates XTrans is trying to use the OSS audio subsystem instead. To work around this, try using the Pulse OSS wrapper or temporarily disable Pulse. From the commandline, "padsp xtrans" or "pasuspender xtrans".
F0 07 C7 C8
It's been my job to work with speech recognition technology for the last 10 years. I've worked with speaker-independent grammar-based recognizers like Nuance Recognizer. I've worked with speaker-dependent training-based recognizers like Dragon Naturally Speaking. I've used open source recognizers like Sphinx. I've even dabbled with writing my own basic recognition engine. I can tell you with confidence: with the current state of commercial/open-source technology, you will not be able to get satisfactory results transcribing two speakers in the same recording. Accurate machine transcription requires training and single-speaker. I have heard people claim that speech recognition is a dead technology because it has stopped improving at appreciable speeds. While improvements have slowed down drastically, I do not believe speech recognition is dead by any means. We've really been making the same steady progress since the inception of speech recognition -- but previously we were riding the wave of geometric (sometimes exponential) growth in CPU clock rate. Now that the free lunch is gone, recognition algorithms need to be parallelized to once again ride improvements in CPU design.
I'm sure you roar get ding fan plastic results from goo gull boys, two eye find it variably hell full.
End of lesson. You may press the button.