Open Source Transcription Software?

CMU Sphinx by Singularity42 · 2010-07-20 10:53 · Score: 3, Informative

Looks active.

Re:CMU Sphinx by Narksos · 2010-07-20 11:31 · Score: 5, Informative

What you want is dictation software. I just (last week) spent significant time looking into this.

For open source you have two main options: CMU Sphinx and Julius/Julian. Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that: the source for the CMU Sphinx demos shows how to get input from a mic or WAV file and set up various engines (if you've got something other than PCM, you'll just need to convert it).
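As a rough illustration of the "get input from a WAV file" part of such a front-end, here is a minimal sketch using only Python's standard `wave` module. It checks that the file is uncompressed PCM in one target format and returns the raw samples; the 16 kHz, 16-bit, mono target is an assumption (a common default for Sphinx's English acoustic models), and the actual recognizer call is left out.

```python
import io
import wave

# Assumed target format: 16 kHz, 16-bit (2-byte) samples, mono.
# Adjust these to whatever your acoustic model was trained on.
EXPECTED_RATE = 16000
EXPECTED_WIDTH = 2
EXPECTED_CHANNELS = 1


def load_pcm_for_recognizer(source):
    """Return raw PCM bytes from a WAV file (path or file object).

    Raises ValueError if the audio is not in the expected format, in which
    case it should be converted first (e.g. with an external tool).
    """
    with wave.open(source, "rb") as wav:
        # The wave module only reads uncompressed PCM WAV files;
        # compressed WAV variants raise wave.Error when opened.
        if wav.getnchannels() != EXPECTED_CHANNELS:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != EXPECTED_WIDTH:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError(f"expected {EXPECTED_RATE} Hz, got {wav.getframerate()} Hz")
        return wav.readframes(wav.getnframes())


if __name__ == "__main__":
    # Build a one-second silent test file in memory to exercise the check.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)
    buf.seek(0)
    pcm = load_pcm_for_recognizer(buf)
    print(len(pcm))  # 32000 bytes: 16000 samples x 2 bytes each
```

Rejecting mismatched audio up front, rather than resampling silently, keeps the conversion step explicit and visible.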

CMU Sphinx appears to be mainly for research purposes. You can run it in a few different modes: one with a fixed grammar (for command systems; GNOME's voice control uses Sphinx in this mode), and one (what you'd be looking for) that uses a weighted dictionary. I didn't train it to my voice (and you won't be able to train it for transcriptions), and I was getting fairly lousy recognition rates with my $20 Logitech USB microphone. It might work better with a high-quality headset, but I imagine you won't both be wearing one.

Julius/Julian lacks a good acoustic model for English. VoxForge is working on one, but it isn't anywhere near complete.

Here is a good article that sums up the current projects.
Re:CMU Sphinx by Anonymous Coward · 2010-07-20 12:39 · Score: 2, Informative

Sphinx is what many companies use to get started with, but it's far too raw to be useful by itself. You need to update the HMM back-end extensively... and train it. Even then, your success rate is only 80%, meaning that 1 in 5 words will still be wrong, even if spoken slowly.
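Figures like "80%" are usually reported as word error rate (WER): the word-level edit distance between what was said and what the recognizer produced, divided by the number of words actually spoken. Here is a minimal sketch of the standard dynamic-programming calculation (not taken from any Sphinx tool). Insertions and deletions count as errors too, so 20% WER matches "1 in 5 words wrong" only roughly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference.

    Assumes the reference transcript is non-empty.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One wrong word out of five gives a WER of 0.2, i.e. "80% accurate".
print(word_error_rate("the patient was seen today", "the patient was seen to"))  # 0.2
```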
Re:CMU Sphinx by notthepainter · 2010-07-20 14:05 · Score: 3, Insightful

Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that

Actually, it can be rather hard to do that. I was one of the founders of MacSpeech, and there is a surprisingly large set of details you have to deal with: punctuation, capitalization, etc. Of course, since you wouldn't be making a commercial product, much of the gloss need not be coded, but even once you have the engine (the part that takes the audio source and converts it to text), you still have a large amount of work left over.
Re:CMU Sphinx by inkyblue2 · 2010-07-20 15:17 · Score: 3, Interesting

Sphinx by itself is a terrible answer to this problem, unfortunately. The code is free, but good luck finding an appropriate model. Worse, you'll need to train a speaker-dependent model to get any usable results, and this is a VERY non-trivial task with Sphinx tools in their current state. I spent several years getting paid to adapt Sphinx for commercial purposes and while it's great for some things, I can say with confidence that it is not the tool you're looking for.
You know what works? Dragon. Hate to say it, but the commercial products here have a gigantic edge on the competition.
That said, I'd love to see someone come up with an open source speaker-dependent model training system that's friendly enough for app developers (not speech researchers) to roll into projects. I think this is a big open door for contribution to the community. Sphinx isn't the best thing going, but it's certainly usable, and if a real product came into being I'm sure all the speech wonks would start coming out of the woodwork to improve the algorithms.
Re:CMU Sphinx by Bacon+Bits · 2010-07-20 19:02 · Score: 5, Informative

Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that
"... unless you're not a programmer."
Seriously, since when is "program it yourself" a solution to "are there any open source software packages that do what I want?"? The answer you're looking for is "no". That's the correct answer.
Here, here's a nice car analogy since we're on Slashdot: when you need a car, do you buy a kit car, or do you buy one factory built? This is like telling someone who wants a car to drive to work that they should simply buy a Chevy big-block engine and build the rest from scratch. Just because I need a car doesn't mean I must be an automotive engineer and metal fabricator. Similarly, just because I need dictation software doesn't make me a software architect or a linguist. Directing this person to program their own software is not answering the question.
Cripes. People wonder where the "open source is only free if your time has no value" line came from.

--
The road to tyranny has always been paved with claims of necessity.
Re:CMU Sphinx by Crudely_Indecent · 2010-07-21 01:28 · Score: 4, Insightful

"... unless you're not a programmer."
I am a programmer, but we're all sometimes out of our element.
I needed some modifications to an open source application a few years ago. Rather than spend my time reading the source code to understand how the application worked, I decided to contact the developer. A few emails and a couple of days later, the project developer made the modifications for me and $500 for himself. The world then gained additional functionality in the open source application; everyone wins.
Some people forget, this is how many open source applications survive.
Your analogy is outlandish! If someone wants to drive a car to work, they buy a car. If they want a shark fin on the roof, they go to a custom body shop. If they want a killer stereo, they go to a stereo shop. If they want it to be pink and yellow like yours, they go to a paint and body shop. If they can do these things on their own, they'll do it. The difference being that if the car was open source, doing these things wouldn't void the warranty.
"Open-source is free only if your time has no value." - Jamie Zawinski
I offer an alternative viewpoint:
Open source is free if you truly understand freedom.
I'm free to use the application. I'm free to modify it. I'm also free to recognize my limitations and pay someone else to do these things for me.

--

"Lame" - Galaxar
Re:CMU Sphinx by Bill,+Shooter+of+Bul · 2010-07-21 09:41 · Score: 2, Insightful

Blechkt. That's how I feel about your post. This is a site for nerds. Nerds are often adept at doing nerdy things. Like writing software.
Now, if your mom asked you, then yes, a reply of "You only need to write a front end to this speech engine" is indeed inappropriate.
Your post, and the replies to it, really reflect more on how you view the general Slashdot audience than anything else.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.
Re:CMU Sphinx by Crudely_Indecent · 2010-07-26 08:26 · Score: 2, Insightful

The commercial app does exist, and it's a per-use app that is controlled by a dongle and subscription (hint: more than $500, plus usage).
Sticking it to the man has nothing to do with it, unless by "it" you mean money and by "the man" you mean my pocket.
Of course, any commercial developer will gladly make a custom app for $, but I guarantee that it will be more than $500. The developer did have plans to add the functionality... eventually. My $500 made it happen right now.
It was certainly silly of me to make over $50k using the newly modified software that I paid $500 for. That's only 9900% profit, so, you're absolutely right... I made a serious mistake.

--

"Lame" - Galaxar

Dear aunt, by Anonymous Coward · 2010-07-20 10:53 · Score: 5, Insightful

let's set so double the killer delete select all.

Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

Re:Dear aunt, by Kenoli · 2010-07-20 11:07 · Score: 4, Insightful

A program capable of "making informed guesses based on context" seems perfectly plausible, though that's not part of speech recognition per se.
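A toy sketch of what "informed guesses based on context" means inside a recognizer: when two candidates sound alike, a language model scores which word sequence is more plausible. The bigram counts and vocabulary size below are invented for illustration, not real corpus statistics, and a real system would combine this score with acoustic scores.

```python
import math
from collections import Counter

# Hypothetical bigram counts standing in for a language model
# trained on a large text corpus.
BIGRAM_COUNTS = Counter({
    ("took", "the"): 20,
    ("the", "night"): 50,
    ("night", "train"): 8,
    ("night", "rain"): 3,
    ("train", "home"): 6,
    ("rain", "home"): 0,
})
VOCAB_SIZE = 1000  # assumed vocabulary size for add-one smoothing


def first_word_counts(bigrams):
    """Count how often each word appears as the first word of a bigram."""
    totals = Counter()
    for (first, _), count in bigrams.items():
        totals[first] += count
    return totals


FIRST_WORD_COUNTS = first_word_counts(BIGRAM_COUNTS)


def log_prob(sentence):
    """Add-one smoothed bigram log-probability of a word sequence."""
    words = sentence.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        numerator = BIGRAM_COUNTS[(prev, word)] + 1
        denominator = FIRST_WORD_COUNTS[prev] + VOCAB_SIZE
        total += math.log(numerator / denominator)
    return total


candidates = ["took the night train home", "took the night rain home"]
print(max(candidates, key=log_prob))  # took the night train home
```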
Re:Dear aunt, by ThomConspicuous · 2010-07-20 11:19 · Score: 2, Insightful

It's already being done in medical dictations that are also recorded and double-checked by transcriptionists. It speeds up workflow immensely, even with the human verification in place.

I even witnessed an East Indian doctor with a heavy accent dictate normally and have the software pick up everything stated. He was pleasantly surprised.

It works.
Re:Dear aunt, by fuzzyfuzzyfungus · 2010-07-20 11:20 · Score: 2, Informative

Unless things have improved substantially since Dragon NaturallySpeaking 10, I'd be more inclined to describe the performance as "surprisingly adequate job of it, with training, and offers a vaguely cellphone-esque interface for choosing the correct word when it fucks up".

It isn't comedically awful; and it likely beats typing with your stumps, or your eyelids, or whatever; but "pretty good" is being very generous.

(Again, unless things have improved markedly since then) the software works best when used interactively, which allows it to suggest corrections, and you to make them, in real time. It also helps if it has been trained to your voice beforehand. Using it non-interactively, on a recording of somebody it hasn't been trained for, will produce results error-filled enough that you might actually find manual transcription faster than manual editing (or, if you don't mind your family sounding like they've suffered head trauma or exposure to Dadaism, you can just store the recordings, make do with the text, and re-run the process in the future, when the software is better).
Re:Dear aunt, by conchubhair · 2010-07-20 11:24 · Score: 3, Insightful

The problem you are describing (continuous speech recognition) is not solved yet. Even the best state-of-the-art technology is not going to be perfect, and having two speakers will make it even less useful. If you really need the stuff transcribed, you can pay for online services to transcribe it (if they offer really good quality transcription, they are most likely using humans), or you can transcribe it yourself (you can buy software to help speed up the transcription process, including a foot pedal to pause/play the audio, e.g. http://www.nch.com.au/scribe/).

My company does a lot of work in speech recognition, and we have tried most of the companies that offer transcription. Some of them even provide APIs so you can code something up. The best fully automatic, commercially available transcription I have seen is from Yap Inc. (http://yapme.com/). If the speaker doesn't have a crazy accent and speaks at a normal level and pace, you can get great results, but like all fully automatic transcriptions it can get it wrong. The benefit of Yap is that you can get back the confidence scores and alternates for each word, so if you had a dictionary of your own commonly used words, you could pick out a better transcription. You pay by the word for transcription (it is a small amount, but it will add up if you're doing hours of audio).

If you're willing to wait, the technology is improving all the time, so you could archive the audio for now and return to have it transcribed in a few years. If you need this done now and want something you can actually read, then your cheapest option is to do it yourself, and maybe invest in some software to speed it all up. Unless you have a lot of time on your hands and access to a lot of transcribed audio to build the language models, using any software at home is not worth your while.
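Here is a rough sketch of the "confidence scores and alternates plus your own dictionary" idea. The data layout (one list of (word, confidence) alternates per word position) is a hypothetical stand-in, not Yap's actual API format, and the dictionary contents and boost value are arbitrary assumptions you would tune on your own recordings.

```python
# Hypothetical recognizer output: for each word position, a list of
# (word, confidence) alternates. This is NOT any real service's format.
Alternates = list[tuple[str, float]]

# Words you expect to recur in your own recordings (assumed example list).
CUSTOM_DICTIONARY = {"grandma", "Bess", "Octavia"}
BOOST = 0.3  # arbitrary bonus for dictionary words; tune on your own data


def adjusted_score(alternate: tuple[str, float], dictionary: set[str]) -> float:
    """Recognizer confidence plus a bonus if the word is in the dictionary."""
    word, confidence = alternate
    return confidence + (BOOST if word in dictionary else 0.0)


def pick_words(slots: list[Alternates], dictionary: set[str]) -> list[str]:
    """Choose the best-scoring alternate at each word position."""
    return [
        max(alternates, key=lambda alt: adjusted_score(alt, dictionary))[0]
        for alternates in slots
    ]


slots = [
    [("my", 0.95)],
    [("grammar", 0.55), ("grandma", 0.40)],
    [("said", 0.90), ("set", 0.10)],
]
print(" ".join(pick_words(slots, CUSTOM_DICTIONARY)))  # my grandma said
```

Without the dictionary the recognizer's own top choice ("grammar") wins; the boost only flips close calls toward words you know are likely.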
Re:Dear aunt, by theheadlessrabbit · 2010-07-20 12:00 · Score: 2, Insightful

let's set so double the killer delete select all.
Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.
...a computer program cannot do yet

--
-I only code in BASIC.-
Re:Dear aunt, by painandgreed · 2010-07-20 12:04 · Score: 4, Informative

Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.
Funny, considering my job is training doctors to use voice recognition to do all their reporting. Actually, it works fairly well. I also don't mean dictating something that goes to transcriptionists. The doctors dictate the report. The dictation is transcribed into text. They review it and sign off. We got rid of all our transcriptionists years ago. The time for a report to get done went from 24 hours with transcriptionists to 24 minutes with voice recognition. The number of errors was cut in half. The doctors' workload was also lessened, as they could check the final version while still dealing with the data rather than having to go back and review everything all over again a day or two later. Speech recognition was a problem seven years ago, but hardly at all in the last five or so. Yes, they have to go over their dictations and occasionally make some minor corrections. There's always background noise to worry about, and some people's accents are hard even for another person to get through, but for things that require quick turnaround and need to be verified by the person who is doing it, voice recognition already is the gold standard.
PS: several of the doctors like it so much that they bought Dragon (pretty much everybody but Philips uses Dragon for their speech engine) for home and use it there for all their email and other writing.
Re:Dear aunt, by Bluesman · 2010-07-20 12:30 · Score: 2, Informative

by making informed guesses based on context, which a computer program cannot do.
The Perl interpreter can.

--
If moderation could change anything, it would be illegal.
Re:Dear aunt, by Flyerman · 2010-07-20 13:05 · Score: 2, Informative

The parent's link is exactly what I set up on a client's machine. They purchased the headset and pedals but the software itself was free and worked wonderfully.
Re:Dear aunt, by BitZtream · 2010-07-20 14:16 · Score: 4, Interesting

Ironically, I have a family member who runs a business doing transcription for doctors... because every time they try voice recognition software, they get pissed off and go back to real people.
Being a fan of Dragon Dictate myself, I know it's not that great, and I know it has a fit when you start throwing accents at it, training or not.
I call bullshit on your claims of using Dragon for everything.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:Dear aunt, by binarybum · 2010-07-20 14:28 · Score: 2, Interesting

Wow, shame on the anonymous troll that posted this and the moderators that must have been teleported from the early 90s. The high-end transcription packages are truly incredible. Yes, you need to spend some time training them to your speech patterns and accent, and yes, it makes a big difference if you use a quality microphone (not the one that's built into your laptop or iPhone) at a fixed distance. With a decent setup, transcription software can be really impressive at high speeds and with complicated vocabulary. Talk to a doctor in a large modern hospital: many are trusting these systems with their patients' medical record information, and these guys have high expectations when it comes to transcriptions because they are used to having very skilled ears listen to them mumbling jargon quickly for their transcriptions.
Having anything but a really good setup can be really frustrating, though; maybe Slashdot tinkerers have dabbled and written these kinds of apps off. I do imagine that it wouldn't be worthwhile using anything but the top dictation apps if you want to avoid any serious post-editing.

--
ôó
Re:Dear aunt, by Mr.+Pibb · 2010-07-20 15:43 · Score: 2, Interesting

I call bullshit on your bullshit.
I do occasional work for a Workers' Comp doc who has been working with Dragon for over 10 years. He swears by it.
The work is an hour-long interview, and hours of paperwork. He dictates the report into a MiniDisc recorder while reviewing his notes and then plays the recording back into the computer, watching for errors (few) and reviewing. I've also set up several other docs in the same field with Dragon, and they're quite pleased with it as well.
At first, he had to buy the latest HW and audio cards to get the best accuracy, but now runs Dragon virtualized on a 1st-gen MacBook without a problem. Dragon FTW!
Re:Dear aunt, by micheas · 2010-07-20 19:24 · Score: 2, Interesting

I can see medical transcription being the best use case for transcription software.
The vocabulary is largely devoid of slang.
You have long, specialized terms that sound similar to very few other words.
The vocabulary is probably fairly small as most doctors have a fairly specialized practice, so internists don't deal with the same areas as podiatrists, reducing the words that are used.
The repetition is probably fairly high, allowing for training to be more effective than speech on random topics.
In conclusion, for what the original poster wants, voice recognition software is probably not viable, but if you have a medical practice, and are not a general practitioner, you may well find that voice recognition software is usable.

--
Work bio at MMWD
Re:Dear aunt, by msclrhd · 2010-07-20 20:02 · Score: 2, Interesting

Your post highlights a key difference between written and spoken words: we tend to contract words ("have a" to "hav.uh") and will flow one word into another (in "said John", the d at the end of "said" merges with the dZ sound at the start of "John", so the d is dropped: "sE dZ0n").
Some people drop certain letters at the beginning and end of words -- "'e said 'what 'ave you been doin' today?'". This also makes it more complicated to transcribe. Not to mention regional dialect variations and strong accents.
Then you have pairs like "four candles" / "fork 'andles" and "night train" / "night rain" (http://en.wikipedia.org/wiki/Homophones). A lot of The Two Ronnies' humour stemmed from wordplay that takes advantage of the difference between written and spoken speech and how the audience interprets them (see the Hieroglyph sketch for another classic example). 'Allo 'Allo! did a similar thing as well.
Re:Dear aunt, by Captain+Damnit · 2010-07-21 06:24 · Score: 2, Informative

13 years ago, when I entered the medical transcription industry, the fellow who sold us our dictation system told me that he was a dead man walking: voice recognition was going to KILL the transcription industry, and he almost felt guilty selling us the system. When we mentioned we had looked at Dragon, he practically cried. 13 years later, that salesman is now deceased, and the transcription industry is larger than ever. Voice recognition in transcription is like Linux on the desktop: every year, articles pop up saying that THIS year will be the year medical transcription dies at the merciless hands of voice recognition.
For a guy in an industry that Netcraft has confirmed is deader than FreeBSD, I'm doing pretty well.
I now own a medical services company that does transcription, so my opinion is certainly biased here, but I fail to see the economic logic in turning a physician, who makes between $120K and $250K per year, into a clerical worker editing his own files, especially when said clerical worker can be seated in India. Time is money, and the time of physicians and surgeons is one of the most expensive line items on your medical bill. Even with transcription prices as they are today, tacking 20 minutes of extra editing time onto a doctor's already long work day means that I can do it cheaper with manual labor. Voice recognition just means that I need one MT and a voice recognizer instead of one transcriptionist and a QA person.
Internally, we use a batch speech recognizer based on Sphinx, as the Dragon source is too expensive to license in the volumes that we do. As one of the earlier posters said, the code is the easy part... it's generating the speech corpus that's the really expensive part. Developing that was easily a seven-figure outlay in labor, which is why you don't see any usable medical speech corpora available for free* on the Internet. You'd think with all the federal money being thrown at making medical records electronic that they could spare a few million to develop an open-source speech corpus, but that would make too much sense.
As long as physicians and surgeons are better paid than the rest of us, someone will be doing transcription.
--
* If you know of one, post a link...believe me, we've looked.

Sphinx by SaXisT4LiF · 2010-07-20 10:54 · Score: 5, Informative

Carnegie Mellon has an open source speech recognition project you might want to look into: Sphinx.

--
Fight or flight its all the same
Live to die another day

--Ryan

Unfortunately... by dmneoblade · 2010-07-20 10:55 · Score: 3, Interesting

I spent several months searching for something like this. Open-source voice recognition is in its very early stages, and there does not seem to be much interest in improving the few things we have.

--
Warning, knife is sharp. Please keep out of children.

I've wondered about this too by itamblyn · 2010-07-20 10:55 · Score: 2, Interesting

It seems like there should be some way to "hack" the audio transcription that Google offers through Google Voice or YouTube. Unfortunately, I haven't found a way to upload a file. With YouTube, if you make a fake movie, it gives an error that it can't be transcribed. Getting Google Voice to work would require some sort of phone interface, I suppose...

Re:I've wondered about this too by Enuratique · 2010-07-20 11:16 · Score: 4, Informative

Google relies on Twilio for their audio transcription.

--
A black hole is where God divided by 0

Best Idea by Anonymous Coward · 2010-07-20 10:55 · Score: 3, Informative

just upload it to youtube, its genius google transcription technology will make everything sense out of it.

XTrans by ceraphis · 2010-07-20 11:01 · Score: 2, Interesting

Why don't you give XTrans a shot: XTrans

Re:XTrans by Kev+Vance · 2010-07-20 12:49 · Score: 3, Informative

Ubuntu uses PulseAudio on the ALSA audio subsystem, but that error message indicates XTrans is trying to use the OSS audio subsystem instead. To work around this, try using the Pulse OSS wrapper or temporarily disable Pulse. From the commandline, "padsp xtrans" or "pasuspender xtrans".

--
F0 07 C7 C8

But Windows Speech Recognition... by Monkeedude1212 · 2010-07-20 11:11 · Score: 4, Informative

Most Windows Vista or Win7 machines come with a built-in transcribing feature that you can enable in the Control Panel (Win7: Ease of Access, Speech Recognition).

However, the only way it works properly is if you train it to understand you personally. You load your profile, and it'll run you through a whole bunch of test sentences. The FULL test takes about 20 minutes, I think (it's been a while since I've used it), and it actually works quite well. There is a cut-off point at about two and a half minutes if you want to stop and try it out. It can even make the computer usable without a keyboard or mouse if you want. When you open a browser, it highlights everything on the web page that's clickable and assigns it a number, and you simply say "Click 7" and it hits the reply button for you. Then you talk when the text box has focus, and it'll transcribe every word you say.

I did this for my girlfriend's paper once: I read it aloud (you have to say things like "comma", "end paragraph", etc.) and put it into a Word document. Out of a 15-page single-spaced essay, it got 3 sentences wrong, and that's only because I was mentioning some of the more obscure Greek names (she's a history major). It managed to get full sentences regarding Octavia and her fondness for libraries without error, which I thought was odd since that's not a name you hear every day.

Anyways - if he wants to do this, he should record the test phrases (there will be a lot though) and have each of his interviewees read the test sentences so he can then relay those through the computer and train the computer for each person.

All in all, he may still run across a few errors, but it's not nearly as bad as, say, Google Voice Mail, which tries to figure out what you're saying without having any previous knowledge of how that person speaks. Windows Speech Recognition will handle what he's after, though.

USB foot control by wguy00 · 2010-07-20 11:22 · Score: 2, Informative

Buy a USB foot control (check out Infinity or fortherecord), and download the free player from fortherecord.com. You can stop, start, rewind, and fast-forward without having to take your eyes off the screen or leave your word processing app.

I looked, but still do it manually by ciaran_o_riordan · 2010-07-20 11:25 · Score: 4, Informative

I've worked on loooads of transcripts. I did most of these:

* http://wiki.fsfe.org/Transcripts

The best technique I've found is to have mplayer play the audio at 60% normal speed and have a text editor (emacs is my preference) in another window, flick between them with alt-TAB and hit Space to start and pause mplayer.

--
Expert in software patents or patent law? Contribute to the ESP wiki!

Re:I looked, but still do it manually by mutube · 2010-07-20 13:40 · Score: 2, Informative

I'd agree. I did some part-time work transcribing audio a while back for extra pennies. One thing I would add is that instead of using Alt-Tab to switch applications and then hitting space to start/stop I found it was less frustrating to set up global keys for the purpose (I was using KDE at the time, I expect most desktops offer this).
I assigned F12 to skip back 5 seconds and F9 to pause/restart. Using those (esp F12) it was relatively easy to keep up to speed with what was being said without switching away from the editor.

--
Python coder | PyQt Applications | Writer
Re:I looked, but still do it manually by nbauman · 2010-07-20 14:29 · Score: 2, Informative

I've done loads of transcripts too.
The best software I found was the Olympus DSS Player 2002, which came bundled with the expensive Olympus digital recorder (the cheap ones came with bare-bones software). It was like the old mechanical tape transcribing machines, except much better, with adjustable back pedal, 50% slow speed, 200% fast speed, fast forward, fast back, etc. The newest version is probably better.
Problem was it was optimized for the Olympus proprietary *.DSS format, although you could use *.WAV with some limitations on features.
Sony and the other digital recorders also had playback software; I haven't checked them out but they're probably equivalent.
NCH Scribe (free) could have been a clone of the Olympus player; NCH didn't work as well when I tried it, although later versions may work better.
These programs can be overkill, mplayer at 60% speed sounds like it would work well.

If you don't find anything... by afabbro · 2010-07-20 11:31 · Score: 2, Interesting

...you could always use RentACoder (er, Vworker.com now) and hire someone for pennies to do it.

--
Advice: on VPS providers

the command line by ciaran_o_riordan · 2010-07-20 11:37 · Score: 4, Informative

To play an audio file at 60% normal speed:

mplayer -af scaletempo=scale=0.6 the_file.ogg

And then to check the transcript, change the 0.6 to 1.5 (or 2.0 for someone like Richard Stallman who speaks slowly and clearly).
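If you script this, a small wrapper can build the same mplayer command line and estimate how long each pass takes. The helper names are hypothetical; the command is exactly the one given above, and the time estimate simply assumes that playing at a given tempo scale takes duration / scale.

```python
import shlex


def mplayer_command(path: str, scale: float) -> list[str]:
    """Build the mplayer invocation above for a given tempo scale."""
    return ["mplayer", "-af", f"scaletempo=scale={scale}", path]


def listening_minutes(audio_minutes: float, scale: float) -> float:
    """Wall-clock minutes needed to play audio_minutes of audio at `scale`."""
    return audio_minutes / scale


cmd = mplayer_command("the_file.ogg", 0.6)
print(shlex.join(cmd))  # mplayer -af scaletempo=scale=0.6 the_file.ogg
print(round(listening_minutes(60, 0.6)))  # 100 minutes for a one-hour recording
print(round(listening_minutes(60, 1.5)))  # 40 minutes for the checking pass
```

To actually start playback, the list can be passed to `subprocess.run(cmd)`, which avoids shell-quoting problems with odd file names.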

--
Expert in software patents or patent law? Contribute to the ESP wiki!

Got kids? by Kral_Blbec · 2010-07-20 11:52 · Score: 4, Insightful

Pay them a buck per page and they learn some family history along the way. Problem solved.

Re:Got kids? by Luckyo · 2010-07-20 12:27 · Score: 4, Interesting

This is one of the cases where the journey matters as much as, if not more than, the destination :)
Re:Got kids? by tehcyder · 2010-07-21 02:15 · Score: 2, Funny

Pay them a buck per page and they learn some family history along the way. Problem solved.

Mummy, why does aunt Bess call grandma a "syphilitic rum-and-cock-addled whore"?
Daddy, why was great grandpa Ben "shot at dawn for cowardice in the face of the enemy"?
Mummy and daddy, how come I was born only four months after you married?

--
To have a right to do a thing is not at all the same as to be right in doing it

High quality recordings now, transcription later by itamblyn · 2010-07-20 11:55 · Score: 2, Informative

I think the most important thing to keep in mind for a project like this is that you should do everything you can to ensure a high-quality recording. Don't worry about transcription at this point; just focus on getting content. When algorithms (and computers) have improved in 5-10 years' time, you can do the transcription. It might even be useful to record the sessions with a video camera. Maybe the speech recognition tech of the future will use lipreading in addition to the approaches that are used now.

Foot Pedal and Express Scribe best option by Adattisi · 2010-07-20 12:02 · Score: 4, Informative

I've been a transcriptionist for over 5 years, and unless you want to have to retype most of it yourself anyway, don't offer pennies on a site like guru/vworker/elance. A decent transcriptionist is going to charge at least $45-50 per AUDIO hour (not per hour it takes) if it's a good, clear recording and a single speaker. If there were a really great product out there, I'd be out of a job. If you want to do it on the cheap, get an inexpensive USB Infinity foot pedal (on eBay) as mentioned before; Express Scribe is a free download to play back and rewind the audio. Both are what I use. Good luck!

Re:On the other hand... by vrmlguy · 2010-07-20 12:02 · Score: 2, Interesting

I just slice everything up into segments of 60 seconds and let Google Voice transcribe it for me. Sure, some naysayers might point out that it's slower than transcribing it all manually, but they don't get that I'm getting Google to do the work for me!
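Joke or not, the slicing step itself is easy to script. This sketch uses Python's standard `wave` module to cut an uncompressed WAV file into consecutive 60-second pieces; the `chunk_000.wav` naming scheme is a hypothetical choice, and sending the pieces to any transcription service is left out.

```python
import wave
from pathlib import Path


def split_wav(src: str, out_dir: str, seconds: int = 60) -> list[Path]:
    """Split a PCM WAV file into consecutive chunks of at most `seconds`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(src, "rb") as wav:
        params = wav.getparams()
        frames_per_chunk = params.framerate * seconds
        index = 0
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break  # no audio left
            path = out / f"chunk_{index:03d}.wav"  # hypothetical naming scheme
            # wave.open wants a str path (or file object), not a Path.
            with wave.open(str(path), "wb") as chunk:
                # Reuse the source format; the frame count in the header
                # is patched to the real value when the frames are written.
                chunk.setparams(params)
                chunk.writeframes(frames)
            written.append(path)
            index += 1
    return written
```

The last chunk is simply shorter than the others, so no audio is dropped or padded.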

--
Nothing for 6-digit uids?

Coding Horror article by lulalala · 2010-07-20 12:27 · Score: 2, Informative

Coding Horror recently posted an article about the current voice recognition technology.

http://www.codinghorror.com/blog/2010/06/whatever-happened-to-voice-recognition.html

There is a poem which got transcribed, and the title became like this:

"a poem by Mike Bliss --> a poem by like myth"

The rest of the poem is equally funny. So basically, you'd better transcribe it manually.

State of Speech Reco by poor_boi · 2010-07-20 13:25 · Score: 3, Informative

It's been my job to work with speech recognition technology for the last 10 years. I've worked with speaker-independent grammar-based recognizers like Nuance Recognizer. I've worked with speaker-dependent training-based recognizers like Dragon NaturallySpeaking. I've used open source recognizers like Sphinx. I've even dabbled with writing my own basic recognition engine. I can tell you with confidence: with the current state of commercial/open-source technology, you will not be able to get satisfactory results transcribing two speakers in the same recording. Accurate machine transcription requires training and a single speaker.

I have heard people claim that speech recognition is a dead technology because it has stopped improving at appreciable speeds. While improvements have slowed down drastically, I do not believe speech recognition is dead by any means. We've really been making the same steady progress since the inception of speech recognition, but previously we were riding the wave of geometric (sometimes exponential) growth in CPU clock rate. Now that the free lunch is gone, recognition algorithms need to be parallelized to once again ride improvements in CPU design.

Doing it yourself... by Cruciform · 2010-07-20 14:34 · Score: 2, Interesting

When I did some medical transcription a couple of years ago it was up to me to do it myself, and I didn't find anything open source at the time.
So I loaded up Amarok, configured global hotkeys to pause and to jump forward and backward in the audio file in five-second steps, and then loaded up a word processor.
Sure, it's not automatic, but it helped me get the job done.

It took me 3 to 4 hours to transcribe each spoken hour of a group of strangers. When the subjects have familiar speech patterns or it's an individual I found progress was much faster.

Re:On the other hand... by Verteiron · 2010-07-20 15:38 · Score: 4, Funny

I'm sure you roar get ding fan plastic results from goo gull boys, two eye find it variably hell full.

--
End of lesson. You may press the button.

Transana by paugq · 2010-07-20 20:47 · Score: 2, Interesting

It's not what you are asking for, but it sure will help you: Transana

vi by xmorg · 2010-07-20 21:25 · Score: 2, Funny

open up vi, press i, (or a), and press play on the audio device.
Type out whatever you hear.

Problem solved. :wq

Trick: re-speak it yourself by oergiR · 2010-07-21 00:57 · Score: 2, Interesting

I'm doing my PhD on speech recognition. I think (and hope!) it's neither dead nor fully developed. Currently, changes of environment screw speech recognisers up. Different speakers, background noise... A trick that I heard has been used for subtitling television broadcasts is to have someone re-speak the words (which is not that hard). You could play the audio recordings on your headphones while repeating them into a microphone. If you're in a quiet room and the recogniser is trained on your voice, that may get you most of the way. You'll still want to correct transcriptions manually.

I don't know of any good trained open-source speech recognisers. There are open-source back-ends like Sphinx or HTK (which I sort of work on), but you need massive transcribed training corpora to train a speech recogniser. This is expensive, which I guess is why open-source speech recognition hasn't taken off. In the speech recognition group at my university, most people use Linux, and I don't think anyone actually uses a speech recogniser in their daily work.
