YouTube Makes Captioning Available To All

As long as they don't use GVoice Tech. by bennomatic · 2010-03-05 17:57 · Score: 4, Insightful

Or you'll end up with captions like this:

Hey glum, Jen tonight. It's apologize for it, interrupting our conversation in early as this afternoon, yes, so I wanted to returning your call and you know check in with you further. Alright, hope you, I hope you're doing well done. Sounded like you, works but alright. Well I'll call me later. I'll talk to you soon. Bye.

--
The CB App. What's your 20?

Re:As long as they don't use GVoice Tech. by rmadhuram · 2010-03-05 18:10 · Score: 1

Some of my funny ones here:

"Hey bottoms ASAP. But on the religious anyways, call me back and I'm at my just call me back. Thank you."

"Hey what's going on man, this is in 2 mother and anything any cool commands. We have some. Please let me stop you have a cable is nothing important. Bye. "

"Hey Todd, on a bit. The Negro, then I put in an active on plan and payPal okay called up and have a couple (630) 440-6809. Okay bye. "
Re:As long as they don't use GVoice Tech. by The+MAZZTer · 2010-03-05 18:12 · Score: 2, Informative

Phone audio quality is generally much poorer than online videos, in my experience.
Re:As long as they don't use GVoice Tech. by Anonymous Coward · 2010-03-05 18:15 · Score: 0

I hope your friend isn't charged for incoming calls/texts or auto-rickrolls: (630) 206-1300
Re:As long as they don't use GVoice Tech. by bennomatic · 2010-03-05 18:18 · Score: 2, Insightful

True. I think that Google should put an app on their Android phones that recognizes when someone is connected to a GVoice vmail box, and does the recording and processing locally. I figure that'd make a much more accurate translator.

--
The CB App. What's your 20?
Re:As long as they don't use GVoice Tech. by Mr2001 · 2010-03-05 18:21 · Score: 3, Funny

My funniest one:
"Hello voice subscriber what. Hey if you few questions for you. They can feel me 6 like a year like 2 years ago to like forever. Go you came over and I was locked out of the password didnt know the password so much and we wanted. Anybody passed it. I don't know how you guys have a good i just took it out for the first time in years and it says your class is expired. I must be changed and I go to that the windows X P professional you went and dollar dishing whatever it is really old addition, windows 85,001 yet and it's give me a change. Faster screen and says, administrative, which is still around. Funny has got hold us for new password. I confirm you got through. I've any idea what the password again, 30, or if you're more than the who knows no idea what it would've been so if you tell me but sister for you know the next week, otherwise, I was gonna go out to confirm for some a long time, so if you should come pick the and a case."

--
Visual IRC: Fast. Powerful. Free.
Re:As long as they don't use GVoice Tech. by TheJokeExplainer · 2010-03-05 18:44 · Score: 5, Informative

Parent is referring to Google Voice's less-than-perfect voicemail transcription technology which often leads to odd or hilarious transcriptions.

--
visit my pal the xkcd explainer!
Re:As long as they don't use GVoice Tech. by trapnest · 2010-03-05 18:48 · Score: 1

Let's see what this causes... 828 565 1337
Re:As long as they don't use GVoice Tech. by uncqual · 2010-03-05 18:56 · Score: 2, Interesting

My most interesting one:

Hey Hello hello, hi bye hello hello. Bye bye hey hello, test, Hello bye hello. Bye hi hello. Bye, hello hey hey hello hello hello. Bye bye hello. Call hey bye hello hello hello hello hello, hey bye bye bye hello. Bye hello. Bye hello hello. Bye. Hello S hello. Bye bye. Hello. Hello. Yeah, hello. Bye hello hello hello hello, hey, hey, yeah.

Some of the words hello and bye were dark, the rest were mostly light gray.
What, one may wonder, was the actual message? Well, it appeared to be someone trying to send a fax, although the tones didn't sound quite like fax negotiation tones; but surely no one would be misdialing a modem number in this day and age.
I was intrigued by the limited vocabulary it produced here. It's almost as if the most common words are these greetings (hello, hey, hi) and sign-offs (bye), and these words are so strongly preferred that line noise ends up being transcribed as just these top few words.
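That hunch lines up with how statistical recognizers are commonly framed: pick the word that maximizes log P(audio | word) + log P(word). Here's a toy sketch with made-up priors (hypothetical numbers, not anything Google has published) showing that when the acoustic evidence is flat, as with line noise, the prior alone decides and the greetings win.

```python
import math

# Hypothetical unigram priors for a voicemail recognizer: greetings and sign-offs
# are very common, so they get most of the probability mass.
PRIORS = {"hello": 0.30, "bye": 0.25, "hey": 0.15, "password": 0.01, "windows": 0.01}


def decode_segment(acoustic_likelihoods: dict[str, float]) -> str:
    """Pick the word maximizing log P(audio | word) + log P(word)."""
    def score(word: str) -> float:
        return math.log(acoustic_likelihoods[word]) + math.log(PRIORS[word])
    return max(acoustic_likelihoods, key=score)


# Line noise: every word explains the audio equally (badly).
flat = {word: 0.2 for word in PRIORS}
# Clear speech: the acoustic model strongly favours "password".
clear = {word: (0.9 if word == "password" else 0.025) for word in PRIORS}

print(decode_segment(flat))   # hello
print(decode_segment(clear))  # password
```

With clear audio the acoustic term is strong enough to overcome the low prior; with noise, nothing offsets the prior, so you get "hello hello bye".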

--
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
Re:As long as they don't use GVoice Tech. by Joe+Tie. · 2010-03-05 19:16 · Score: 1

I wonder what accounts for the difference. I'd say in general most people who call me come out 99% perfect on the transcripts. Except one friend, with a Texan accent, who usually is closer to 50% accurate.

--
Everything will be taken away from you.
Re:As long as they don't use GVoice Tech. by zill · 2010-03-05 19:42 · Score: 1

I have a severe developmental speech disorder, you insensitive clod!

I'm never inviting you to my parties again.
Re:As long as they don't use GVoice Tech. by aCC · 2010-03-05 21:16 · Score: 1

How about you leave a voice message just reading that text? What's the result? Maybe it's some kind of "encryption" like ROT-13 but for voice messages. ;-)
Re:As long as they don't use GVoice Tech. by Idiomatick · 2010-03-05 21:17 · Score: 1

Pretty brilliant idea. It'd be a bit annoying to implement, and it could only work Android-to-Android on phones with data plans. But it seems feasible. I wonder how much of an improvement it would yield.
Re:As long as they don't use GVoice Tech. by Anonymous Coward · 2010-03-05 21:50 · Score: 1, Interesting

Isn’t that essentially what modem negotiation actually is? The two modems talking to each other, saying “hello” at length?
My goodness. It’s alive, and it can understand V.34...
Re:As long as they don't use GVoice Tech. by Anonymous Coward · 2010-03-05 22:42 · Score: 0

Charged for incoming calls/texts? That's the weirdest thing I have ever heard.
Re:As long as they don't use GVoice Tech. by Anonymous Coward · 2010-03-05 22:46 · Score: 0

You'd be surprised in the land of the free...
Re:As long as they don't use GVoice Tech. by jargon82 · 2010-03-05 23:07 · Score: 3, Funny

I'll never forget the time I was playing with Dragon (the speech recognition software) and it seemed to develop an obsession with the word "orange"... Mall was orange. Bus was orange. Elephant was eggplant, but that's a pointless tale for another time...
Meanwhile, speech recognition still fails, and Google Voice is just the world's best demonstration of why :)
Re:As long as they don't use GVoice Tech. by dominious · 2010-03-06 00:31 · Score: 1

It would really be funny if the developers planted a message when listening to standard fax negotiation tones:

Hey, how are you?
Not much going on. This new "Exchange Server" is such an asshole I wish he dies!
Yeah I know what you're sayin.. I think they're gonna throw me away soon:(
Oh well...here's the fax anyway. Hope to hear from you soon..Bye!
bip-bip bip bip bip bip-bip....
bib bip bip-bip...
Re:As long as they don't use GVoice Tech. by John+Hasler · 2010-03-06 01:32 · Score: 2, Funny

I know people for whom the examples in this thread would be accurate transcriptions...

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:As long as they don't use GVoice Tech. by John+Hasler · 2010-03-06 02:10 · Score: 1

> I wonder what accounts for the difference.
Some people sound that way on my answering machine (and others come across that way in person).

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:As long as they don't use GVoice Tech. by SpinyNorman · 2010-03-06 03:20 · Score: 1

I doubt it'd make any difference.
Speech recognition technology is really still in its infancy. It's possible to get good results, but only under the most controlled of circumstances: a high-quality microphone, no background noise, clear diction, a recognition engine trained for the speaker, etc. Even then it may depend on what you're actually saying, since in any ambiguous case a smart recognition engine will fall back on grammatical analysis, word frequency counts, etc. to try to guess right.
The real problem is that speech recognition requires artificial intelligence to do right and we don't have it. Often we understand speech that word for word is basically unintelligible, but we automatically apply context and intelligence to figure out what the speaker was trying to enunciate. Without full AI, a computer can't do that - it's much more limited.
When the foreigner serving you at McDonald's mumbles something after taking your order, you only need to understand a single word, or maybe not even that, to realize they are asking "to stay or take out?", but a computer today would need the person to have spoken clearly enough to make out the words. Ditto for a stream-of-consciousness, rambling GMail voice message with highway noise in the background: YOU may be able to FIGURE OUT what is being said, but that's not the same as the words actually being intelligible, which is what a computer without AI would need in order to transcribe it accurately.
Having a clear speech signal in the first place, or having a broadband vs telephone limited one, isn't going to make much difference, except under otherwise very controlled conditions.
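The "fall back on word frequency counts" step can be illustrated with a tiny n-best rescoring sketch: the recognizer proposes acoustically similar candidates and a bigram language model picks the one that reads most like real text. The candidates, bigram counts and add-one smoothing below are all illustrative assumptions; real engines combine this with acoustic scores and far larger models.

```python
import math
from collections import Counter

# Hypothetical bigram counts from some text corpus (<s> and </s> mark sentence edges).
BIGRAMS = Counter({
    ("<s>", "recognize"): 40, ("recognize", "speech"): 35, ("speech", "</s>"): 50,
    ("<s>", "wreck"): 2, ("wreck", "a"): 3, ("a", "nice"): 20,
    ("nice", "beach"): 5, ("beach", "</s>"): 8,
})
UNIGRAMS = Counter()
for (first, _second), count in BIGRAMS.items():
    UNIGRAMS[first] += count  # how often each word appears as a bigram history
VOCAB_SIZE = len({word for pair in BIGRAMS for word in pair})


def log_prob(sentence: str) -> float:
    """Add-one smoothed bigram log-probability of a sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log((BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + VOCAB_SIZE))
    return total


def rescore(candidates: list[str]) -> str:
    """Pick the acoustically ambiguous candidate the language model likes best."""
    return max(candidates, key=log_prob)


print(rescore(["wreck a nice beach", "recognize speech"]))  # recognize speech
```

Note that this only helps when the right words are among the candidates at all, which is exactly the limitation described above.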
Re:As long as they don't use GVoice Tech. by crossmr · 2010-03-06 03:48 · Score: 1

It doesn't matter. I just checked out a couple of high-quality videos with a normal person speaking English without background noise, and it was a jumbled mess of garbage. Another fine Google production.
Re:As long as they don't use GVoice Tech. by bertoelcon · 2010-03-06 04:00 · Score: 1

> Except one friend, with a Texan accent, who usually is closer to 50% accurate.
Of course if you live in Texas and get called by mostly people with Texan accents you get 50% accuracy.

--
Anything can be found funny, from a certain point of view.
Re:As long as they don't use GVoice Tech. by assassinator42 · 2010-03-06 05:07 · Score: 1

I wouldn't say the transcripts have been 99% accurate word for word for me, but I can almost always get the meaning. The one exception being a friend with a speech impediment.
The YouTube transcripts are pretty much useless from what I can tell.
Re:As long as they don't use GVoice Tech. by Anonymous Coward · 2010-03-06 06:07 · Score: 0

Look at Google Translate. It's already impressively aware of context: if you translate "I'm going to watch that awful eighties comedy 'Airplane' tonight" into Norwegian, airplane isn't translated literally ("fly"), but correctly into what that movie was called in Norway ("Hjelp, vi flyr"). No one in their right mind would manually enter and try to keep updated lists of movie titles, so this is something the translator has learned on its own. If that's AI, Google is damn good at it.
Sometimes, this causes it to make impressive errors (translating "Bodø Lufthavn" to "Santiago airport" for instance), but it only gets better and better.
Re:As long as they don't use GVoice Tech. by rockNme2349 · 2010-03-06 06:07 · Score: 1

Google Voice Voicemail Transcriptions! Now with Mad Gab embedded puzzles!

--
Sewage Treatment Facilities - "Our duty is clear."
Re:As long as they don't use GVoice Tech. by SEWilco · 2010-03-06 06:16 · Score: 1

Now we finally know that mall and bus rhyme with orange.
Re:As long as they don't use GVoice Tech. by SpinyNorman · 2010-03-06 06:38 · Score: 1

Google Translate has for the last couple of years been based on what is essentially database lookup rather than the traditional grammatical/semantic analysis used by other translators such as Babel Fish. When they made the switch, the quality noticeably improved.
Basically they've got a huge database of snippets of language and their corresponding translations in different languages, originally built from hand-translated sources such as publicly available United Nations documents, etc. When they translate a document for you using this approach, they're basically just looking for the longest snippet of the source document for which they've got an existing translation, then moving on to the next, etc. Obviously there's quite a bit more to it than that, but this gives the gist of it. What's great about this approach is that you get chunks of hand-translated text, and it can handle idiom.
I just tried the example you gave, and it left the movie title "Airplane" as-is in the Norwegian translation. Maybe you tried it a while back using the old version?
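The "longest snippet first" idea can be sketched as a greedy longest-match over a phrase table. The table is a made-up English-to-Norwegian toy, and the greedy loop is a simplification: real phrase-based systems score many competing segmentations and reorderings with statistical models rather than taking the first longest match.

```python
# Hypothetical toy phrase table (English -> Norwegian); real tables hold millions
# of entries mined from aligned, human-translated text.
PHRASES = {
    ("thank", "you", "very", "much"): "tusen takk",
    ("thank", "you"): "takk",
    ("good", "morning"): "god morgen",
    ("good",): "god",
    ("morning",): "morgen",
    ("very",): "veldig",
    ("much",): "mye",
}
MAX_LEN = max(len(key) for key in PHRASES)


def translate(sentence: str) -> str:
    """Greedily consume the longest source span that has a stored translation."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for length in range(min(MAX_LEN, len(words) - i), 0, -1):
            span = tuple(words[i:i + length])
            if span in PHRASES:
                out.append(PHRASES[span])
                i += length
                break
        else:
            out.append(words[i])  # unknown word: pass it through untranslated
            i += 1
    return " ".join(out)


print(translate("Good morning thank you very much"))  # god morgen tusen takk
print(translate("thank you Bob"))                     # takk bob
```

The idiom case is the point: the four-word phrase is found whole and comes out as "tusen takk", whereas word-by-word lookup of "very much" on its own gives the literal "veldig mye".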
Re:As long as they don't use GVoice Tech. by Toonol · 2010-03-06 07:26 · Score: 1

That's kind of like the pkzip/unzip (or other compression) algorithm, bundling up the longest/most repeated character streams first... except that the matching is done using an external lookup table, swapping languages when decoding.

Ok, it's not MUCH like that, but it's enough to give me a few ideas. Speech recognition, data compression, and AI have a lot in common; they're all bottlenecked at the same point.

In theory, I wonder whether an effective (lossy) text compression scheme could be created that strips out individual language syntax. In effect, it would store 'ideas' (tokens representing the largest linguistic streams representable as a single unit), and a specific language could be selected at decompression time to render them into.
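For comparison, here's a deliberately naive LZ77-style coder, the family of techniques behind DEFLATE (the usual method inside zip files): at each position it emits the longest match found earlier in the text as a (distance, length, next character) back-reference. It's an illustrative sketch with a quadratic search and no entropy coding, not pkzip's actual format; the language-neutral 'idea' tokens would replace these back-references, and that part isn't attempted here.

```python
def lz77_encode(text: str, window: int = 255) -> list[tuple[int, int, str]]:
    """Encode as (distance, length, next_char) triples using the longest earlier match."""
    out, i = [], 0
    while i < len(text):
        best_dist, best_len = 0, 0
        for start in range(max(0, i - window), i):
            length = 0
            # Matches may run past i (overlapping copy), but always leave one char for next_char.
            while i + length < len(text) - 1 and text[start + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_dist, best_len = i - start, length
        out.append((best_dist, best_len, text[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(tokens: list[tuple[int, int, str]]) -> str:
    buf: list[str] = []
    for dist, length, char in tokens:
        for _ in range(length):
            buf.append(buf[-dist])  # copy one char at a time so overlapping copies work
        buf.append(char)
    return "".join(buf)


sample = "hello hello hello"
tokens = lz77_encode(sample)
print(tokens[-1])  # (6, 10, 'o')
assert lz77_decode(tokens) == sample
```

The second and third "hello" collapse into one back-reference, which is the "longest repeated stream becomes a single unit" intuition in miniature.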

The once and future Deaf accessible internet. by flerchin · 2010-03-05 17:57 · Score: 2, Informative

Huzzah! Now if we can just get subtitling/captioning on Netflix streams, the net will be accessible to the Deaf again.

--
--why?

Re:The once and future Deaf accessible internet. by aussie_a · 2010-03-05 18:53 · Score: 3, Insightful

I almost never turn on my speakers and yet I find the internet quite accessible.
I'm not saying this isn't a great development. But to try to portray the internet as inaccessible to the deaf before now is ridiculous.
Re:The once and future Deaf accessible internet. by WeatherGod · 2010-03-06 06:12 · Score: 1

Actually, the internet of old used to be extremely accessible to deaf and hard-of-hearing people. However, the advent of YouTube, podcasts and other multimedia services has made an exciting new part of the internet inaccessible to them. This technology, if it works, will help bring the internet back to deaf and hard-of-hearing people.

Not only that by Anonymous Coward · 2010-03-05 17:58 · Score: 1, Interesting

They also changed the way videos are sent to the browser, many flash video players are failing because of that.

Re:Not only that by Anonymous Coward · 2010-03-05 21:43 · Score: 0

It's worth mentioning that captions don't work with HTML5, so Flash is still the only solution for video if you want that feature.
Again, it looks very much like Flash is moving ahead all the time, while the alternatives struggle and fail to get off the ground.
Re:Not only that by icebraining · 2010-03-06 01:04 · Score: 1

Wrong.

--
Dilbert RSS feed
Re:Not only that by Anonymous Coward · 2010-03-06 02:29 · Score: 0

Sorry... I don't want to make you look silly, but please have a look at:
http://captionaction2.blogspot.com/2009/07/html-5-has-no-captioning-provisions.html
http://billcreswell.wordpress.com/2010/01/24/html5-youtube-and-why-the-emperor-has-no-captions/
My apologies again!
Re:Not only that by icebraining · 2010-03-06 07:38 · Score: 1

From your second link:

There are several examples of how captions *can* be implemented with javascript, but not standard format. http://blog.gingertech.net/2008/12/12/attaching-subtitles-to-html5-video/, and my favorite implementation so far (Firefox 3.1+/ogg) - http://www.mozbox.org/pub/srt/index2.xhtml

So they *can* be implemented using JavaScript; you don't need any kind of plugin for that. And if you had read my link, you would have seen a link to an example of that, which even lets you pick a language and switches the captions in real time.
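The JavaScript route boils down to two steps: parse a subtitle file such as SRT into timed cues, then, whenever the playback position changes, show the cue that covers the current time. Sketched here in Python for brevity (in a browser the same lookup would run from the video element's timeupdate event); the sample cues are invented, and overlapping cues and SRT formatting tags are ignored.

```python
import bisect
import re
from typing import List, Optional, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, caption text)
STAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp like 00:01:02,500 to seconds."""
    hours, minutes, seconds, millis = map(int, STAMP.match(stamp.strip()).groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text: str) -> List[Cue]:
    """Parse blocks of 'index / start --> end / caption lines' into sorted cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        start, end = (to_seconds(part) for part in lines[1].split("-->"))
        cues.append((start, end, "\n".join(lines[2:])))
    return sorted(cues)


def active_cue(cues: List[Cue], t: float) -> Optional[str]:
    """Return the caption to display at playback time t, or None between cues."""
    starts = [start for start, _, _ in cues]
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and cues[i][0] <= t < cues[i][1]:
        return cues[i][2]
    return None


SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hello, and welcome

2
00:00:04,000 --> 00:00:06,000
to the lecture
"""

cues = parse_srt(SAMPLE)
print(active_cue(cues, 2.0))  # Hello, and welcome
print(active_cue(cues, 3.7))  # None
```

No plugin is involved: the player only needs the current playback time and a place in the page to write the returned text.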

--
Dilbert RSS feed

Automatically generate the technology? by Mr2001 · 2010-03-05 18:13 · Score: 5, Funny

Talk about advanced! Back in my day, we had to pay engineers to generate technology for us!

--
Visual IRC: Fast. Powerful. Free.

Re:Automatically generate the technology? by nebaz · 2010-03-05 18:22 · Score: 2, Funny

Feeling feeling = Feeling.getFeeling(Feeling.LAUGHTER);
feeling.express();

--
Rhymes that keep their secrets will unfold behind the clouds.There upon the rainbow is the answer to a neverending story
Re:Automatically generate the technology? by MichaelSmith · 2010-03-05 18:28 · Score: 1

I can sell you a UML modeller which will do that. Just $100k per license. Believe me, it's cheap at the price. Let me demonstrate how you refactor the code. Just drag this little icon from here to here and the other little icons reorganise themselves around it. Buy this and you will never have to hire an engineer again!

--
http://michaelsmith.id.au
Re:Automatically generate the technology? by AndrewBC · 2010-03-05 19:12 · Score: 1

Sounds like you're suffering from stuttering semantics -- Either that or you're an egregiously emotional eccentric.
Re:Automatically generate the technology? by Anonymous Coward · 2010-03-05 19:30 · Score: 0

Rofl
Re:Automatically generate the technology? by Anonymous Coward · 2010-03-05 20:55 · Score: 0

He suffers from OOP, you insensitive clod!
Re:Automatically generate the technology? by Hurricane78 · 2010-03-05 22:02 · Score: 1

I pay technology to generate engineers, you insensitive clod!

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Re:Automatically generate the technology? by Hurricane78 · 2010-03-05 22:37 · Score: 2, Funny

No! I’m not from Soviet Russia!

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Re:Automatically generate the technology? by Thantik · 2010-03-06 06:44 · Score: 1

And here I thought sprinkling 'self.' throughout my Python classes made me egotistical...

Technology, technology, baked beans and technology by Anonymous Coward · 2010-03-05 18:13 · Score: 0

Have they got anything without technology in it?

Noteable, but still very much experimental by Coopjust · 2010-03-05 18:14 · Score: 3, Informative

The results are still very funny, especially for non-English speakers.

However, it's a technology that is still relatively young. One hopes that applying it to Youtube will help Google improve the accuracy.

Still, except for videos of a native English speaker talking with absolutely no background noise, it's nothing more than a novelty at this point. Trying this on several videos yielded not only hilarious results but, in some cases, delays of several seconds.

Re:Noteable, but still very much experimental by Idiomatick · 2010-03-05 21:23 · Score: 4, Interesting

"One hopes that applying it to Youtube will help Google improve the accuracy."

This. If they allow corrections, it could be an enormous source of data for Google. They'd end up with people spending millions of man-hours teaching Google how to do voice recognition. And highly accurate voice recognition would be a boon for society in general.
Re:Noteable, but still very much experimental by Anonymous Coward · 2010-03-05 23:59 · Score: 0

By non-English, do you mean English?
http://www.youtube.com/watch?v=EzV3wIrFa3U
Turning it on for Rocketboom videos, it messes up (hard). However, people talking Slack (USA) English are rendered correctly.
Re:Noteable, but still very much experimental by oztiks · 2010-03-06 02:19 · Score: 1

I'm the first to agree, but then I saw Microsoft's attempt at voice recognition and it's just as poor.
There need to be significant improvements across the board before this stuff works properly; sadly, I think it's still got a long way to go.
Accents play a big part, as do the rate at which people speak and the way they run words together. You can tell YouTube's voice recognition is good, but it doesn't keep up in those areas at all.
Re:Noteable, but still very much experimental by crossmr · 2010-03-06 03:52 · Score: 3, Insightful

And then some company will come along and sue them for being anti-competitive, because they have access to all this great data to make fantastic products other companies can't make.
Re:Noteable, but still very much experimental by Coopjust · 2010-03-06 03:54 · Score: 1

And that's the interesting part. Some people provide their own captions, that's effectively training for the voice recognition algorithm.
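If human captions are paired with the automatic ones, the obvious yardstick is word error rate: the word-level edit distance between the two transcripts, divided by the length of the human (reference) one. A minimal sketch; the example pair is an invented "road"/"role" style mishearing, and nothing here claims to be how Google actually scores its recognizer.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                           # reference word deleted
                curr[j - 1] + 1,                       # extra word inserted
                prev[j - 1] + (ref_word != hyp_word),  # substitution (or match)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


human = "the car turns onto a curved road"
automatic = "the car turns onto a curved role"
print(round(word_error_rate(human, automatic), 3))  # 0.143
```

Tracked over many uploaded videos, a number like this would show whether the recognizer is actually getting better.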
Re:Noteable, but still very much experimental by Djupblue · 2010-03-06 06:51 · Score: 1

Poor, poor Microsoft crying and complaining when they get punished for breaking the law.

Interactive Transcripts vs. Captions by syke1911 · 2010-03-05 18:20 · Score: 2, Insightful

I'm trying to understand the difference between an interactive transcript, as seen at protranscript.com, and a caption. Why did Google go the embedded captioning route? Isn't the goal to create searchable content? If so, captions don't seem to be the solution.

Re:Interactive Transcripts vs. Captions by Anonymous Coward · 2010-03-05 18:36 · Score: 0

> I'm trying to understand the difference between an interactive transcript, as seen at protranscript.com, and a caption. Why did Google go the embedded captioning route? Isn't the goal to create searchable content? If so, captions don't seem to be the solution.
I'm not sure myself. My organization spent a significant amount of money converting our captioned videos to interactive transcripts, as management believed they were the 'best new thing'. Then again, Google tends to know what it's doing.
Re: Interactive Transcripts vs. Captions by Alwin+Henseler · 2010-03-05 19:19 · Score: 1

I can imagine Google would cache intermediate results, possibly improve those results from time to time, and create a good coupling to its own search engine. Other search engines might have to 'distill' searchable text from the video (=difficult?), so that Google can search YouTube video content better than other search engines? Just a guess, FWIW.
Re:Interactive Transcripts vs. Captions by phantomfive · 2010-03-05 22:15 · Score: 1

Google has no problem searching it, they have the data. The problem will be for other bots searching youtube, and I can imagine reasons why Google would not want to make it easy for others to search their site.

--
Qxe4

CC this... by flogger · 2010-03-05 18:21 · Score: 5, Funny

I looked, but I can't find Google's CC button for this video: http://www.youtube.com/watch?v=ZA1NoOOoaNw

--
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
"First things first -- but not necessarily in that order"
-- The Doctor, "Doctor

Re:CC this... by R3coiler · 2010-03-05 19:07 · Score: 1

Or this: http://www.youtube.com/watch?v=jH8gtrD4_C4
Re:CC this... by Anonymous Coward · 2010-03-05 21:11 · Score: 0

Oh! My! Non-existent-deity-of-choice!
Re:CC this... by mdwh2 · 2010-03-06 01:56 · Score: 1

I was disappointed to see they don't have it for this: http://www.youtube.com/watch?v=t6FUR_nhGX8
(Seriously though - after searching through many videos, I've yet to find a single one that does have the option, other than one that someone posted above. "Most, if not all"? "All" is clearly not true, and it's hard to see justification for the "most", unless I'm being very unlucky in my search...)

Search? by Spy+Hunter · 2010-03-05 18:22 · Score: 2, Insightful

I haven't seen any mention of search, which seems odd. Google is adding captions to every YouTube video, and nobody is interested in whether you'll be able to search the captions or not? Seems to me like it could be quite useful to search the captions of every video on YouTube.
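Caption search is essentially an inverted index from words to (video, timestamp) positions, so a hit can jump straight to the moment a word is spoken. A minimal sketch; the video IDs and cues are invented placeholders, and real search would add ranking, stemming and phrase queries.

```python
import re
from collections import defaultdict

# Hypothetical caption cues: video id -> list of (start_seconds, caption text).
CAPTIONS = {
    "video_a": [(0.0, "Differential gears explained"), (12.5, "The outer wheel travels farther")],
    "video_b": [(3.0, "Replacing a bicycle chain"), (40.0, "Check the gears first")],
}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def build_index(captions):
    """Inverted index: lowercase word -> set of (video id, cue start time)."""
    index = defaultdict(set)
    for video, cues in captions.items():
        for start, text in cues:
            for word in tokenize(text):
                index[word].add((video, start))
    return index


def search(index, query):
    """Return sorted (video, start) positions whose cue contains every query word."""
    hit_sets = [index.get(word, set()) for word in tokenize(query)]
    return sorted(set.intersection(*hit_sets)) if hit_sets else []


INDEX = build_index(CAPTIONS)
print(search(INDEX, "gears"))  # [('video_a', 0.0), ('video_b', 40.0)]
```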

--
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}

Re:Search? by lobsterturd · 2010-03-05 18:26 · Score: 1

YouTube captions have been searchable since shortly after they were introduced.
Re:Search? by Spy+Hunter · 2010-03-05 18:33 · Score: 3, Informative

Indeed; here's an example search showing caption results. I'm just surprised that, of the several articles "covering" this story that I've seen, none have mentioned (even in passing) the applicability of universal captioning to search.

--
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
Re:Search? by WoLpH · 2010-03-05 23:37 · Score: 1

I think "all" is quite an exaggeration too. When looking for all videos with "a" in it (should be a lot) I get 283,000 results, while it normally results in "millions".
The search queries:
http://www.youtube.com/results?search_type=videos&closed_captions=1&uni=3&suggested_categories=10,24,1,15,25,28&search_query=a
http://www.youtube.com/results?search_query=a&search_type=&aq=f

All yore soup tittles Arnie belong two arse. by idji · 2010-03-05 18:25 · Score: 1

Just imagine when they hook this up to Google Translate and text-to-speech. You could choose your language for YouTube audio.

Wish commercial TV stations would use this tech! by Alwin+Henseler · 2010-03-05 19:03 · Score: 2, Interesting

Wish this technology would be used by TV stations to provide 'sort of' subtitling for programs that don't have any. This could be helpful for deaf/hearing impaired viewers.

Where I live (Netherlands), there are a few public TV channels. Most programs on those are subtitled using a dedicated teletext page (888). On the bulk of commercial channels, there are also subtitles for things like prime-time movies and specific (popular) TV shows. But a lot of it isn't subtitled, like average daytime shows, late-night documentaries, commercials, etc. This is due to manpower/cost issues: you have a limited audience, only a small percentage of viewers are deaf/hearing impaired, and (proper) subtitling needs humans. Read: money, which eats into commercial TV stations' bottom line. It's entirely up to these stations to decide what to subtitle and what not to.

This technology (combined with automated translation) would be a nice complement for those programmes where human-provided subtitling is deemed too expensive. Automated translation is still bad at times, but for deaf/hearing impaired people, subtitles with a bad translation can still be better than no subtitles at all. An automated system shouldn't be very expensive when applied to mass media like national TV, and it would be easy to provide for all programmes. And perhaps speech recognition and automated translation will improve over time, to the point where humans aren't needed anymore to get good results.

"Technology" by Anonymous Coward · 2010-03-05 19:08 · Score: 0

And the overused buzzword of the day is ...

Go to youtube RIGHT NOW for some laughs...for now by mykos · 2010-03-05 19:08 · Score: 1

I'm sure they will improve it dramatically in the coming months and years, but I have not laughed so hard in a while at some of the stuff it comes up with. It's as funny as using a translator to translate a word into Korean and back again.

Re:Go to youtube RIGHT NOW for some laughs...for n by Anonymous Coward · 2010-03-05 19:18 · Score: 0

I agree.

YouTube's CC needs far more work. The English closed captions it generates from spoken English are far off.

http://www.youtube.com/watch?v=VROZ2bbiQLc

The narrator of this video makes no mention of "senate Chris".

Too funny. The translation is totally off.

Dear Aunt, by Anonymous Coward · 2010-03-05 19:29 · Score: 0

let's set so double the killer delete select all.

About as good as I expected by Clovert+Agent · 2010-03-05 20:11 · Score: 1

Which is to say, pretty darned feeble. Clever work, but basically rubbish when compared to user expectation.

One of my favourite videos is this one (http://www.youtube.com/watch?v=yYAw79386WI), dating from the '30s, about how differential gears work. The voice-over is that beautifully clear, precise American newsreader accent of the period, and there isn't any background music to confuse things. If anything should be a perfect candidate for a computer to analyse, it's this.

But the captions are worse than I'd expect from off-the-shelf software like Dragon Dictate, which isn't particularly special itself. A perfectly enunciated "road", with a very clear final D, is misheard as "role", for example. There are mistakes in nearly every line, and while some are obvious, others are just bizarre.

I'm tempted to say "nice try, good work for a first shot, and hey, it's a beta so it'll get better." But I've been exposed to dictation software for over a decade, and it just hasn't, really. So I don't think it will, and I don't think most people will get much use out of it, apart from the odd giggle at the YouTube equivalent of "Dear aunt, let's set so double the killer delete select all..."

What I would be interested in hearing is whether this, flawed as it is, is usable enough for a deaf person. In context, you'd probably figure out that "role"="road", but would you guess that "outmoded"="are mounted"? Maybe, maybe not: watch the video on mute with the captions on, and it's kinda tricky, but you can get the gist of it. But then I'm reminded that this is the best-case video I could find, and most will probably be worse. It'll be interesting to see what the feedback from deaf people is, whether it really makes a difference, and whether the context makes up for the poor quality. I'd like to hope it might do just that.

Re:About as good as I expected by gr8dude · 2010-03-05 22:05 · Score: 1

I think the solution is to let people submit corrections for the automatically generated subtitles.
This way we'll get a starting point, so the problem becomes simpler.
I am now trying to write the subtitles for one of my lectures, and I find it very very tiring and difficult. The greatest problem for me is in synchronizing audio with text - I have to manually indicate in which time period a particular text needs to be shown.
In other words, the bottleneck is not figuring out what the words are; it's figuring out how to sync them. Most of the time goes into shifting ranges and offsetting the subs by a few ms until I get it right. The most difficult part is keeping the pieces in sync with each other: if I shift the interval for one piece of text, it can overlap with adjacent pieces, and they need to be reviewed as well.
If a computer could do that for me - I'd be happy.
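The tedious part, shifting one cue and then fixing the neighbours it now overlaps, is mechanical enough to script. A minimal sketch, assuming cues are (start, end, text) tuples in seconds sorted by start; the "trim the neighbour" policy and the `gap` parameter are arbitrary choices for illustration (you could instead push later cues back).

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text), sorted by start


def shift_cue(cues: List[Cue], index: int, offset: float, gap: float = 0.05) -> List[Cue]:
    """Move one cue by `offset` seconds, then trim its neighbours so nothing overlaps."""
    cues = list(cues)  # don't mutate the caller's list
    start, end, text = cues[index]
    start, end = max(0.0, start + offset), max(0.0, end + offset)
    cues[index] = (start, end, text)
    if index > 0:  # previous cue must end at least `gap` seconds before this one starts
        p_start, p_end, p_text = cues[index - 1]
        cues[index - 1] = (p_start, min(p_end, start - gap), p_text)
    if index + 1 < len(cues):  # next cue must start at least `gap` seconds after this one ends
        n_start, n_end, n_text = cues[index + 1]
        cues[index + 1] = (max(n_start, end + gap), n_end, n_text)
    return cues


subs = [(1.0, 3.0, "Hello"), (3.25, 5.0, "and welcome"), (5.25, 7.0, "to the lecture")]
print(shift_cue(subs, 1, 0.5, gap=0.25))
# [(1.0, 3.0, 'Hello'), (3.75, 5.5, 'and welcome'), (5.75, 7.0, 'to the lecture')]
```

A big shift can squeeze a neighbour down to zero or negative length; a real editor would then cascade the change or flag the cue for review, which this sketch doesn't do.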

--
The saddest poem
Re:About as good as I expected by Anarki2004 · 2010-03-06 05:02 · Score: 1

letting people submit corrections will work great until /b/ discovers it. Then every other caption will be "jews did 911" and "never gonna give you up". Remember Bucket the chatbot?

--
The teachers will crack any minute, purple monkey dishwasher.

Whatever happened to by Anonymous Coward · 2010-03-05 20:23 · Score: 0

Whatever happened to the Berger Liaw speech recognition system? One article (from 11 years ago) is here. It could track multiple (dozens of) voices simultaneously, could process speech spoken in a continuous stream, and could detect speech in very high noise environments. In some tests, human listeners could only tell what words were being spoken with 50% accuracy, while the voice recognition system could still tell what was being spoken 85% of the time; "very high noise" here meaning someone speaking in a normal conversational voice beside a jet engine. The US Navy (submarine service) was a strong advocate of the technology, but I've heard very little about it since.

"automatically generate the technology" by Daniel+Dvorkin · 2010-03-05 20:53 · Score: 1

http://www.youtube.com/results?search_query=buzzword+bingo&search_type=&aq=2&oq=buzzwor seems appropriate here.

--
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.

Can they combine this with lip reading? by wisebabo · 2010-03-05 21:18 · Score: 1

Could you combine this with the lip-reading technology that was introduced to allow "voiceless" cell phone calls? http://www.ubergizmo.com/15/archives/2010/03/lip_reading_technology_unveiled.html Wouldn't that improve the accuracy for scenes where the speaker's mouth is visible?

Or how about using the subtitle tracks that are in a different language and reverse translating them to provide additional clues as to what the speaker might have been saying? It might help a little.

Re:Go to youtube RIGHT NOW for some laughs...for n by Vintermann · 2010-03-05 21:20 · Score: 1

Take a look at this board game review:

http://www.youtube.com/watch?v=Uv6pIFgfa0U

His name (Tom Vasel) appears to be consistently transcribed as "oh come on now". What, don't they believe that's his name? He comes out with surprising revelations such as "I'll be your next president" and wonderful nonsense like "but it is a ten-year period deduction gay".

--
xkcd is not in the sudoers file. This incident will be reported.

Which? by WGFCrafty · 2010-03-05 21:29 · Score: 1

Fish sticks or Fish dicks?

Re:Which? by the_lesser_gatsby · 2010-03-07 03:25 · Score: 1

That's simple - just google it and use the phrase which returns the most hits.

Oh goody. by Pyrion · 2010-03-05 21:37 · Score: 1

Just what we all needed: something dumber than user comments to read on YouTube.

--
"There is much pleasure to be gained from useless knowledge." - Bertrand Russell.

Let me guess, Youtube.ru by santax · 2010-03-05 22:18 · Score: 4, Funny

reads the caption and then produces the video?

Re:Let me guess, Youtube.ru by HoppQ · 2010-03-06 06:10 · Score: 1

reads the caption and then produces the video?
Actually, a rather obvious extension to this technology would be to feed the captions to a machine translator and a text-to-speech synthesizer to produce, e.g., a Russian voice track for a video, for those Russians who don't understand spoken or written English.

--
My sig will be released in 2015 third quarter. Rating pending.

It hates Bono by Anonymous Coward · 2010-03-05 22:32 · Score: 0

I don't think it likes Bono. I was watching a speech by George Bush where he says about Bono: "he is a man of depth and great heart". The caption was: "he is a man of death and great whore"

Really? most? by crossmr · 2010-03-06 01:22 · Score: 1

Most, if not all, YouTube videos now include a 'CC' button that, if pressed, will automatically generate the closed-captioning technology.

The first 10 videos I've been to don't include it, and that includes suggested and front-page videos.

Is this a metric most?

Re:Really? most? by crossmr · 2010-03-06 01:29 · Score: 1

Oh wait, just found one.
What a train wreck. Cheers, Google, on yet another amazing product.
Here is what is actually said:
Hey Everyone So a lot of you may know that the Vancouver 2010 Winter Olympics are coming up
and here is the transcribed audio:
Everyone felt like a man of the I think every time he's had a winter olympics are coming
Just fantastic. Wow.
This is certainly front page worthy.
I'm going to roll out a different product.
Basically the system will try to guess (not very accurately) how many words are said and then just pick a random word out of the dictionary. I would guess that averaged out it might provide something more readable than this.
This is on par with their "beta" CC translation service, which used Google's fantastic web translation skills to translate English into horribly butchered and unreadable Asian languages (translation into Korean is confirmed as a complete waste of time).
Even more fantastic, they then allow you to translate these autogenerated pieces of roadkill. Wow. Who could this possibly be useful for?
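The quoted reference and transcript lend themselves to the usual yardstick for recognizers, word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. This is a minimal sketch; the lowercase-and-split tokenization is a simplifying assumption, and real scoring tools normalize punctuation more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = edits needed to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)


actual = ("Hey Everyone So a lot of you may know that the Vancouver "
          "2010 Winter Olympics are coming up")
captioned = ("Everyone felt like a man of the I think every time he's had "
             "a winter olympics are coming")
print(f"WER: {word_error_rate(actual, captioned):.2f}")
```

Note that WER can exceed 1.0 when the recognizer inserts many extra words.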

Good timing by RealGrouchy · 2010-03-06 03:04 · Score: 1

This is excellent timing; I clicked on the link to a video on the previous /. story but my sound was not working. I thought, "man, I wish more videos were closed-captioned," not just for lazy people like me but also for the hearing impaired.

Finally it'll be easier for me to share these videos with my deaf and hard-of-hearing friends!

- RG>

--
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!

Hitler Parodies the easy way by BenJeremy · 2010-03-06 03:09 · Score: 1

I like the "CC" feature... it makes it very simple to do those Hitler Downfall parodies... but I was surprised that I was the first to actually make one using the feature. My video features closed captions for both the original German-to-English translation and a Lost parody script. I also provide a handy download of a text-editable SRT file so others can make their own (does that make me a bad person?).
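For anyone wanting to produce such a file by script: an SRT file is plain text made of numbered blocks, each holding a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the caption text, and a blank separator line. This is a minimal writer sketch; `to_srt` and `srt_timestamp` are hypothetical helper names, and the cue text is illustrative only.

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, e.g. 01:02:03,004."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: list[tuple[int, int, str]]) -> str:
    """Render (start_ms, end_ms, text) tuples as numbered SRT blocks."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive blocks.
    return "\n".join(blocks)


print(to_srt([(1_000, 4_000, "First line of the parody"),
              (4_500, 7_250, "Second line of the parody")]))
```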

The nice thing is that you can add as many subtitle files as you like... and give each of them a separate title. It understands languages, so presumably my parody can be run through the translator (on the fly) into any other language. Now one "blank" can provide hundreds of alternate parodies from one YouTube video.

I just wonder if this "automatic" feature will try to create subtitles on my blank, which already has subtitles loaded.

Re:Hitler Parodies the easy way by BenJeremy · 2010-03-06 03:12 · Score: 1

On a side note, I see that YouTube has not gotten to any of my videos with this "automagic" speech recognition-generated closed captions. I was hoping they would try and make one for this video of mine, just to see what it generated.

Netflix needs to get away from SilverDimGlow by mrflash818 · 2010-03-06 03:37 · Score: 1

This is why Google rocks ...and M$'s tarnished SilverDimGlow does not.

I strongly wish Netflix would realign itself to use a YouTube-like setup instead, but I suspect M$ either threw them 'an offer they could not refuse', or this will become yet another mutual lock-in, like Intel_M$.

(Really irritated that I cannot, yet, watch Netflix from my Debian machines.)

--
Uh, Linux geek since 1999.

Might mean videos could be searchable by content by mrflash818 · 2010-03-06 03:44 · Score: 1

An interesting upside to all this might be that, if Google keeps the dialog from youtube content in their searchable database, people may soon be able to search for videos via content.

Right now, I believe keywords have to be entered by hand, but auto-captioning might remove that barrier.
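The idea amounts to an inverted index built from caption text instead of uploader keywords. This is a toy sketch, assuming captions are already available as plain strings keyed by a video ID (the IDs and text below are made up). It says nothing about how Google actually indexes videos.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase words, keeping apostrophes (e.g. "here's")."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(captions: dict[str, str]) -> dict[str, set[str]]:
    """Map each caption word to the set of video IDs whose captions contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for video_id, text in captions.items():
        for word in tokenize(text):
            index[word].add(video_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return video IDs whose captions contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


captions = {
    "video-a": "Here's looking at you, kid.",
    "video-b": "I was locked out and didn't know the password.",
}
index = build_index(captions)
print(search(index, "looking kid"))  # {'video-a'}
```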

"Here's looking at you, kid."

--
Uh, Linux geek since 1999.

Re:Wish commercial TV stations would use this tech by crossmr · 2010-03-06 03:57 · Score: 1

Proper subtitling needs humans, but come on, be honest. How much manpower does it actually require to subtitle something?
If it's your native language, it's a matter of timing and little else. If you're paying someone to be on the clock depending on the length of the program it might take anywhere from 30 minutes to a day for a long program. How much is a day's wages for even the lowest-budget infomercial?

If you're translating, you're probably not translating something new, and that means there are likely already native subs for it. So it's simply a matter of translation, not timing. I've seen subbers here in Korea fan-sub a 30-minute sitcom fresh off the air in just a few hours, without even having public access to English subtitles first.

2 Girls and 1 Cup Reactions by DJRumpy · 2010-03-06 03:58 · Score: 1

Does this mean they can now enjoy the 2 Girls and 1 Cup Reaction videos?

http://www.youtube.com/watch?v=ggaWaK5d23Y

Re:2 Girls and 1 Cup Reactions by Anonymous Coward · 2010-03-06 04:03 · Score: 0

Stewie and Brian - http://www.youtube.com/watch?v=ASHLLZbue44

Is this Gaudi? by snsh · 2010-03-06 04:26 · Score: 1

This is good news. I've been looking at speech-to-text and audiomining for a while. My goal was not captioning but search: in a long video or a large set of videos, a user can quickly find snippets of video mentioning a word or phrase and replay the found snippets. I found a bunch of options, but budget was always an issue.
Google Audio (Gaudi): free (cool!), but seemed like a dead-end project after the 2008 elections.
Blinx: spinoff from BBN focused on media companies. $$$$$$
Autonomy: enterprise search/monitoring company; bought tech from Virage. $$$$$$
Virage: sold their tech to Autonomy, then redeveloped it.
Coveo: audiomining software using the Nuance SDK and a Silverlight front end. $$$$$
TVeyes: does a lot of real-time monitoring. $$$$$
Nexidia: audiomining software using their own phoneme tools. $$$$$$
Is this YouTube service an incarnation of Gaudi? Either way, it's nice that it's finally out there.
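The search-and-replay use case needs timestamps, not just a yes/no hit. Assuming the recognizer emits a start time for each word (an assumption; the products listed differ in what they expose), finding a phrase is a scan for a contiguous run of matching words. `find_phrase` and the sample transcript are hypothetical.

```python
import string


def normalize(word: str) -> str:
    """Lowercase and drop surrounding punctuation."""
    return word.lower().strip(string.punctuation)


def find_phrase(words: list[tuple[int, str]], phrase: str) -> list[int]:
    """Return the start time (ms) of every occurrence of `phrase`
    in a transcript given as (start_ms, word) pairs."""
    target = [normalize(w) for w in phrase.split()]
    if not target:
        return []
    tokens = [normalize(w) for _, w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            hits.append(words[i][0])  # seek the player here to replay
    return hits


transcript = [(0, "Hey"), (400, "everyone."), (900, "So"), (1200, "a"),
              (1300, "lot"), (1600, "of"), (1800, "you")]
print(find_phrase(transcript, "a lot of"))  # [1200]
```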

Subtitling live TV by tepples · 2010-03-06 05:59 · Score: 1

If you're paying someone to be on the clock depending on the length of the program it might take anywhere from 30 minutes to a day for a long program.

If the captioning takes longer than the program, you have to do it in advance. This rules out captioning news, sports, entertainment awards, and other live programs.

Re:Subtitling live TV by crossmr · 2010-03-06 12:16 · Score: 1

Not really. Most live things are actually shown on a tape delay, and CC already exists for the news. But live programming is usually less concerned with exact timing, and it's often a constant stream of words, like with the news. I'm talking more about subbing a 2-hour movie and spending time making sure the captions line up perfectly with the dialog. It can be a tedious process. With a live program, you just need someone who can type fast and accurately, with a slight tape delay to check for any crazy mistakes.
Re:Subtitling live TV by AK+Marc · 2010-03-06 16:51 · Score: 1

I've seen lots of live news in the US captioned, so having a human do a "suitable" job can be done in the time of a program. Granted, there was a noticeable error rate. But it was good enough that people would be quite happy with it.

--
Learn to love Alaska

It's a Markov chain by tepples · 2010-03-06 06:03 · Score: 1

If the recognizer isn't sure, perhaps it could use the fact that there are six times as many "fish sticks" as "fish dicks" in Google's web index. I'd bet it already does; there's a reason for the "Markov" in hidden Markov modeling.
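This is the noisy-channel scoring that HMM-based recognizers rely on: pick the word sequence that maximizes the acoustic log-likelihood plus a weighted language-model log-probability. Below is a toy sketch in which a whole-phrase count stands in for a real n-gram model. The 6:1 ratio comes from the comment above; the acoustic scores, `pick_transcription`, and `lm_weight` are hypothetical.

```python
import math


def pick_transcription(acoustic: dict[str, float],
                       phrase_counts: dict[str, int],
                       lm_weight: float = 1.0) -> str:
    """Choose the candidate maximizing acoustic log-likelihood
    plus lm_weight * log prior, with an add-one smoothed prior."""
    vocabulary = set(acoustic) | set(phrase_counts)
    total = sum(phrase_counts.values()) + len(vocabulary)

    def score(text: str) -> float:
        prior = (phrase_counts.get(text, 0) + 1) / total
        return acoustic[text] + lm_weight * math.log(prior)

    return max(acoustic, key=score)


counts = {"fish sticks": 6, "fish dicks": 1}  # the 6:1 ratio from the comment
# Ambiguous audio: both candidates fit the sound equally well.
print(pick_transcription({"fish sticks": -10.0, "fish dicks": -10.0}, counts))
# fish sticks
```

The prior only tips near-ties. If the audio clearly favours the rarer phrase, the acoustic term wins, which is why the language-model weight has to be tuned.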

Now easier to catch unwanted content by Aoet_325 · 2010-03-06 07:43 · Score: 2, Interesting

Soon (now?) they can generate captions of everything heard (or sung) in a video immediately after upload and match the captions against lyrics and transcriptions of copyrighted works or even just search them for specific keywords. Then they can flag those videos as possible copyright violations or even prevent them from being displayed until after being reviewed by someone.
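One simple way to do that matching (not necessarily what YouTube uses) is to compare word n-gram "shingles" between a generated caption and a known lyric or transcript, then flag high Jaccard overlap. In this sketch the shingle size and the 0.5 threshold are arbitrary assumptions, and `flag_for_review` is a hypothetical policy. Recognition errors like those quoted in this thread would lower the scores.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of n-word sequences in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Shingle intersection size over shingle union size."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def flag_for_review(caption: str, reference: str, threshold: float = 0.5) -> bool:
    """Hypothetical policy: flag when the overlap reaches the threshold."""
    return jaccard(caption, reference) >= threshold


print(flag_for_review("never gonna give you up", "Never gonna give you up!"))  # True
```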

I'm not saying captioning isn't a good idea, only that it can be used for more than just assisting the hard of hearing.

Re:I don't have the captions by rduke15 · 2010-03-06 11:51 · Score: 1

I tried the video mentioned here, but it just tells me "Captions are not available". Strange.

Is it because I'm in Europe?
Because I use Firefox on Linux?

The video mentioned a few posts before that is even weirder: it seems to have captions, I can turn them on, but no captions are displayed.


102 comments