YouTube Makes Captioning Available To All
adeelarshad82 writes "Google's YouTube announced that it has moved its automatic speech-recognition and closed-captioning technology out of beta and has now made it available to the YouTube community at large. Most, if not all, YouTube videos now include a 'CC' button that, if pressed, will automatically generate the closed-captioning technology. The technology processes the audio feed using the speech-recognition technology used in the core voice search feature that has also been built into the Android voice search feature, the GOOG-411 phone search, and other products."
Hey glum, Jen tonight. It's apologize for it, interrupting our conversation in early as this afternoon, yes, so I wanted to returning your call and you know check in with you further. Alright, hope you, I hope you're doing well done. Sounded like you, works but alright. Well I'll call me later. I'll talk to you soon. Bye.
The CB App. What's your 20?
Huzzah! Now if we can just get subtitling/captioning on Netflix streams, the net will be accessible to the Deaf again.
--why?
They also changed the way videos are sent to the browser; many Flash video players are failing because of that.
Talk about advanced! Back in my day, we had to pay engineers to generate technology for us!
Visual IRC: Fast. Powerful. Free.
Have they got anything without technology in it?
The results are still very funny, especially for non-English speakers.
However, it's a technology that is still relatively young. One hopes that applying it to YouTube will help Google improve the accuracy.
However, except for spoken videos with a native English speaker with absolutely no background noise, it's nothing more than a novelty at this point. Trying this on several videos not only yielded hilarious results, but delays of several seconds in some cases.
I'm trying to understand the difference between an interactive transcript, as seen at protranscript.com, and a caption. Why did Google go the embedded captioning route? Isn't the goal to create searchable content? If so, captions don't seem to be the solution.
I looked but I can't find Google's CC button for this video: http://www.youtube.com/watch?v=ZA1NoOOoaNw
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
"First things first -- but not necessarily in that order"
-- The Doctor, "Doctor
I haven't seen any mention of search, which seems odd. Google is adding captions to every YouTube video, and nobody is interested in whether you'll be able to search the captions or not? Seems to me like it could be quite useful to search the captions of every video on YouTube.
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
Just imagine when they hook this up to Google translation and text-to-speech. You could choose your language for YouTube audio.
Wish this technology would be used by TV stations to provide 'sort of' subtitling for programs that don't have any. This could be helpful for deaf/hearing impaired viewers.
Where I live (Netherlands), there's a few public TV channels. Most programs on there are subtitled using a dedicated teletext page (888). For the bulk of commercial channels, there's also subtitles for things like prime time movies, and specific (popular) TV shows. But a lot of it is not, like average day time shows / late night documentaries / commercials etc. etc. This is due to manpower/cost issues: you have a limited audience, a limited percentage of viewers that is deaf/hearing impaired, and (proper) subtitling needs humans. Read money = eating into commercial TV stations' bottom line. It's entirely up to these stations to decide what to subtitle, and what not.
This technology (combined with automated translation) would be a nice complement for those programmes where human-provided subtitling is deemed too expensive. Automated translation is still bad at times, but for deaf/hearing impaired people, subtitles with a bad translation can still be better than no subtitles at all. An automated system shouldn't be very expensive when applied to mass media like national TV, and would be easy to provide for all programmes. And perhaps speech recognition / automated translation would improve over time, to the point where humans aren't needed anymore to get good results.
And the overused buzzword of the day is ...
I'm sure they will improve it dramatically in the coming months and years, but I have not laughed so hard in a while at some of the stuff it comes up with. It's as funny as using a translator to translate a word into Korean and back again.
I agree.
YouTube's CC needs far more work. The conversion from spoken English to English closed captions is far off.
http://www.youtube.com/watch?v=VROZ2bbiQLc
The narrator of this video makes no mention of "senate Chris".
Too funny. The translation is totally off.
let's set so double the killer delete select all.
Which is to say, pretty darned feeble. Clever work, but basically rubbish when compared to user expectation.
One of my favourite videos is this one (http://www.youtube.com/watch?v=yYAw79386WI), dating from the '30s, about how differential gears work. The voice-over is that beautifully clear, precise American newsreader accent of the period, and there isn't any background music to confuse things. If anything should be a perfect candidate for a computer to analyse, it's this.
But the captions are worse than I'd expect from off-the-shelf software like Dragon Dictate, which isn't particularly special itself. A perfectly enunciated "road", with a very clear final D, is misheard as "role", for example. There are mistakes in nearly every line, and while sometimes they're obvious, sometimes they're just bizarre.
I'm tempted to say "nice try, good work for a first shot, and hey, it's a beta so it'll get better." But I've been exposed to dictation software for over a decade, and it just hasn't, really. So I don't think it will, and I don't think most people will get much use out of it, apart from the odd giggle at the YouTube equivalent of "Dear aunt, let's set so double the killer delete select all..."
What I would be interested in hearing is whether this, flawed as it is, is useable enough for a deaf person. In context, you'd probably figure out that "role"="road", but would you guess that "outmoded"="are mounted"? Maybe, maybe not - watch the video on mute with the captions on, and it's kinda tricky but you can get the gist of it. But then I'm reminded that this is the best case video I could find, and most will probably be worse. It'll be interesting to see what the feedback is from deaf people, and whether it really makes a difference, and whether the context makes up for the poor quality. I'd like to hope it might do just that.
Whatever happened to the Berger Liaw speech recognition system? One article (from 11 years ago) is here. It had the ability to track multiple (dozens of) voices simultaneously, could process speech spoken in a continuous stream, and could detect speech in very high noise environments (in some tests, human listeners could only tell what words were being spoken with 50% accuracy, while the voice recognition system could still tell what was being spoken 85% of the time; "very high noise" here means something like someone speaking in a normal conversational voice beside a jet engine). The US Navy (submarine service) was a strong advocate of the technology, but I've heard very little about it since.
http://www.youtube.com/results?search_query=buzzword+bingo&search_type=&aq=2&oq=buzzwor seems appropriate here.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Could you combine this with the lip reading technology that was introduced to allow "voiceless" cell phone calls? http://www.ubergizmo.com/15/archives/2010/03/lip_reading_technology_unveiled.html Wouldn't that improve the accuracy for those scenes where the speaker's mouth is visible?
Or how about using the subtitle tracks that are in a different language and reverse translating them to provide additional clues as to what the speaker might have been saying? It might help a little.
Take a look at this board game review:
http://www.youtube.com/watch?v=Uv6pIFgfa0U
His name (Tom Vasel) appears to be consistently translated "oh come on now". What, don't they believe that's his name? He comes with surprising revelations such as "I'll be your next president" and wonderful nonsense like "but it is a ten-year period deduction gay".
xkcd is not in the sudoers file. This incident will be reported.
Fish sticks or Fish dicks?
Just what we all needed: something dumber than user comments to read on YouTube.
"There is much pleasure to be gained from useless knowledge." - Bertrand Russell.
reads the caption and then produces the video?
I don't think it likes Bono. I was watching a speech by George Bush where he says about Bono: "he is a man of depth and great heart". The caption was: "he is a man of death and great whore"
The first 10 videos I've been to don't include it. Including suggested and front page vids.
Is this a metric most?
This is excellent timing; I clicked on the link to a video on the previous /. story but my sound was not working. I thought, "man, I wish more videos were closed-captioned," not just for lazy people like me but also for the hearing impaired.
Finally it'll be easier for me to share these videos with my deaf and hard-of-hearing friends!
- RG>
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
I like the "CC" feature... it makes it very simple to do those Hitler Downfall parodies... but I was surprised that I was the first to actually make one using the feature. My video features closed captions for both the original German-to-English translation, and a Lost parody script. I also provide a handy download to a text-editable SRT file so others can make their own (does that make me a bad person?).
The nice thing is that you can add as many subtitle files as you like... and give each of them separate titles. It understands language, so presumably, my parody can be run through a translator (on the fly) for any other language. Now, one "blank" can provide hundreds of alternate parodies from one YouTube video.
I just wonder if this "automatic" feature will try and create subtitles on my blank, with subtitles already loaded.
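For anyone who wants to make their own, here is a rough sketch of producing an SRT file from timed lines. It follows the usual SRT layout (a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, then a blank line). The function names and the sample cues are made up for illustration; this is not the file from the video above.

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(cues):
    """Build SRT text from a list of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        # Each cue: running number, time range, caption text.
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Hypothetical parody lines, invented for this example.
cues = [
    (0.0, 2.5, "First parody line."),
    (2.5, 6.04, "Second parody line."),
]
srt_text = build_srt(cues)
```

Writing `srt_text` to a `.srt` file with UTF-8 encoding should give something a text editor can open and tweak line by line.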
This is why Google rocks ...and M$'s tarnished SilverDimGlow does not.
Strongly wish Netflix would realign themselves to use a YouTube-like setup instead, but I strongly suspect M$ either threw them 'an offer they could not refuse', or this will become yet another mutual lock-in, like Intel_M$.
(Really irritated that I cannot, yet, watch Netflix from my Debian machines.)
Uh, Linux geek since 1999.
An interesting upside to all this might be that, if Google keeps the dialog from YouTube content in their searchable database, people may soon be able to search for videos via content.
Right now, I believe keywords have to be entered by hand, but auto-captioning might remove that barrier.
"Here's looking at you, kid."
Proper subtitling needs humans, but come on, be honest. How much manpower does it actually require to subtitle something?
If it's your native language, it's a matter of timing. Little else. If you're paying someone to be on the clock, depending on the length of the program, it might take anywhere from 30 minutes to a day for a long program. How much is a day's wages for even the lowest of budget infomercials?
If you're translating, you're probably not translating something new, and that means there are likely already native subs for it. So it's simply a matter of translation, not timing. I've seen subbers here in Korea fan-sub a 30 minute sitcom fresh off the air in just a few hours, without even public access to English subtitles first.
Does this mean they can now enjoy the 2 Girls and 1 Cup Reaction videos?
http://www.youtube.com/watch?v=ggaWaK5d23Y
This is good news. I've been looking at speech-to-text and audiomining for a while. My goal was not captioning, but search, so that in a long video or a large set of videos, a user can quickly find snippets of video mentioning a word or phrase, and replay the found snippets. I found a bunch of options, but budget was always an issue:
Google Audio (Gaudi): free (cool!) but seemed like a dead-end project after the 2008 elections.
Blinx: spinoff from BBN focused on media companies. $$$$$$.
Autonomy: enterprise search/monitoring company, bought tech from Virage. $$$$$$.
Virage: sold their tech to Autonomy, then redeveloped it.
Coveo: audiomining software using the Nuance SDK and a Silverlight front end. $$$$$.
TVeyes: does a lot of real-time monitoring. $$$$$.
Nexidia: audiomining software that uses their own phoneme tools. $$$$$$.
Is this YouTube service an incarnation of Gaudi? Either way, it's nice that it's finally out there.
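To make the search use case concrete, here is a minimal sketch of looking up words and phrases in timed captions so a player could jump to the matching snippet. It assumes the captions are already available as (start time, text) pairs; the function names and sample cues are invented for illustration and say nothing about how YouTube or any of the products above actually work.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(cues):
    """Map each word to the start times (seconds) of the cues that contain it."""
    index = defaultdict(list)
    for start, text in cues:
        for word in set(normalize(text)):
            index[word].append(start)
    return index


def find_phrase(cues, phrase):
    """Return start times of cues whose text contains the whole phrase.

    Phrases that straddle two cues are not found by this simple version.
    """
    needle = " " + " ".join(normalize(phrase)) + " "
    hits = []
    for start, text in cues:
        haystack = " " + " ".join(normalize(text)) + " "
        if needle in haystack:
            hits.append(start)
    return hits


# Hypothetical caption track: (start_seconds, text).
cues = [
    (0.0, "The differential lets the wheels turn"),
    (4.2, "at different speeds on a curved road"),
    (9.8, "The road wheels are mounted on the axle"),
]
index = build_index(cues)
road_times = index["road"]
phrase_times = find_phrase(cues, "are mounted")
```

The returned start times are what a player would seek to in order to replay the found snippets. Recognition errors in the captions would of course carry straight through into the search results.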
If you're paying someone to be on the clock depending on the length of the program it might take anywhere from 30 minutes to a day for a long program.
If the captioning takes longer than the program, you have to do it in advance. This rules out captioning news, sports, entertainment awards, and other live programs.
If the recognizer isn't sure, perhaps it could use the fact that there are six times as many "fish sticks" as "fish dicks" in Google's web index. I'd bet it already does; there's a reason for the "Markov" in hidden Markov modeling.
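A toy version of that idea, as a sketch only: combine the recognizer's acoustic score for each candidate phrase with a prior taken from how often the phrase shows up in text. The acoustic probabilities and counts below are made up (the counts just mirror the six-to-one ratio), and this is plain rescoring with a phrase-frequency prior, not an actual hidden Markov model.

```python
import math


def rescore(candidates, phrase_counts, lm_weight=1.0):
    """Pick the candidate with the best combined log score.

    candidates: dict of phrase -> acoustic probability in (0, 1]
    phrase_counts: dict of phrase -> occurrences in some text corpus
    lm_weight: how strongly the text prior counts against the acoustics
    """
    total = sum(phrase_counts.values())
    best_phrase, best_score = None, -math.inf
    for phrase, acoustic_p in candidates.items():
        # Add-one smoothing so an unseen phrase does not hit log(0).
        prior = (phrase_counts.get(phrase, 0) + 1) / (total + len(candidates))
        score = math.log(acoustic_p) + lm_weight * math.log(prior)
        if score > best_score:
            best_phrase, best_score = phrase, score
    return best_phrase


# Hypothetical numbers: the audio slightly favours the wrong phrase.
candidates = {"fish sticks": 0.45, "fish dicks": 0.55}
counts = {"fish sticks": 600, "fish dicks": 100}

best = rescore(candidates, counts)
acoustics_only = rescore(candidates, counts, lm_weight=0.0)
```

With the prior switched off (`lm_weight=0.0`) the slightly louder wrong guess wins; with it on, the more common phrase does.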
Soon (now?) they can generate captions of everything heard (or sung) in a video immediately after upload and match the captions against lyrics and transcriptions of copyrighted works or even just search them for specific keywords. Then they can flag those videos as possible copyright violations or even prevent them from being displayed until after being reviewed by someone.
I'm not saying captioning isn't a good idea, only that it can be used for more than just assisting the hard of hearing.
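As a rough illustration of how such matching could work (purely a guess, not anything Google has described), one could compare word n-grams from the generated captions against a reference transcript and flag the video when enough of the reference shows up. The function names, threshold, and sample sentences are assumptions made up for this sketch.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word sequences in the text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(caption_text, reference_text, n=3):
    """Fraction of the reference's n-grams that also appear in the captions."""
    reference = shingles(reference_text, n)
    if not reference:
        return 0.0
    return len(reference & shingles(caption_text, n)) / len(reference)


def should_flag(caption_text, reference_text, threshold=0.5):
    """Flag for human review when the overlap reaches the threshold."""
    return overlap_score(caption_text, reference_text) >= threshold


# Hypothetical reference text and a caption with recognition errors.
reference = "the quick brown fox jumps over the lazy dog"
captions = "um so the quick brown fox jumps over a lazy dog today"

score = overlap_score(captions, reference)
flagged = should_flag(captions, reference)
```

Given how error-prone the captions are right now, any threshold would have to tolerate a lot of misheard words, which also means plenty of false positives.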
I tried the video mentioned here, but it just tells me "Captions are not available". Strange.
Is it because I'm in Europe?
Because I use Firefox on Linux?
The video mentioned a few posts before that is even weirder: it seems to have captions, I can turn them on, but no captions are displayed.