YouTube's Content Identification Failure Raises Eyebrows

Posted by Zonk on Tuesday January 2, 2007 @02:38AM from the making-new-things-is-hard dept.

MSNBC is carrying a story looking at YouTube's failure to follow through with a promised 'content identification system' by the end of the year. The article goes on to discuss the possible impact this failure will have on the site's (so far) good relations with television, music, and movie studios. From the article: "If the delay lasts for more than a week or two into the new year, suggesting more than just a slight technical hitch, 'this is certainly going to be a serious issue', [Mike McGuire, a digital media analyst at Gartner] added. Leading music companies have already made clear they see completion of YouTube's anti-piracy technology as an important step in any closer co-operation. Failure to build adequate systems to protect copyright owners could also add to the risk of legal action against the site."

Google and Youtube aren't that dumb by Salvance · 2007-01-02 02:41 · Score: 4, Insightful

It's hard to believe that Google hasn't already discussed the delay and its consequences with the movie, television, and music studios. Google had such intensive conversations with them before purchasing YouTube that it would be silly if they went quiet and just let things slide.

--
Crack - Free with every butt and set of boobs
1. Re:Google and Youtube aren't that dumb by jackharrer · 2007-01-02 02:43 · Score: 2, Interesting
  
  _WE_ don't know if they did or not. These kinds of negotiations usually happen behind closed doors, and at this level that means vault doors.
  Let's wait a while and we will know. If there's a lawsuit, they haven't. Simple.
  
  --
  
  "an experienced, industrious, ambitious, and often, quite often, picturesque liar" - Mark Twain
Easiest code EVAR by Anonymous Coward · 2007-01-02 02:41 · Score: 5, Funny

Here you go guys, this one's on the house:

if (content) {
    return "This Youtube content has been identified as: Bad";
}
1. Re:Easiest code EVAR by dimeglio · 2007-01-02 04:39 · Score: 4, Funny
  
  The code is not the problem. Maybe the MPAA was asked to provide the MD5 sums of all the material they object to being published. I suppose they haven't finished that yet. So it's not necessarily YouTube's fault. ;-)
  
  10 YouTube exec: So what clips exactly do you want us to remove?
  20 MPAA: Well, all those which we don't want you to publish.
  30 YouTube exec: OK, which clips exactly do you object to?
  40 MPAA: all those we don't want you to publish.
  50 GOTO 10
  
  --
  Views expressed do not necessarily reflect those of the author.
2. Re:Easiest code EVAR by recursiv · 2007-01-02 04:46 · Score: 2, Funny
  
  I thought of a new algorithm that should be more accurate:
  if (views > LEGIT_VIEWS_THRESHOLD) {
      return "This Youtube content has been identified as: Illegal";
  } else {
      return "This Youtube content has been identified as: Legal";
  }
  
  --
  I used to bulls-eye womp-rats in my pants
3. Re:Easiest code EVAR by Richy_T · 2007-01-02 05:38 · Score: 2, Funny
  
  If I'm not mistaken, you just made that joke.
  
  Rich
4. Re:Easiest code EVAR by ifrag · 2007-01-02 07:16 · Score: 2, Insightful
  
  That's just a joke about the MD5s, right? Even the simplest edit to a clip, such as trimming blank frames from the start or end, would change the hash.
  
  --
  Fear is the mind killer.
Relax by Timesprout · 2007-01-02 02:42 · Score: 5, Funny

It's in Beta.

--
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
What truth?
There is no dupe
This should improve content dramatically by nizo · 2007-01-02 02:42 · Score: 4, Funny

Once all that illegal content is gone, it will make it easier to find things like this.

--
I Am My Own Worst Enemy
If I were google I would be worried by LiquidCoooled · 2007-01-02 02:48 · Score: 5, Interesting

Because once they show that they can identify bad content within video files, won't the MPAA/RIAA/* start to bug them about doing the same with normal search results?

Instead of Perfect 10 having to search for and list the illegal boobies on display, Google will have to automatically remove them from view :(

Won't somebody think of the boobies :(

--
liqbase :: faster than paper
1. Re:If I were google I would be worried by trollingsloth · 2007-01-02 03:00 · Score: 3, Funny
  
  There are no boobies until the third page of results and they aren't even good ones. Please do some research before you send us on a wild goose chase.
Lawyers Shouldn't Set Tech Deadlines by spike2131 · 2007-01-02 02:57 · Score: 5, Insightful

I pity the developers who are making this product. They have been given a complex task and an arbitrarily chosen deadline, probably pulled out of the air by marketing/legal/upper management. Since September they have been on a death march to meet this date, sacrificing family time around the holiday season.

But you know what? It just ain't ready because it was a fool's errand to begin with. My guess is they are working off of half-assed specs that weren't even ready before Thanksgiving. Maybe in a few more months they can have something good. But media partners getting pissy about it isn't going to help the code mature any faster.

--
SpyDock: Scientific Python in a Docker container
1. Re:Lawyers Shouldn't Set Tech Deadlines by Herr+Ziffer · 2007-01-02 03:16 · Score: 3, Insightful
  
  The technology isn't there yet. Other companies have been working toward the same goal of media fingerprinting for much longer than YouTube has. For a sufficiently long media clip, it can be done. The serious problem, though, is with smaller clips. 30 seconds just isn't enough material, currently, to get a good match. Add to that the fact that the original clips get resampled and distorted and overdubbed. YouTube may be getting a break from media companies simply "because" it is so easy to make the argument that this was never feasible in the first place.
Is it possible? by ErGalvao · 2007-01-02 03:01 · Score: 5, Interesting

This may sound a little OT - sorry for that - but this story raised an old question here: is it really possible to build an automated content identifier/filter? Personally, I've always found these kinds of solutions full of flaws. Take web filtering, for instance: it's pretty common for the filtering software to make a mistake and end up flagging a harmless site as bad content (a false positive). After all - Google or not - both things follow the same basic principles, right?

--
Er Galvão Abbott - IT Consultant and Developer
1. Re:Is it possible? by Rob86TA · 2007-01-02 03:23 · Score: 3, Insightful
  
  The thing is, the MPAA et al. don't care if there are false positives; they only care that there are no escapes. YouTube could probably deploy a solution that would make the MPAA happy, only to have its own users leave because valid content kept getting blocked by accident.
2. Re:Is it possible? by MindStalker · 2007-01-02 04:04 · Score: 3, Insightful
  
  I'm betting they go with a computer/human pair system. If a submission matches a known video at close to 100%, treat it as if it were the known video. If it matches at more than 50%, have a human look at it. If it matches less than that, keep it and wait for a user to flag it. Realistically, most YouTube videos are near carbon copies of other videos already on YouTube. This would greatly decrease dupes at least.
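
The three-way split described above can be sketched in a few lines of Python. This is only an illustration of the commenter's idea, not anything YouTube is known to run: the names `Action` and `triage` are made up here, and 0.95 is an assumed stand-in for "close to 100%".

```python
from enum import Enum


class Action(Enum):
    TREAT_AS_KNOWN = "treat as the known video"
    HUMAN_REVIEW = "send to a human reviewer"
    KEEP = "keep and wait for user flags"


# Assumed cutoffs: the comment only says "close to 100%" and "greater than 50%".
NEAR_CERTAIN = 0.95
NEEDS_REVIEW = 0.50


def triage(match_score):
    """Map a similarity score in [0, 1] to one of the three actions."""
    if not 0.0 <= match_score <= 1.0:
        raise ValueError("match_score must be between 0 and 1")
    if match_score >= NEAR_CERTAIN:
        return Action.TREAT_AS_KNOWN
    if match_score > NEEDS_REVIEW:
        return Action.HUMAN_REVIEW
    return Action.KEEP


for score in (0.99, 0.70, 0.20):
    print(score, "->", triage(score).value)
```

Where the two cutoffs sit decides how much lands on the human reviewers, so in practice they would probably be tuned against the size of the review queue rather than fixed up front.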
DMCA by Xymor · 2007-01-02 03:17 · Score: 2, Interesting

Isn't all they need to comply with the DMCA a link for reporting DMCA violations or copyright-protected content, plus removal of the content once the claim is verified?
Enforce That ! by leftcase · 2007-01-02 03:28 · Score: 3, Insightful

Given that the media and entertainment industry has made such a miserable job of enforcing copyright since the emergence of high-speed internet, perhaps their efforts would be better spent figuring out ways to capitalise on the presence of sites such as YouTube and MySpace.

If businesses such as Red Hat can make a living from open-source software, surely there's a more refined way for said media businesses to realise capital from their assets without being so 'grabby'!
Solution by jlebrech · 2007-01-02 03:45 · Score: 2, Interesting

The solution would be to perform some sort of hash check against previously taken-down material. That way, once copyrighted material has been posted and spotted, it can't recur on the system. The check just needs to keep matching submissions that have bits cut out, different watermarks, or different source quality, using some kind of identification algorithm (similar to fingerprinting).
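
A rough sketch of such a takedown check, in Python. It assumes each video has already been reduced to a 64-bit fingerprint by some perceptual fingerprinting step that is not shown here; the `TakedownList` class, the sample fingerprint values, and the tolerance of 6 bits are all hypothetical. Matching on a small Hamming distance instead of exact equality is what lets a slightly altered re-upload still hit the list.

```python
def hamming(a, b):
    """Number of bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


class TakedownList:
    """Fingerprints of removed videos, matched with a small bit tolerance."""

    def __init__(self, max_distance=6):
        # max_distance is an assumed tolerance, not a recommended value.
        self.max_distance = max_distance
        self._known = []

    def add(self, fingerprint):
        self._known.append(fingerprint)

    def matches(self, fingerprint):
        return any(hamming(fingerprint, k) <= self.max_distance
                   for k in self._known)


# Made-up 64-bit fingerprints for illustration only.
taken_down = 0xF0F0F0F0F0F0F0F0
reupload = taken_down ^ 0b101          # same clip, two bits disturbed
unrelated = 0x0F0F0F0F0F0F0F0F         # differs in all 64 bits

blocked = TakedownList()
blocked.add(taken_down)
print(blocked.matches(reupload))   # True
print(blocked.matches(unrelated))  # False
```

The linear scan is fine for a sketch; a real list with millions of entries would need an index built for near-neighbour lookups.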
Something I noticed with Google Video by shotgunefx · 2007-01-02 03:52 · Score: 4, Informative

Only one video I ever uploaded was not posted immediately. It was a demonstration of a touchscreen media player I'm working on (Was one of a couple vids I uploaded that night). I was playing copyrighted material in the demo, but no song played very long before moving on and the audio (as it was off camcorder) was horrible.

About 12 hours later, it cleared. Fairly certain it was flagged and reviewed. If that's the compromise, I think I could deal with that.

--

-William Shatner can be neither created nor destroyed.
No, it's not possible. by twitter · 2007-01-02 04:02 · Score: 2, Insightful

... is it really possible to do an automated content identifier/filter solution?

To take away your fair use they would have to fingerprint both the audio and video content. That's possible for whole works at a given frame size, rate and audio quality. Already, you can see the problem because there's an almost unlimited choice of those. Couple that problem to every length variation and you have an impossible task for any single work. The database of fingerprints would be infinitely large. You can multiply this infinite-sized database by the hundreds of thousands of works the crackpots want to "protect" for a result that's that many times less practical. Policing for original works based on someone else's "intellectual property," such as a Star Wars parody, is clearly impossible. The already impractical task of making fingerprints of each submission is trivial by comparison. Even if they could fingerprint all submissions, there is no way they can match it to their satisfaction. Policing will require AI or a human inspector because the "crime" is sharing the details of a story, something only a person can recognize. If they do make it work, the first thing it will do is point to the blatant theft of concepts by every movie ever made, such as Star Wars' liberal use of "Triumph of the Will", "Forbidden Planet" and several WWII films.

--
Friends don't help friends install M$ junk.
1. Re:No, it's not possible. by Gulik · 2007-01-02 05:17 · Score: 2, Interesting
  
  I don't know -- IANAM (I Am Not A Mathematician), but it sounds like an exceedingly difficult problem. To fingerprint a video, you're going to have to use specific information from it, and I don't know what information will remain constant between different encoding qualities and even encoding applications using the same theoretical quality. I assume you have to fuzz it up (the mathematical equivalent of "this area of the image from time index X to time index X+2 is reddish-orange, and this other area during this other time range is pinky-russet"), and that will result in false positives. And, if the algorithm is publicly known, people will mess with the areas that are known to be used in the fingerprinting to cause false negatives. Or just mess slightly with the saturation or intensity of the entire video, if they don't know the precise locations but do know that this is the kind of thing the algorithm checks. And with the number of people hammering at it, I don't expect the rough workings of the algorithm to remain entirely secret for long. And then it's a footrace like with Google's ranking algorithms, with lots of folks working to figure out how to beat it, Google improving it, and hackers having at it again.
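
The "fuzz it up" idea can be made concrete with a toy example in Python. This is a deliberately crude sketch, not a real fingerprinting scheme: it treats a frame as a small grid of grayscale pixel values, and the function names, the 2x2 grid, and the four brightness buckets are all assumptions picked for illustration. It shows the same three effects described above: a mild re-encode survives, a deliberate brightness shift slips through, and an unrelated frame collides.

```python
def coarse_fingerprint(frame, grid=2, levels=4):
    """Quantized mean brightness of each grid cell, row by row."""
    cell_h = len(frame) // grid
    cell_w = len(frame[0]) // grid
    bucket_size = 256 // levels
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            rows = frame[gy * cell_h:(gy + 1) * cell_h]
            pixels = [p for row in rows
                      for p in row[gx * cell_w:(gx + 1) * cell_w]]
            mean = sum(pixels) // len(pixels)
            cells.append(min(mean // bucket_size, levels - 1))
    return tuple(cells)


def brighten(frame, delta):
    """Shift every pixel by delta, clamped to the 0..255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in frame]


original = [[10, 20, 200, 210],
            [15, 25, 205, 215],
            [100, 110, 60, 70],
            [105, 115, 65, 75]]

# A different frame whose cells happen to average out the same way.
unrelated = [[0, 60, 255, 192],
             [63, 1, 200, 250],
             [64, 127, 127, 64],
             [90, 90, 100, 100]]

print(coarse_fingerprint(original))                 # (0, 3, 1, 1)
print(coarse_fingerprint(brighten(original, 3)))    # (0, 3, 1, 1): survives
print(coarse_fingerprint(brighten(original, 30)))   # (0, 3, 2, 1): evaded
print(coarse_fingerprint(unrelated))                # (0, 3, 1, 1): collision
```

Coarser buckets make the fingerprint harder to dodge but cause more collisions, and finer buckets do the reverse, which is the trade-off the footrace would be fought over.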
  
  In any case, I shall watch developments with much interest.
2. Re:No, it's not possible. by Jerf · 2007-01-02 06:41 · Score: 2, Insightful
  
  I Am Not A Mathematician either but I'm closer than the vast majority of people on Slashdot. (I've studied this stuff in a formal setting and done some limited work in the field of handling wildly multidimensional data.)
  
  This reply is much more reasonable, and much closer to the truth. One of the missing pieces of your first post is the problem of making attacker-resistant fingerprints. Fingerprinting is actually not so hard when you haven't got people actively trying to hurt the fingerprint and you can accept a reasonable (and small) rate of false positives. It's not even that hard to make it fairly stable under certain easy transforms like a volume modification.
  
  Making it attacker-resistant is as hard as you say; it's not that a fingerprint function can't be created for each of the attacks you mention, it's that covering them all at once is hard. The easiest thing to do is simply make the fingerprints cover more stuff ("fuzzing" the fingerprint is a pretty good mental model), which definitely increases the false-positive rate on audio. (Video doesn't suffer from this quite so badly because it has much more data to work with, therefore videos are "farther apart", and can tolerate much more "fuzzing". The flip side is dealing with this extra data can be a pain and it does open up some other attack avenues.)
3. Re:No, it's not possible. by jlarocco · 2007-01-02 13:17 · Score: 3, Informative
  
  You have something better? MD5 is the easiest computationally and produces the smallest result to store, using other techniques will increase the size of your database and computational expense.
  
  There's no way they could use MD5. MD5 hashes are designed to return the same value given the same input, and a totally different value for even a slight modification of the input. Or in other words, md5("ABCD") is nothing at all like md5("ABCE"). Given the nature of audio and video, it would be trivial to bypass an MD5 copyright check. Change a single pixel in a single frame from RGB(255,255,255) to RGB(255,255,254) and nobody would notice, and it'd get through the check.
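
This is easy to check with Python's standard `hashlib` module. The sketch below uses a made-up "frame" of 1000 white RGB pixels as raw bytes, which is an assumption for illustration and nothing like a real video container; the helper names are hypothetical too.

```python
import hashlib


def md5_hex(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def differing_digits(a, b):
    """Count positions where two equal-length hex digests differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Toy frame: 1000 white pixels, then the same frame with one channel
# of the last pixel changed from 255 to 254.
frame = bytes([255, 255, 255]) * 1000
tweaked = frame[:-3] + bytes([255, 255, 254])

print(md5_hex(b"ABCD"))
print(md5_hex(b"ABCE"))
print(md5_hex(frame) == md5_hex(tweaked))   # False
print(differing_digits(md5_hex(frame), md5_hex(tweaked)), "of 32 hex digits differ")
```

An exact-match hash still has a use for catching byte-identical re-uploads cheaply, but anything that has been re-encoded or edited needs a similarity-preserving fingerprint instead.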
  
  --
  Maybe not
Re:Why should we help the content providers? by b0s0z0ku · 2007-01-02 04:05 · Score: 2, Funny

If anything, I'd patent it to keep it from being used.
Better yet, patent it and send all royalties to the EFF. The "industry" can only use it at "their own expense" - in more ways than one :)
-b.
Re:MSN Soapbox (Private Beta) by KDR_11k · 2007-01-02 04:12 · Score: 2, Funny

Just wait until Web 3.11 for Workgroups.

--
Justice is the sheep getting arrested while an impartial judge declares the vote void.
Re:It's all Utube Has by symbolic · 2007-01-02 04:25 · Score: 3, Insightful

I totally disagree. I rarely pay any attention to the copyrighted stuff, because that's exactly what I'm trying to get away from. The only way that I'd agree with you relates to situations where someone has used a copyrighted work to produce something derivative - like a spoof of a music video, or some music in a home-made video trailer.

YouTube is a threat. I don't think it's a threat because people use copyrighted material in this manner; it's a threat because it moves the entertainment decision-making process from the few that used to have nearly complete control to the end user. It's another paradigm shift that will be fought tooth and nail by the old guard.
Only Possible in Vista. by twitter · 2007-01-02 07:19 · Score: 2, Funny

The easiest thing to do is simply make the fingerprints cover more stuff ("fuzzing" the fingerprint is a pretty good mental model), which definitely increases the false-positive rate on audio.

I would have thought the easiest thing to do would be to take the Vista approach: all video will be reduced to a 2x2 pixel screen size. Content will be easy to identify that way, because it will all look the same.

--
Friends don't help friends install M$ junk.