Web Scanning Technology for Copyright Violations
eldavojohn writes "I've heard a lot of talk about software being used to detect pirated media anywhere on the web, but haven't seen a lot of details. PhysOrg has a good article on one of the tools out there. Automatic Copyright Infringement Detection (ACID) boasts a patented technology dubbed 'meaning-based computing' that is reportedly capable of finding relationships among 1,000 different types of files. The important thing is that this is not tagging-based searching. 'Autonomy's search technology uses automatic hyperlinking and link clustering that the company claims isn't built into keyword search engines. According to the company, this technology allows computers to perform searches with greater context, so it finds a wider range of related documents or research citations than is possible from keyword searches.' For more details on how this magic works, check out Autonomy's patent and the many patents by its subdivision, Virage."
This technology sounds like it's stuck behind the buzzword "meaning-based media," which seems to just be an abstract notion of finding and sorting media without profiling, hashing, fingerprinting, tagging, watermarking, sourcing, or naming (in other words, by going on bullshit notions and intuition: "Oh, it looks copyrighted").
More importantly, it looks like it can't do anything unless the target is somewhere on the Web and is reasonably active. The darknets and private trackers are still safe.
~ C.
Sure, they have a patent, and if they actually implement what's in the patent it's meaningful to look at... but more often than not, the patent is much broader than the actual application, or the patent isn't even being used.
If I looked at patents to determine what a business was capable of, I would be driving a car that gets hundreds of miles to the gallon!
Sometimes the best solution is to stop wasting time looking for an easy solution.
Not to complain about the article too much, but is there anyone out there who didn't find it completely contradictory and useless?
As far as I can tell, the article starts off by saying that they have a wonderful system to inspect and compare the video content of a clip against a HUGE database (e.g., tens of thousands of hours of copyrighted movies, TV series, and music), and that they know how to read _any_ media format (e.g., an AVI using some particular codec, embedded in a Word document, which is zipped...). The suggestion is that the software could "read" a YouTube video clip and recognize that it contains a few minutes of a Jay Leno monologue. Needless to say, they don't explain how they might possibly do this, because, as far as I can tell, they can't. Not even close.
If you look at the patents, they're pretty much all about text or metadata searching. For example, they seem to have found an innovative way to find keywords to categorize a document... by scanning for words in the document! Or of categorizing a video file... by looking at metadata (e.g., comments) embedded in the file. The only amazing thing about these algorithms is that some dimbulb in the patent office decided to give them a 20-year monopoly on something people have been doing for decades.
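To show how ordinary "find keywords by scanning for words in the document" is, here is a minimal Python sketch. It is not taken from Autonomy's or Virage's patents; the stopword list, the `top_keywords` name, and the sample text are all made up for illustration. The same function would work on a metadata comment pulled out of a video file, since that is just more text.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "are", "was", "with", "that", "this", "from"}


def top_keywords(text, n=3):
    """Return the n most frequent non-stopword terms in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


# Made-up sample document (or metadata comment).
sample = (
    "The patent covers video search. Video search uses metadata. "
    "Video metadata is text."
)

keywords = top_keywords(sample)
print(keywords)
```

Counting words and keeping the most frequent ones is decades-old practice, which is the point being made above.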
Back in the early days of cars, most folks thought the Red Flag Act was entirely justified.
Sorry, but we've hit a new age of abundance. With the overwhelming percentage of internet users using LimeWire, BitTorrent, etc., attempts to sustain a manufactured scarcity in the face of this abundance will similarly fade away into obsolescence.
The copyright enforcement versus piracy arms race will make for interesting history courses in future decades. I can see the courses now - "The Rise And Fall Of Intellectual Property".
I'm looking forward to blowing my grandkids' minds when I tell them about the era when information wasn't free.
-- In the beginning was the WORD, and the WORD was UNSIGNED, and the main(){} was without form and void...
Seems every week some company comes up with a way to detect copyright violations or terrorists or naughty pictures or some other buzzworthy topic that will get them paid suitcases full of money.
Until I see some sort of evidence that they can do it, I rank the claims along with those who claim that they can tell what people are thinking by where they scratch.
"Trademarks are the heraldry of the new feudalism."
This kind of detection is difficult if not impossible. As others have posted, what if the copy is encrypted? Or what if it is altered to make it hard to find, even with complex image-processing algorithms? These algorithms may fail to recognize a copy after something as small as a 10% shift in hue or saturation. The same can happen with video: will this system detect it if I copy a video and change the color tones from full color to sepia?
- Yes, but does it run Lunix?
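The sepia question above can be made concrete with a toy experiment in Python. This is only a sketch under loud assumptions: the 8x8 gray gradient is synthetic data standing in for a video frame, the sepia weights are the commonly quoted colour matrix, and the "average hash" is a generic brightness-based fingerprint, not anything ACID is known to use. It compares an exact byte hash against that fingerprint before and after the recolouring.

```python
import hashlib

WIDTH, HEIGHT = 8, 8


def make_image():
    """Synthetic 8x8 gray gradient standing in for a frame (made-up data)."""
    return [[(20 * x + 2 * y,) * 3 for x in range(WIDTH)] for y in range(HEIGHT)]


def sepia(image):
    """Recolour with the commonly quoted sepia matrix, clamping to 0..255."""
    def tone(r, g, b):
        return (
            min(255, round(0.393 * r + 0.769 * g + 0.189 * b)),
            min(255, round(0.349 * r + 0.686 * g + 0.168 * b)),
            min(255, round(0.272 * r + 0.534 * g + 0.131 * b)),
        )
    return [[tone(*px) for px in row] for row in image]


def exact_digest(image):
    """SHA-256 over the raw RGB bytes: any pixel change alters the digest."""
    raw = bytes(c for row in image for px in row for c in px)
    return hashlib.sha256(raw).hexdigest()


def average_hash(image):
    """One bit per pixel: is its brightness above the image's mean brightness?"""
    lum = [0.299 * r + 0.587 * g + 0.114 * b for row in image for (r, g, b) in row]
    mean = sum(lum) / len(lum)
    return [1 if v > mean else 0 for v in lum]


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


original = make_image()
toned = sepia(original)

digests_match = exact_digest(original) == exact_digest(toned)
bit_distance = hamming(average_hash(original), average_hash(toned))
print("exact digests match:", digests_match)
print("average-hash bits that differ:", bit_distance)
```

On this toy input the exact digests differ while the brightness-based bits all agree, so a pure recolouring is not necessarily enough to hide a copy from a fingerprint built on relative brightness. That says nothing about encryption, cropping, mirroring, or re-timing, which such a simple fingerprint would not survive.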
Whenever I see the words "intelligence", "meaning", or "understanding" used to describe software, that's how I know it's a bunch of baloney.