Slashdot Mirror


Web Scanning Technology for Copyright Violations

eldavojohn writes "I've heard a lot of talk about software being used to detect pirated media anywhere on the web, but haven't seen a lot of details. PhysOrg has a good article on one of the tools out there. Automatic Copyright Infringement Detection (ACID) boasts a patented technology dubbed 'meaning-based computing' that is reportedly capable of finding relationships among 1,000 different types of files. The important thing is that this is not tagging-based searching. 'Autonomy's search technology uses automatic hyperlinking and link clustering that the company claims isn't built into keyword search engines. According to the company, this technology allows computers to perform searches with greater context, so it finds a wider range of related documents or research citations than is possible from keyword searches.' For more details on how this magic works, check out Autonomy's patent and the many patents by its subdivision, Virage."

16 of 54 comments

  1. Re:Encryption? by updog · · Score: 2, Informative
    And, torrents and newsgroups?

    It really seems to be targeting your typical TV episode uploaded to YouTube...

  2. Thank God for Darknets... by MostAwesomeDude · · Score: 3, Insightful

    This technology sounds like it's stuck behind the buzzword "meaning-based media," which seems to just be an abstract notion of finding and sorting media without profiling, hashing, fingerprinting, tagging, watermarking, sourcing, or naming (in other words, by going on bullshit notions and intuition. "Oh, it looks copyrighted.")

    More importantly, it looks like it can't do anything unless the target is somewhere on the Web and is reasonably active. The darknets and private trackers are still safe.

    --
    ~ C.
    1. Re:Thank God for Darknets... by donaldm · · Score: 2, Interesting

      The actual patent reads like a maths paper with lots of buzzwords. Sorry, I try not to read too much of the patent since the legal jargon actually gives me a headache. Maybe that is intentional for all patents. What annoys me is that this patent is not really an invention, since it defines how their software does something which is not even physical. I suppose the physical aspect occurs when someone is taken to court.

      Please note I am against software patents in general, although I am not against closed source, copyright, or trademarks, even though these can also be a "can of worms". As far as I am concerned this should never be granted as a patent, since it is another thing that takes away freedom in programming or even the basic human thinking process. Still, if you have money and patent lawyers on retainer, I suppose you could patent anything, like http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220040230959%22.PGNR.&OS=DN/20040230959&RS=DN/20040230959

      --
      There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
    2. Re:Thank God for Darknets... by P3NIS_CLEAVER · · Score: 3, Interesting

      These jokers were trying to get us to sell their desktop search engine to our clients about 5 years ago. IMO they were pretty overstuffed and FOS. (how is that for buzzwords)
      I am surprised they survived the internet bubble (or lack of)

      --
      Please sign petition to restore sanity to our banking system!!!

      http://financialpetition.org/
  3. Like a patent means anything by jhfry · · Score: 2, Insightful

    Sure, they have a patent, and if they actually implement what's in the patent it's meaningful to look at... but more often than not, the patent is much broader than the actual application, or the patent isn't even being used.

    If I looked at patents to determine what a business was capable of, I would be driving a car that gets 100's of miles to the gallon!

    --
    Sometimes the best solution is to stop wasting time looking for an easy solution.
  4. AI by alphamugwump · · Score: 5, Interesting

    I find it ironic how stuff like this ends up being among the more practical applications for AI. I mean, science fiction is usually about robots taking over. Instead, we end up with an internet full of bots trying to sell Viagra, bots trying to block Viagra, bots trying to break captchas, bots trying to detect copyright infringement, p2p systems to ensure privacy, and so on.

    I don't think this sort of searching for pirated content is going to be terribly effective, though. I mean, it might be able to catch the blatant stuff like YouTube, but ultimately, they're never going to kill p2p, especially once private trackers become more common.

  5. Fully buzzword compliant by Animats · · Score: 4, Informative

    All those buzzwords. Apparently somebody has a system that can characterize and match images and video. That's reasonable enough, it's been done before, and the question is how good the new one is. The article gives zero help in that direction.

    From the same source: "Nanogenerator provides continuous power by harvesting energy from the environment". It's a variation on the piezoelectric generator concept, like a piezo fire starter.

  6. Huh? by Anonymous Coward · · Score: 5, Insightful

    Not to complain about the article too much, but is there anyone out there who didn't find it completely contradictory and useless?

    As far as I can tell, the article starts off by saying that they have a wonderful system to inspect and compare the video content of a clip against a HUGE database (e.g. tens of thousands of hours of copyrighted movies, TV series, music). And that they know how to read _any_ media format (e.g. an AVI using some particular codec embedded into a Word document which is zipped....) The suggestion is that the software could "read" a YouTube video clip and recognize that it contains a few minutes of a Jay Leno monologue. Needless to say, they don't explain how they might possibly do this, because, as far as I can tell, they can't. Not even close.

    If you look at the patents, they're pretty much all about text or metadata searching. For example, they seem to have found an innovative way to find keywords to categorize a document....by scanning for words in the document! Or of categorizing a video file...by looking at metadata (eg. comments) embedded in the file. The only amazing thing about these algorithms is that some dimbulb in the patent office decided to give them a 20 year monopoly on something people have been doing for decades.

  7. Hold on, my company has a patent on this by Anonymous Coward · · Score: 2, Funny

    Did their software detect the patent that it is infringing upon? Bastards!

  8. It's a hopeless pursuit by heretic108 · · Score: 4, Insightful

    Back in the early days of cars, most folks thought the red flag act was entirely justified.

    Sorry, but we've hit a new age of abundance. With the overwhelming percentage of internet users using LimeWire, BitTorrent etc, attempts to sustain a manufactured scarcity in the face of this abundance will similarly fade away into obsolescence.

    The copyright enforcement versus piracy arms race will make for interesting history courses in future decades. I can see the courses now - "The Rise And Fall Of Intellectual Property".

    I'm looking forward to blowing my grandkids' minds when I tell them about the era when information wasn't free.

    --
    -- In the beginning was the WORD, and the WORD was UNSIGNED, and the main(){} was without form and void...
  9. No better than a dowsing rod by Black Art · · Score: 2, Insightful

    Seems every week some company comes up with a way to detect copyright violations or terrorists or naughty pictures or some other buzzworthy topic that will get them paid suitcases full of money.

    Until I see some sort of evidence that they can do it, I rank the claims along with those who claim that they can tell what people are thinking by where they scratch.

    --
    "Trademarks are the heraldry of the new feudalism."
  10. Re:Won't work by PiEpster · · Score: 5, Informative

    Actually, their technology works exceptionally well, provided you use it in the way it is meant to be used. To use Autonomy for internet spidering is obviously not one of those ways, since its 'meaning-based computing' (read: pattern-recognition) algorithms will turn up text on cats when you were searching for 'dogs' (since they are related terms). People are so used to Google's keyword search that this confuses them utterly.

    However, in a corporate intranet environment, this could be VERY useful for 'knowledge workers' like those working in R&D departments. I've managed an Autonomy system for a large multinational and they were using it for search on their internet and intranet sites. The average internet John Doe was complaining like hell, while the employees in R&D and similar functions were loving it.

    In this case, using it for detecting copyright infringement could actually work, since the pattern-recognition abilities of Autonomy are in fact very good.

  11. Standard Machine Learning... by kripkenstein · · Score: 4, Informative

    "If you look at the patents, they're pretty much all about text or metadata searching."
    Indeed, yes. Furthermore, they seem to be a simple list of standard machine learning (text categorization/information retrieval) methods. I won't bother to go through the entire patent, it is mind-numbingly boring, but here are some details for the beginning of it: (I refer to the claim #'s)
    • 1,2: This is the standard TF-IDF method. TF means 'term frequency': you give each word a weight equal to its frequency in the document. IDF means 'inverse document frequency': if a word is rare across the document collection, you give it more weight. Typically this is done with the logarithm, btw.
    • 4,5,6: This is extremely general. But it sounds like any of a myriad of methods to generate 'higher-order-features'. For example, by using a nonlinear kernel function.
    • 7&9: Sounds like a way to measure the importance of a feature. Many such methods are already in use, for example, mutual information (MI).
    • 8: In other words, a 'stoplist'. Nice way to make it sound really complicated and useful, though.
    Skimming the rest of the patent, I don't see much substance. But I admit I didn't go through all of it. Perhaps someone else will have more patience.
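
For anyone who hasn't seen it, here is a minimal sketch of what TF-IDF plus a stoplist (the first and last bullets above) amounts to. This is the textbook method, not anything taken from the patent; the stoplist contents, the sample documents, and the function names are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical stoplist: very common words dropped before weighting.
STOPLIST = {"the", "a", "of", "and", "is", "to", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stoplisted words."""
    return [w for w in text.lower().split() if w not in STOPLIST]


def tf_idf(documents):
    """Return one {word: weight} dict per document.

    TF  = occurrences of the word in the document / words in the document
    IDF = log(number of documents / number of documents containing the word)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for words in tokenized:
        doc_freq.update(set(words))

    weights = []
    for words in tokenized:
        counts = Counter(words)
        total = len(words)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Illustrative documents only.
docs = [
    "the cat sat in the hat",
    "the dog sat in the yard",
    "the patent is a list of claims",
]
weights = tf_idf(docs)
```

In this toy collection "cat" appears in only one document while "sat" appears in two, so "cat" ends up with the higher weight in the first document, and "the" never gets a weight at all because the stoplist removes it first.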
  12. It doesn't by blorg · · Score: 4, Informative

    ...just like it doesn't catch you burning a CD and giving it to your friend physically. Or the Scouts singing "Happy Birthday."

    However it may well do what it is designed to do, finding copyright infringement on the web. Autonomy are a serious company working on pattern recognition, not some fly-by-night cowboys. This copyright-finding thing would just be a side application of their core technology.

  13. Finding reverse plagiarism by G4from128k · · Score: 3, Interesting

    Publishers using this tool will presume that any copies found are copyright violations. But what happens when a work "created" and copyrighted in 2006 turns out to be "infringed" by something created in 2000? If the publisher's "original" copyrighted work turns out to not be so original after all, then things could get sticky. I wonder how many cases of plagiarism will be uncovered in which the publisher/copyright holder becomes the defendant.

    --
    Two wrongs don't make a right, but three lefts do.
  14. "Meaning-based computing" by Venik · · Score: 2, Insightful

    Whenever I see the words "intelligence", "meaning", or "understanding" used to describe software, that's how I know it's a bunch of baloney.