Slashdot Mirror


Web Scanning Technology for Copyright Violations

eldavojohn writes "I've heard a lot of talk about software being used to detect pirated media anywhere on the web, but haven't seen a lot of details. PhysOrg has a good article on one of the tools out there. Automatic Copyright Infringement Detection (ACID) boasts a patented technology dubbed 'meaning-based computing' that is reportedly capable of finding relationships among 1,000 different types of files. The important thing is that this is not tagging-based searching. 'Autonomy's search technology uses automatic hyperlinking and link clustering that the company claims isn't built into keyword search engines. According to the company, this technology allows computers to perform searches with greater context, so it finds a wider range of related documents or research citations than is possible from keyword searches.' For more details on how this magic works, check out Autonomy's patent and the many patents by its subdivision, Virage."

5 of 54 comments

  1. Re:Encryption? by updog · · Score: 2, Informative

    And, torrents and newsgroups?

    It really seems to be targeting your typical TV episode uploaded to YouTube...

  2. Fully buzzword compliant by Animats · · Score: 4, Informative

    All those buzzwords. Apparently somebody has a system that can characterize and match images and video. That's reasonable enough, it's been done before, and the question is how good the new one is. The article gives zero help in that direction.

    From the same source: "Nanogenerator provides continuous power by harvesting energy from the environment". It's a variation on the piezoelectric generator concept, like a piezo fire starter.

  3. Re:Won't work by PiEpster · · Score: 5, Informative

    Actually, their technology works exceptionally well, provided you use it in the way it is meant to be used. To use Autonomy for internet spidering is obviously not one of those ways, since its 'meaning-based computing' (read: pattern-recognition) algorithms will turn up text on cats when you were searching for 'dogs' (since they are related terms). People are so used to Google's keyword search that this confuses them utterly.

    However, in a corporate intranet environment, this could be VERY useful for 'knowledge workers' like those working in R&D departments. I've managed an Autonomy system for a large multinational and they were using it for search on their internet and intranet sites. The average internet John Doe was complaining like hell, while the employees in R&D and similar functions were loving it.

    In this case, using it for detecting copyright infringement could actually work, since the pattern-recognition abilities of Autonomy are in fact very good.

  4. Standard Machine Learning... by kripkenstein · · Score: 4, Informative

    "If you look at the patents, they're pretty much all about text or metadata searching."
    Indeed, yes. Furthermore, they seem to be a simple list of standard machine learning (text categorization/information retrieval) methods. I won't bother to go through the entire patent, it is mind-numbingly boring, but here are some details for the beginning of it: (I refer to the claim #'s)
    • 1,2: This is the standard TF-IDF method. TF means 'term frequency': you give each word a weight equal to its frequency in the document. IDF means 'inverse document frequency': if a word is rare across documents, you give it more weight. Typically this is done with the logarithm, btw.
    • 4,5,6: This is extremely general, but it sounds like any of a myriad of methods to generate 'higher-order features', for example by using a nonlinear kernel function.
    • 7&9: Sounds like a way to measure the importance of a feature. Many such methods are already in use, for example, mutual information (MI).
    • 8: In other words, a 'stoplist'. Nice way to make it sound really complicated and useful, though.
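    For anyone who hasn't seen these methods, here is a minimal sketch of TF-IDF with a stoplist, as described in the items above. It is textbook TF-IDF, not the method from Autonomy's patent; the names STOPLIST, tokenize and tf_idf and the tiny word list are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical stoplist: words assumed too common to carry meaning.
STOPLIST = {"the", "a", "of", "and", "is", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stoplisted words."""
    return [w for w in text.lower().split() if w not in STOPLIST]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    TF is the term's relative frequency within the document.
    IDF is log(N / df), where N is the number of documents and
    df is the number of documents that contain the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty docs
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights
```

    With three short documents such as "the cat sat on the mat", "the dog sat on the log" and "the cat chased the dog", a word that appears in only one document ("mat") ends up weighted higher than one that appears in two ("cat"), and a word that appears in every document gets weight zero.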
    Skimming the rest of the patent, I don't see much substance. But I admit I didn't go through all of it. Perhaps someone else will have more patience.

  5. It doesn't by blorg · · Score: 4, Informative

    ...just like it doesn't catch you burning a CD and giving it to your friend physically. Or the Scouts singing "Happy Birthday."

    However, it may well do what it is designed to do: find copyright infringement on the web. Autonomy are a serious company working on pattern recognition, not some fly-by-night cowboys. This copyright-finding thing would just be a side application of their core technology.