Slashdot Mirror


NSA Shopping For Data Mining Tech

prostoalex writes "The National Security Agency paid a visit to Silicon Valley venture capitalists, the New York Times learned, to talk about potentially 'interesting' technologies that the Feds would be interested in purchasing. Data mining technologies that could link arbitrary facts into logical events and find dependencies, technologies for quick voice transcription - all these technologies usually get to market faster if developed by private companies."

1 of 159 comments

  1. Fund the C-Prize by Baldrson · · Score: 0, Offtopic
    The NSA can get what it wants via a compression prize competition. To compress a corpus well, a program must find the most predictive patterns in it.

    They could fund a prize competition such as the following:

    Let anyone submit an open source program that produces, with no inputs, one of the major natural language corpora as output.

    S = size of uncompressed corpus
    P = size of program outputting the uncompressed corpus
    R = S/P (the compression ratio).
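
    As a rough illustration of the ratio defined above, here is a minimal Python sketch. It is only an approximation: zlib and the toy corpus are stand-ins chosen for the example, and P is taken to be the compressed payload alone, whereas a real entry would also have to count the size of the decompressing program.

```python
import zlib


def compression_ratio(corpus_size: int, program_size: int) -> float:
    """Return R = S / P, with both sizes measured in bytes."""
    if program_size <= 0:
        raise ValueError("program_size must be positive")
    return corpus_size / program_size


# Toy stand-in for a natural language corpus (an assumption for the demo).
corpus = b"the quick brown fox jumps over the lazy dog. " * 200

# Stand-in for P: the compressed payload only. A real submission would
# also count the bytes of the program that reproduces the corpus.
compressed = zlib.compress(corpus, 9)

ratio = compression_ratio(len(corpus), len(compressed))
print(f"S = {len(corpus)} bytes, P ~ {len(compressed)} bytes, R ~ {ratio:.1f}")
```

    Highly repetitive text like this compresses far better than real prose would, so the number printed says nothing about what a serious entry could reach.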

    Award monies in a manner similar to the M-Prize:

    Previous record ratio: R0
    New record ratio: R1=R0+X
    Fund contains: $Z at noon GMT on day of new record
    Winner receives: $Z * (X/(R0+X))
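
    The payout rule above can be sketched in a few lines of Python. The function name and the sample figures are made up for illustration; only the formula $Z * (X/(R0+X)), with X = R1 - R0, comes from the proposal.

```python
def prize_payout(fund: float, previous_ratio: float, new_ratio: float) -> float:
    """Payout for a new record: Z * X / (R0 + X), where X = R1 - R0.

    Returns 0.0 when the new ratio does not beat the previous record.
    """
    if previous_ratio <= 0:
        raise ValueError("previous_ratio must be positive")
    improvement = new_ratio - previous_ratio  # X
    if improvement <= 0:
        return 0.0
    return fund * improvement / (previous_ratio + improvement)


# Hypothetical numbers: a $100,000 fund, record improved from 4.0 to 5.0.
payout = prize_payout(100_000, 4.0, 5.0)
print(payout)
```

    Since R0 + X is just R1, the winner gets the fraction of the fund equal to the relative improvement (R1 - R0)/R1, so the payout is always less than the whole fund and small gains earn small awards.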

    Compression program and decompression program are made open source.

    Explanation: For an idea of why the C-Prize can solve the AI problem, if it is solvable, see Matthew Mahoney's comment on it:

    Matt Mahoney
    Jun 17, 7:18 pm
    Newsgroups: comp.compression
    From: "Matt Mahoney"
    Date: 17 Jun 2005 20:18:59 -0700
    Local: Fri, Jun 17 2005 7:18 pm
    Subject: Re: The C-Prize

    Hutter's AIXI, http://www.idsia.ch/~marcus/ai/paixi.htm, makes another argument for the connection between compression and AI that is more general than the Turing test. He proves that the optimal behavior of an agent (an interactive system that receives a reward signal from an unknown environment) is to guess that the environment is most likely computed by the shortest possible program that is consistent with the behavior observed so far. In other words, the most likely outcome for any experiment is the one with the simplest explanation, where "simplest" means the smallest program that could model what you currently know about the universe.

    He gives a formal proof, but it basically says that the only possible distribution of the infinite set of programs (or strings) with nonzero probability is one which favors shorter programs over longer ones. Given any string of length n with probability p > 0, there is an infinite set of strings longer than n, but only a finite number of these can have probability higher than p.

    -- Matt Mahoney
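
    The counting step at the end of Mahoney's post can be spelled out as follows. The symbols are my own notation, not his: mu is any probability distribution over strings, and A_p is the set of strings with probability above p.

```latex
% Notation (assumed for this sketch): \mu is a probability distribution
% over strings, p > 0, and A_p = \{ x : \mu(x) > p \}.
\[
  1 \;\ge\; \sum_{x} \mu(x)
    \;\ge\; \sum_{x \in A_p} \mu(x)
    \;\ge\; |A_p| \, p
  \quad\Longrightarrow\quad
  |A_p| \;\le\; \frac{1}{p}.
\]
```

    So at most 1/p strings of any length can be more probable than p, which is why all but finitely many of the longer strings must get probability p or less.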

    Matt Mahoney is the author of Text Compression as a Test for Artificial Intelligence which states:
    It is shown that optimal text compression is a harder problem than artificial intelligence as defined by Turing's (1950) imitation game; thus compression ratio on a standard benchmark corpus could be used as an objective and quantitative alternative test for AI (Mahoney, 1999).
    (Mahoney is also a competitor who has some winnings from The Calgary Corpus Compression Challenge.)