hutter1.net · Domains · Slashdot Mirror

Re:Of course it didn't work by ezdiy · 2017-06-08 14:23 · Score: 2 · on Ask Slashdot: What Is Your View On Sloot Compression? (youtube.com)

> Information theory stablishes what is really possible

Actually, information theory states the opposite. Determining the entropy of an unknown source is an intractable problem, and you generally can't state the amount of entropy in a piece of data unless you know beforehand that it's quantum pink noise. All we know is that the better the compressor, the closer you get. That's why one-time pads use truly random codebooks, not a PRNG (a PRNG has very little entropy: that of its seed).

While extremely important as an output filter, just an entropy encoder doesn't compression make.
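
To make the PRNG point concrete, here is a minimal sketch, assuming Python's standard `random` (Mersenne Twister) and `zlib` as stand-ins for "a PRNG" and "a compressor". The helper name `prng_bytes` and the seed value are made up for illustration. The output looks incompressible to zlib, yet the whole stream is reproducible from one small seed, so the compressor's verdict is only an upper bound on the real information content.

```python
import random
import zlib

def prng_bytes(seed: int, n: int) -> bytes:
    """Return n pseudo-random bytes from a seeded Mersenne Twister (hypothetical helper)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

data = prng_bytes(seed=42, n=100_000)
compressed = zlib.compress(data, 9)

# zlib finds no structure, so the compressed size stays close to the raw size...
print(len(data), len(compressed))

# ...but the same seed regenerates the stream exactly, so the seed plus this
# short program is a far smaller description than anything zlib produced.
assert prng_bytes(42, 100_000) == data
```

A general-purpose compressor only reports what it managed to find, which is why it cannot tell you the entropy of an unknown source.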

Re:How does it sound? by ezdiy · 2017-01-13 15:17 · Score: 2 · on Open Source Codec Encodes Voice Into Only 700 Bits Per Second (rowetel.com)

Look at the codec diagram - if you ignore the entropy coder, it largely resembles the input filters of voicerecog systems - before feeding the NN input terminals, the signal is decimated to extremely low bandwidth vectors with only the psychoacoustic essentials of human voice - quantized to very few dominating tones and their attack/release values. The NN model does the final step of "compressing" the result only by a factor of around 100 into text. It is popularly conjectured that compression is, in fact, an ML problem.

The same is done in computer vision: before matching for features, the frequency space is filtered into a narrow band where the interesting stuff can still be observed.

Artificial Intelligence-Based Education by Baldrson · 2016-06-04 07:55 · Score: 1 · on Tech CEOs Declare This the Era of Artificial Intelligence (fortune.com)

I keep coming back to natural language compression prizes. The best hope we have of ameliorating human stupidity and ignorance is computer based education starting with a _neutral_ electronic genius with astronomical verbal intelligence. Verbal intelligence entails the ability to assess the verbal and cognitive character of your audience and modify your speech acts accordingly. The cost of electricity -- about 10 cents per kilowatt hour -- would be vastly lower than the cost of transferring benevolent _natural_ geniuses with high verbal intelligence into educational roles. Moreover, the exponential character of Moore's Law, combined with the history of bad general artificial intelligence theory that is finally giving way to mathematical rigor, offers an enormous potential for computer aided education in the near future -- if natural language compression is seen as the critical metric for "friendly AI" that it is under such rigor. http://prize.hutter1.net/

Rigorous Criterion for AI Prize by Baldrson · 2015-12-29 10:22 · Score: 1 · on Interviews: Ask Ray Kurzweil a question

Have you considered the utility of a compression-based AI prize for not only advancing machine learning, but also redressing information sabotage? Since Google DeepMind cofounder, Shane Legg, demonstrated the utility of a mathematically rigorous measure of problem-solving intelligence, which is based on Hutter's provably "optimal agent", Universal Algorithmic Intelligence, it seems time for an update of The Hutter Prize for Lossless Compression of Human Knowledge in two ways: 1) a much larger knowledge base and 2) a correspondingly much larger prize endowment. As such a prize pays only in proportion to rigorously measurable progress, and that progress is made public in the form of the refinement of knowledge, it would be a low risk public good appropriate for public sector as well as NGO endowment.

Expand the Hutter Prize by Baldrson · 2015-09-17 08:14 · Score: 1 · on XPRIZE's Jono Bacon On the Next Great Challenge

Expand the Hutter Prize for Lossless Compression of Human Knowledge to include the entire edit history of Wikipedia as well as the entirety of Wikipedia's current contents.

Why?

Because it solves the artificial intelligence problem and does so in a way that optimally enables natural language communication of the accumulated knowledge of humanity.

What I mean by "optimally enables natural language communication" is what every professional writer uses as the first rule of composition:

Write to your audience.

In other words, let's say you are attempting to write an article about quantum mechanics and your audience is a 12-year-old from New Jersey, raised without a father in an impoverished, crime-ridden neighborhood. This is a very different composition task than communicating quantum mechanics to a college-educated liberal arts graduate from Iowa who is considering a career in accounting. Indeed, it is the essence of pedagogy -- universalized.

By including the entire edit history of Wikipedia, the worldviews, perspectives, biases and agendas of a large number of editors will provide insight into the cognitive as well as social structure of a wide array of humans.

Moreover, while Google and companies like it are increasingly casting their role as "publishers" with the "right" to "editorialize" their search results, the Hutter Prize has a mathematical objective function that is simply not subject to editorialization: Kolmogorov Complexity. KC is a rigorous definition of Ockham's Razor that is mathematically sound and provably an optimal measure of mastery of knowledge.

System Development Foundation by Baldrson · 2015-02-26 09:40 · Score: 1 · on The Believers: Behind the Rise of Neural Nets

It's "System Development Foundation" not "System Development Corporation" and Charlie's full name is Charles Sinclair Smith. He's semi-retired now and living the next county over from me in southeast Iowa where we've been collaborating on a couple of projects -- one of which is to photosynthesize all of the CO2 effluent from US fossil fuel power plants (as Charlie got his start co-founding the Energy Information Administration of the DoE under Carter).

It's ironic that in the 80s I was living in La Jolla, which was an epicenter of the neural net revival at UCSD, had taken neural net courses from Robert Hecht-Nielsen and by 1990 had prototyped the highest performance neural network image processing system (as Neural Engines Corporation) -- but I then later worked with Charlie for almost 15 years before discovering he had played such a key role in the revival of neural nets. Even more ironic is that, circa 2005, I came up with the idea for the Hutter Prize for Lossless Compression of Human Knowledge -- based on Hutter's entirely different, top down mathematics approach to AI -- and Shane Legg, co-founder of Deep Mind, which is largely identified with deep learning neural nets, actually studied under Hutter and achieved Deep Mind's famous ability to learn to play video games using Hutter's approach but everyone thinks that capability is uniquely attributable to deep neural net learning alone.

Deep Mind's IQ Test Works by Baldrson · 2015-02-08 17:11 · Score: 1 · on Replacing the Turing Test

A rigorous definition of general intelligence now exists and has been applied by the Deep Mind folks. See this video lecture by Deep Mind's Shane Legg at Singularity Summit 2010 on a new metric for measuring machine intelligence.

If you want something more accessible to the general public, The Hutter Prize for Lossless Compression of Human Knowledge has the same theoretic basis as the test used by Deep Mind and has the virtue that it uses a natural language criterion, in the form of a Wikipedia snapshot. If the 100M snapshot of Wikipedia used by the Hutter Prize is no longer challenging enough, then substitute Matt Mahoney's Large Text Compression Benchmark which is basically just the Hutter Prize enlarged by an order of magnitude.

Re:I can answer that, Alex! by Baldrson · 2013-06-08 05:51 · Score: 1 · on When Will My Computer Understand Me?

Okian Warrior asserts: "There is no formal definition of intelligence, and no roadmap for what to study"

Yes there is. It's defined by a field called "Universal Artificial Intelligence" and the roadmap says what to study.

It's not winning the Hutter Prize by Baldrson · 2013-05-07 12:55 · Score: 3, Informative · on The New AI: Where Neuroscience and Artificial Intelligence Meet

The claim that "winning both industrial and academic data competitions with minimal effort" might be more impressive if it included the only provably rigorous test of general intelligence:

The Hutter Prize for Lossless Compression of Human Knowledge

The last time anyone improved on that benchmark was 2009.

The Hutter Prize by Anonymous Coward · 2012-09-19 04:09 · Score: 0 · on Ask Slashdot: Where Should a Geek's Charitable Donations Go?

The Hutter Prize for Lossless Compression of Human Knowledge, if sufficiently funded, will get adolescent competitive hormones kicking in to teach them about programming, artificial intelligence and the nature of knowledge itself.

Re:Show me the runny by Internalist · 2010-01-27 18:50 · Score: 2, Informative · on Can Curiosity Be Programmed?

No, he knows and has explicitly stated in a few places that it's uncomputable, in much the same way that Kolmogorov Complexity is uncomputable, but an interesting and potentially useful theoretical construct, nonetheless.

This vein of Schmidhuber's work is more or less descended from Solomonoff's work on induction and Chaitin's Algorithmic Information Theory stuff (the line of descent is less explicit with the latter), and a bunch of Schmidhuber's descendants, most prominently his student Marcus Hutter and *his* student Shane Legg, have taken this ball and run with it in interesting ways.

Re:Try the Hutter Prize model by Baldrson · 2009-09-24 03:30 · Score: 1 · on BellKor Wins Netflix $1 Million By 20 Minutes

The point of the Turing test is to model human intelligence. That is not the point of the Hutter Prize. The point of the Hutter Prize is to model optimal intelligence. Human intelligence is not optimal. The target of this intelligence is chosen as human knowledge as represented in Wikipedia. Optimal, or universal, intelligence is a field of pure mathematics: The goal is to mathematically define a unique model superior to any other model in any environment. From a presentation by Marcus Hutter:

The (optimal) AI model is unique in the sense that it has no parameters which could be adjusted to the actual environment in which it is used.
In this first step toward a universal theory we are not interested in computational aspects.
Nevertheless, we are interested in maximizing a utility function, which means to learn in as minimal number of cycles as possible. The interaction cycle is the basic unit, not the computation time per unit.

Some confusion arises due to the fact that optimal compression algorithms rely on optimal prior knowledge of the nature of the input. Optimally compressed prior knowledge is a better "ontology" for predicting, hence compressing, further information coming in from the same environment.

Universal intelligence is not computable, although there is an order 2^N approximation.

Kernel Compression Prize Competition by Baldrson · 2009-09-22 03:06 · Score: 2, Interesting · on According to Linus, Linux Is "Bloated"

Set up a prize competition for kernel compression similar to the Hutter Prize for Lossless Compression of Human Knowledge except the objective is to produce an executable binary of minimum size that expands into a fully functional kernel.

The goal of this competition would be to obtain the optimal factoring of the kernel architecture.

Try the Hutter Prize model by Baldrson · 2009-09-22 03:01 · Score: 1 · on BellKor Wins Netflix $1 Million By 20 Minutes

The Hutter Prize's system of incremental awards for progress, itself modeled on the M-Prize, is a superior way of awarding prize money. There is continual reward for teams that contribute substantially and no one team takes everything based on a technicality.

Re:He's too close. by DriedClexler · 2009-07-30 12:48 · Score: 1 · on A.I. Developer Challenges Pro-Human Bias

Well put, and I agree. I would add that what we actually want out of artificial systems is some kind of combination of survivability and intelligence, and we don't want to go too far in either direction.

"Too much survivability" would be where we can't shut the system down when it's not doing what we want it to, or being destructive. Too little survivability would be where the resources to keep it going exceed the benefit of the output it gives us.

Now, how can you get too much intelligence? Well, if you take intelligence to mean "extracting the most knowledge from the least data", then an optimally intelligent system would be the one that updates its "probability distribution" over the world exactly as its limited observations suggest. However, this would needlessly discard all of the knowledge we already have embedded in our bodies as a result of our long evolutionary history. Many things that we do to survive rely on such implicit knowledge.

In other words, we make good guesses that can't be justified based on what we consciously know, but "happen" to be right for this planet and this universe -- the very things a merely "intelligent" system would try to avoid. An example of a superintelligent system is Marcus Hutter's AIXI, which makes provably optimal inferences, but which takes way too long to do anything useful, because it has to re-learn everything starting from nothing but Occam's Razor.

Anyone for a General AI prize? by Baldrson · 2009-03-26 10:15 · Score: 1 · on Is Your IM Buddy Really a Computer?

Matt Mahoney to Hutter, 9:33 AM (7 hours ago):

I have uploaded a mirror of Alexander Ratushnyak's new submission to the Hutter prize to http://cs.fit.edu/~mmahoney/compression/text.html#1323 It is in the paq8hp12 section. Scroll down to the bottom of the list of versions just above the table. The submission is decomp8.zip which contains 2 files, decomp8.exe and archive8.bin, the decompressor and compressed file. There is no compressor. To decompress:

decomp8 archive8.bin enwik8

The direct link is http://cs.fit.edu/~mmahoney/compression/decomp8.zip Decompression took about 2 hours on my computer and used a little over 924 MB memory. The total size of the 2 files is 15,986,677 which passes the 3% threshold improvement from his previous submission of 16,481,655 bytes on May 14, 2007.

The submission was Mar. 23. The 30 day comment period before awarding the prize ends Apr. 22, 2009.
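
The 3% figure in the message above can be checked from the two byte counts it quotes. A quick sketch, assuming the threshold is measured relative to the previous record's total size; the function name `improvement` is mine, not from the prize rules.

```python
previous_size = 16_481_655  # bytes, previous submission (May 14, 2007), as quoted above
new_size = 15_986_677       # bytes, decomp8.exe + archive8.bin, as quoted above

def improvement(prev: int, new: int) -> float:
    """Fractional size reduction relative to the previous record."""
    return (prev - new) / prev

gain = improvement(previous_size, new_size)
print(f"saved {previous_size - new_size:,} bytes, {gain:.2%} smaller")
print("passes 3% threshold:", gain >= 0.03)
```

The margin is thin: the submission clears the threshold by only a few hundred bytes.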

Kolmogorov Programming by Baldrson · 2009-01-28 08:08 · Score: 1 · on Less Is Moore

If I were in Ray Ozzie's shoes I would apply something like the Hutter Prize for Lossless Compression of Human Knowledge to the entirety of MS's software suite. This, of course, requires making a rigorous spec for testing purposes.

Make the engine, upon which the winning succinct byte code runs, a new W3C standard browser programming language (or at least virtual machine) and reduce the Microsoft OS CD to those components required to create a web-delivered application platform using the winning engine. Such an engine would, of course, have some features that dynamically encached expansions (and/or "memoizations") similar to the Hotspot optimization technology that originated with the Self programming language (and was later adopted by Sun's Java Virtual Machine). Hence it would make sense to have the OS CD contain a partially pre-expanded/optimized code base.

Then, for delivery of software services to pre-existing platforms, create a legacy port of the services code to pre-existing W3C standards like XForms implemented in a downloadable ECMAScript Client/SOA library in a manner similar to the way TIBET(tm) does. The idea is to go "Live", ie: web-delivered, with a fundamentally new W3C base (whatever engine won the prize) but support legacy W3C environments for migration.

Again, this prize-oriented strategy would, of course, require a rigorous specification of the software services so the testing could be largely automated.

This approach addresses Microsoft's 2 biggest problems deriving from the same fundamental reality: Everyone has needed their OS to interoperate with the bulk of the information industry.

The first problem is ethical and really goes beyond the scope of my professional opinions to my public opinions about the support of property rights. Suffice to say, I have no trouble with someone who goes after a natural monopoly position and succeeds. I have a problem with someone who then refuses to use that position of success to fix the bug in the society that made them inordinately rich and their technology inordinately influential.

The second problem is technical, which is what my argument here is really all about.

Basically Microsoft's code bloat problem derives from its monopoly position. This may seem like a truism since all of the software "profession" suffers from code bloat, but only Microsoft can take this to monopolistic proportions -- proportions that make Ma Bell's monopolistic complexities of yore look Spartan.

So Microsoft has this problem and it has many programmers (contributing to the code-bloat problem). It also has mountains of cash.

So how can Microsoft bust its own monopoly position turning its many programmers (many newly laid off!) and mountains of cash into succinct code?

Monetary Incentives for the Programmers. For example, the original idea for the Hutter Prize was:

S = size of uncompressed corpus
P = size of program outputting the uncompressed corpus
R = S/P (the compression ratio).

Award monies in a manner similar to the M-Prize:

Previous record ratio: R0
New record ratio: R1=R0+X

Fund contains: $Z at the time of the new record
Winner receives: $Z * (X/(R0+X))
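
The award rule above is simple enough to write down directly. A minimal sketch; the function name and the example numbers are made up for illustration, and only the formula `Z * X / (R0 + X)` comes from the post.

```python
def payout(fund: float, previous_ratio: float, new_ratio: float) -> float:
    """Winner's share of the fund: Z * X / (R0 + X), where X = R1 - R0."""
    x = new_ratio - previous_ratio
    if x <= 0:
        return 0.0  # no new record, no award
    return fund * x / (previous_ratio + x)

# Hypothetical figures: a fund of 11,000 and a record moving from ratio 5.0 to 5.5.
award = payout(fund=11_000, previous_ratio=5.0, new_ratio=5.5)
print(round(award, 2))
```

Because the share is `X / R1`, the same absolute gain in ratio pays less as the record climbs, so the fund is never exhausted by a single winner.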

Something similar can be done with the size of the binary that passes the entire suite of tests for Microsoft's software suite.

What happens very rapidly is the programmers first apply their skills to maximally refactoring. What falls out is a series of legacy API layers written atop a tight core.

They'd have to spend more money on code testing to verify the compressed code-bases of the competing teams actually worked to spec but the results should be quite gratifying.

Microsoft's Problem by Baldrson · 2008-06-29 14:55 · Score: 3, Interesting · on Fresh Air For Windows?

If I were in Ray Ozzie's shoes I would apply something like the Hutter Prize for Lossless Compression of Human Knowledge to the entirety of MS's software services suite. This, of course, requires making a rigorous spec for testing purposes.

Make the engine, upon which the winning succinct byte code runs, a new W3C standard browser programming language (or at least virtual machine) and reduce the Microsoft OS CD to those components required to create a web-delivered application platform using the winning engine. Such an engine would, of course, have some features that dynamically encached expansions, memoizations, tablings and/or materialized views similar to the Hotspot optimization technology that originated with the Self programming language (and was later adopted by Sun's Java Virtual Machine). Hence it would make sense to have the OS CD contain a partially pre-expanded hence time-optimized code base.

Then, for delivery of software services to pre-existing platforms, create a legacy port of the services code to pre-existing W3C standards like XForms implemented in a downloadable ECMAScript Client/SOA library in a manner similar to the way TIBET(tm) does. The idea is to go "Live", ie: web-delivered, with a fundamentally new W3C base (whatever engine won the prize) but support legacy W3C environments for migration.

Again, this prize-oriented strategy would, of course, require a rigorous specification of the software services so the testing could be largely automated.

This approach addresses Microsoft's 2 biggest problems deriving from the same fundamental reality: Everyone has needed their OS to interoperate with the bulk of the information industry.

The first problem is ethical and really goes beyond the scope of my professional opinions to my public opinions about the support of property rights. Suffice to say, I have no trouble with someone who goes after a natural monopoly position and succeeds. I have a problem with someone who then refuses to use that position of success to fix the bug in the society that made them inordinately rich and their technology inordinately influential.

The second problem is technical, which is what my argument here is really all about.

Basically Microsoft's code bloat problem derives from its monopoly position. This may seem like a truism since all of the software "profession" suffers from code bloat, but only Microsoft can take this to monopolistic proportions -- proportions that make Ma Bell's monopolistic complexities of yore look Spartan.

So Microsoft has this problem and it has many programmers (contributing to the code-bloat problem). It also has mountains of cash.

So how can Microsoft bust its own monopoly position turning its many programmers and mountains of cash into succinct code?

Monetary Incentives for the Programmers, à la the Hutter Prize:

S = size of uncompressed code-base
P = size of program outputting the uncompressed code-base
R = S/P (the compression ratio).

Award monies in a manner similar to the M-Prize:

Previous record ratio: R0
New record ratio: R1=R0+X

Fund contains: $Z at the time of the new record
Winner receives: $Z * (X/(R0+X))
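
It may help to see what that award fraction means in terms of program size. A short derivation, assuming the code-base size S is fixed between records; the labels P_0 and P_1 (program sizes for the previous and new records) are introduced here and are not in the original formula.

```latex
\[
R_0 = \frac{S}{P_0}, \qquad R_1 = \frac{S}{P_1} = R_0 + X
\]
\[
\frac{X}{R_0 + X} = \frac{R_1 - R_0}{R_1} = 1 - \frac{R_0}{R_1}
                  = 1 - \frac{P_1}{P_0} = \frac{P_0 - P_1}{P_0}
\]
```

So under that assumption the winner receives the fraction of the fund equal to the fraction by which the program shrank: a 3% smaller program earns 3% of $Z.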

It may turn out that due to the incomputability of Kolmogorov complexity, the growth of reward may ultimately need to go exponential, but the principle remains true.

What happens very rapidly is the programmers first apply their skills to maximally refactoring. What falls out is a series of legacy API layers written atop a tight core.

They'd have to spend more money on code testing to verify the compressed code-bases of the competing teams actually worked to spec but the results should be quite gratifying.

Kolmogorov complexity not tractable - compression? by tucuxi · 2007-09-10 04:34 · Score: 1 · on Ultra-low-cost True Randomness

Indeed - Kolmogorov complexity is nice to play with, but can't be calculated.

A useful approximation is to use "compressed size". An ideal lossless compressor would effectively be calculating the Kolmogorov complexity. For instance, in the 123456789012345678901234567890 sequence example, any self-respecting compressor such as Zip would create something like "1234567890 times 3", which is pretty close to the shortest program which generates the sequence.
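
A rough sketch of the "compressed size" idea, assuming zlib as the stand-in compressor (it is nowhere near ideal, so the number is only an upper bound). The sequence is repeated 1000 times instead of 3 so that zlib's fixed overhead doesn't dominate; `compressed_size` is a name I made up.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data: a crude upper bound on its complexity."""
    return len(zlib.compress(data, 9))

repetitive = b"1234567890" * 1000

# Same bytes, same digit frequencies, but shuffled so the repeating pattern is gone.
digits = list(repetitive)
random.Random(0).shuffle(digits)
shuffled = bytes(digits)

print(len(repetitive), compressed_size(repetitive), compressed_size(shuffled))
```

Both inputs contain exactly the same digits, so the gap in compressed size comes purely from the pattern the compressor was able to find.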

Indeed, really-good compression is close to AI. To say the same thing in progressively shorter ways, you need to find deeper patterns. Check out this page relating AI to compression: the Hutter prize

Compliance vs Compression by Baldrson · 2007-08-31 06:55 · Score: 1 · on Algorithm Rates Trustworthiness of Wikipedia Pages

This algorithm is measuring compliance with the Wikipedia dispute processing norms -- not "trustworthiness". A better measure of "trustworthiness" of a passage is its consistency with the rest of the body of human knowledge -- which is most strictly measured by the degree to which it is not a special case within a compressed representation of that knowledge. This is the basis of the Hutter Prize for Lossless Compression of Human Knowledge. The Hutter Prize is currently using a 100M sample from Wikipedia as its corpus.
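
One crude way to picture "not a special case within a compressed representation" is to ask how many extra bytes a passage costs once the rest of the corpus is already compressed. This is only a toy sketch under loud assumptions: zlib stands in for a real model of knowledge, the corpus is a tiny made-up sample, and `marginal_cost` is my name, not anything the Hutter Prize defines.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def marginal_cost(corpus: bytes, passage: bytes) -> int:
    """Extra compressed bytes needed to append the passage to the corpus."""
    return compressed_size(corpus + passage) - compressed_size(corpus)

sentence = b"The Hutter Prize uses a 100M sample from Wikipedia as its corpus. "
corpus = sentence * 20  # toy corpus, small enough to fit in zlib's window

consistent = sentence  # restates what the corpus already contains
rng = random.Random(1)
unrelated = bytes(rng.choice(b"abcdefghijklmnopqrstuvwxyz ") for _ in range(len(sentence)))

print(marginal_cost(corpus, consistent), marginal_cost(corpus, unrelated))
```

A passage the corpus already "explains" costs almost nothing to add, while one that fits nothing in it has to be spelled out nearly in full.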

Hutter Prize rules by tepples · 2007-07-10 00:09 · Score: 1 · on Text Compressor 1% Away From AI Threshold

I heard the decompression binary is around 100.1MB....

Poor joke. The Hutter Prize rules include the size of the decompressor in the size of the entry. Decompressors may depend only on stock libc of Windows or GNU/Linux operating systems. In practice, they'll need to run on a net-disconnected machine with a fresh OS install.

Re:Program size is 1.02 MB! by Baldrson · 2007-07-09 18:58 · Score: 4, Informative · on Text Compressor 1% Away From AI Threshold

Actually, the size of the program (decompressor) binary is 99,696 bytes, and it is the binary size that is included in the prize calculation.

It may be too late for Microsoft now but... by Baldrson · 2007-05-13 04:41 · Score: 1 · on Rethinking the Linux Distribution?

A long time before MIX'07's announcement of Silverlight, I posted an approach I thought Microsoft should take to going "live" with their applications suite as software services. The approach still applies to others who might like to go "live" with software turned to "web" services. Translate from "Ray Ozzie" to "Linus", etc. and it applies to the present issue -- but with a big problem remaining of how to raise money for the prize.

Here's what I wrote back when there was still hope for Microsoft: