The markup language portions of the corpus will be very rapidly reduced to a set of rules governing their syntax leaving mainly their "raw" meaning (as meta knowledge in the case of the XML markup of the corpus and human interface knowledge in the case of the Wiki markup). Although there is good reason for these to be the subject of research for their own merit, researchers who want to win money via compression of more generally valuable knowledge will find lots of opportunities and needn't be distracted from primary research routes.
Ultimately, you can see the corpus as a space within which various hypotheses about practical AI can be experimentally compared. That's the crucial importance of choosing something general like Wikipedia.
The fact that there are local optimal in utility functions in which you can get stuck is a problem for all learning systems but to varying degrees depending on the details of how they look around the space of "programs". The space of programs measured for Kolmogorov complexity has a lot of discontinuity.
Hutter's theory is about an active AI agent seeking to maximize some utility function rather than passive prediction. However, it is probably the case that prizes for human knowledge compression should have been in place a long time ago -- probably as long ago as William of Ockham. IMHO guys like Solomonov provided sufficient formal rigor to justify massive government funding of a prize competition for compression of human knowledge decades ago, but it seems to be the case that people who control money generally just can't bring themselves to fund prize competitions for objective criteria. This generally doesn't speak well of people who control money. The exceptions are highly notable.
See the detailed rules for specifics but generally the rules are just what you would expect: The program runs (and completes in a reasonable time) on a relatively recent system running Windows (currently XP) or Linux with no external inputs, eg no dynamically loaded libraries not included in the submission, no net communication and no disk I/O that isn't generated by the program itself.
Points are not awarded for attempting to circumvent the intent of the competition. I expect such attempts would result in future submissions from the same source being ignored.
And indeed you can come up with all kinds of rules to help you predict the characters in a stream like Wikipedia, not just linguistic rules, but higher level rules are less likely to become crucial with the 100M contest and may need to wait for the 1G (or complete archive of Wikipedia) contest. I can imagine that certain rules relating to relational similarity (analogy) might come into play at the 100M level, particularly since there are now programs that can perform as well as college bound seniors on the SAT verbal analogies test (a heavily 'g' loaded test).
In it, Jaynes shows that an optimal decision maker shares this same tendency of reinforcing exiting belief systems. He even gives examples where new information reinforces the beliefs of optimal observers who have reached opposite conclusions (due to differing initial sets of data). Each observer believes the new data further supports their own view.
I think what Hutter has shown is that there is a solution which unifies the new data with the old within a new optimum, which is most likely unique. I think it is based on the idea that Kolmogorov complexity is a unique value for any string and is most likely represented by a single optimum program (the "self-extracting archive" of the string).
Wikipedia is a representation of accumulated human knowledge (experience) presented primarily in natural language. The smallest self-extracting archive will necessarily have rules that imply that knowledge and in such a way that the rules of natural language are represented as well.
The distinction between compressed experience and rules is an illusion. Rules _are_ compressed experience in the same sense that "x+y=z" is a compressed representation of the table of all ordered pairs (x,y) of numbers and their sum (z). If one has 2**32 distinct rows in such a table then one can physically represent it in a very small program. Even if one has only a small sample of the rows, one may still quite profitably compress those rows with the program.
That's another contest that is useless for the reason you cite.
The contest for the Hutter Prize requires the compressed corpus to be a self-extracting archive -- or failing that to add the size of the compressor to the compressed corpus.
This - in humans, at least - can lead to the cyclic reinforcement of one's belief system. The belief system that explains observations initially is used to filter observations later.
There is no allowance for lossy compression. The requirement of lossless compression is there for precisely the reason you state.
But in the real world the problem is knowing that one's observations are all - or even a significant percentage - of the possible observations.
This is precisely the assumption of Hutter's theory.
Chapter 2 of his book "Simplicity & Uncertainty" deals with this in more detail but the link provided does do an adequate job of stating:
The universal algorithmic agent AIXI. AIXI is a universal theory of sequential decision making akin to Solomonoff's celebrated universal theory of induction. Solomonoff derived an optimal way of predicting future data, given previous observations, provided the data is sampled from a computable probability distribution. AIXI extends this approach to an optimal decision making agent embedded in an unknown environment.
Stack computing came close to changing the course of the computer industry, including setting networking forward 15 years (displacing Microsoft's stand-alone approach to software) back in 1981.
In August 1980, Byte magazine published its issue on the Forth programming
language
At that time, I was working with Control Data Corporation's PLATO project,
pursuing a mass market version of that system using the Intelligent Student
Terminal (IST). The IST's were Z80 processor terminals sporting 512*512 bit
mapped displays with touch sensitive screens and 1200bps modems that went for
about $1500. We were shooting for, and actually successfully tested, a system
that could support almost 8,000 simultaneous users on 7600-derived Cybers (the
last machine designed by Seymour Cray to be marketed by CDC --with 60 bits per
word, 6 bits per character, no virtual memory, but very big and very fast) with
under 1/4 second response time (all keys and touch inputs went straight to the
central processor) for $40/month flat rate including terminal rental. Ray
Ozzie had been working at the University of Illinois on offloading the PLATO
central system to the Z80 terminal through downloaded assembly language programming,
doing exotic things like "local key echo" and such functions.
I was interested in extending Ray's work to offload the mass-market version
of the PLATO central system. In particular I was looking at a UCSD
Pascal-based approach to download p-code
versions of terminal functions -- and even more in particular the advanced scalable
vector graphics commands of TUTOR (the "relative/rotatable" commands
like rdraw, rat, rcircle, rcircleb, etc.) if not entire programs, to be executed
offline. Pascal was an attractive choice for us at the time because CDC's new
series of computers, the Cyber 180 (aka Cyber 800) was to have virtual memory,
64 bit words, 8 bit characters and be programmed in a version of the University
of Minnesota Pascal called CYBIL
(which stood for Cyber Implementation Language). Although this was a radically
different architecture than that upon which PLATO was then running, I thought
it worthwhile to investigate an architecture in which a reasonable language
(you should have seen what we were used to!) could be made to operate on both
the server and the terminal so that load could be dynamically redistributed.
This idea of dynamic load balancing would, later, contribute to the genesis
of Postscript.
Over one weekend a group of us junior programmers managed to implement a good
portion of TUTOR's (PLATO's authoring language) advanced graphics commands in
CYBIL. Our little hunting pack at CDC 's Arden Hills Operations was in a race
against the impending visit of Dave Anderson of the University of Illinois'
PLATO project who was promoting what he called "MicroTUTOR". Anderson
was going to take the TUTOR programming language and implement a modified version
of it for execution in the terminal -- possibly in a stand-alone mode. Many
of us didn't like TUTOR, itself, much. Indeed, I had to pull teeth to get the
authorization to put local variables into TUTOR -- and we were determined to select
a better board from our quiver with which to surf Moore's Shockwave into the
Network Revolution.
CDC management wasn't convinced that such a radical departure from TUTOR would
be wise, and we hoped to demonstrate that a p-code Pascal approach could accomplish
what microTUTOR purported to -- and more. We quickly ported a TUTOR central
sy
I visited James Van Allen at the University of Iowa physics building during the campaign to require government bureaucracies to purchase launch services from commercial vendors.
His support was crucial for the passage of the Launch Services Purchase Act of 1990 which, although largely resisted by NASA at the time became a bellweather for future launch service policy.
If the Cyc knowledge base actually models human "common sense" then the first thing Lenat should do is donate to the Hutter Prize for Lossless Compression of Human Knowledge or at least compete for the existing 50,000 euro prize.
The primary value of Silicon Valley was as a place where an engineer could buy a house and change jobs during the era when owning real estate was the primary economic behavior rewarded by government policy. During that era the cost of reproduction went up by nearly a factor of 4 unless you owned a house and did not have to relocate every few years.
If you lived outside Silicon Valley and were an engineer, you are probably in very bad shape now unless you were able to find one of those rare "cradle to grave" jobs that were so typical of the GI generation.
Real world compression of large text (gigabytes) files will come down to hundreds of megabytes. Virtually all real world emulators are megabytes or less, not hundreds of megabytes nor even tens of megabytes. The ratio isn't changed much in the real world.
If the use case set (the thing being compressed into a spec) isn't human readable, what good is it? The spec can be expanded in to render it readable by humans.
Rubic's cube isn't really a very good example of the kinds of problems people solve in the real world. It has too many "group symmetries". I like symmetry as much as the next guy -- believe me, more actually -- but be reasonable.
But even in those special instances where one has a Rubic's cube kind of problem, the proof that it is solvable in at most N steps is relevant only if the use to which the cube is being put requires such a small number of steps. If it does then that is the use case and you can't get around it. If there is no constraint on the number of steps (unlikely) then the sole criteria is the simplest specification of a universal solution.
First, the goal of the C-Prize isn't necessarily a compressor -- although that would be nice. The goal is an optimal compression of a relatively broad knowledge base. This optimal compression has its own value -- think about an ontology that optimally codes a broad range of knowledge and you can see the means of compression is really secondary to the value.
Second, if you followed the links you'll see that I did discuss using the multi-lingual aspect of Wikipedia to discover language independent properties of the knowledge base -- although this is likely to happen even without using multi-lingual samples from Wikipedia.
As for the relevance to web services -- I guess you consider it irrelevant that the new architect for Microsoft said "complexity kills" as part of his manifesto on reorienting Microsoft toward services.
It turns out that the choice of universal machine is not very important to the compression ratio. It is different by a constant -- which you can see by virtue of the fact that U1 can emulate universal machine U2 by a fixed sized program.
A time/space constraint on the C-Prize is a given due to the halting problem. So what?
My statement about the quality of knowledge being defined by its approximation to the Kolmogorov complexity of the world it represents holds.
A program specification is executable "in theory" (I did put it precisely that way for a reason). Programmers still have jobs to do and the closer they come to the Kolmogorov complexity of the specification within the computation constraints, the better quality their implementation.
PS: A.C. snipers are typically suffering from low self-esteem and go on about "trolls" mainly as a method of projecting their own problems with real discussions. How about just coming out of the shadows and providing your identity to show you aren't afraid what you say is horseshit?
Typically, the Anonymous Coward's contentless sarcasm betrays his shallow grasp of reality.
The relevance is clear: When you design your service suite and do not minimize complexity, you aren't just asking for trouble, you are, by definition, producing a low quality suite.
You can, in fact, produce a compression of a natural language knowledge base without even using a compression program and have that be an important human accomplishment. Epistemology is virtually defined by such advances. So the fact that the problem is computationally hard is neither here nor there to first order. The important thing is quality of knowledge.
Ultimately, you can see the corpus as a space within which various hypotheses about practical AI can be experimentally compared. That's the crucial importance of choosing something general like Wikipedia.
I made a mistake in stating the criteria. The size of the compressor is irrelevant. Only the size of the decompressor is measured.
The fact that there are local optimal in utility functions in which you can get stuck is a problem for all learning systems but to varying degrees depending on the details of how they look around the space of "programs". The space of programs measured for Kolmogorov complexity has a lot of discontinuity.
Hutter's theory is about an active AI agent seeking to maximize some utility function rather than passive prediction. However, it is probably the case that prizes for human knowledge compression should have been in place a long time ago -- probably as long ago as William of Ockham. IMHO guys like Solomonov provided sufficient formal rigor to justify massive government funding of a prize competition for compression of human knowledge decades ago, but it seems to be the case that people who control money generally just can't bring themselves to fund prize competitions for objective criteria. This generally doesn't speak well of people who control money. The exceptions are highly notable.
I hope the prize fund becomes very large and someone like you comes up with an algorithm and gets rich enough to retire.
Points are not awarded for attempting to circumvent the intent of the competition. I expect such attempts would result in future submissions from the same source being ignored.
And indeed you can come up with all kinds of rules to help you predict the characters in a stream like Wikipedia, not just linguistic rules, but higher level rules are less likely to become crucial with the 100M contest and may need to wait for the 1G (or complete archive of Wikipedia) contest. I can imagine that certain rules relating to relational similarity (analogy) might come into play at the 100M level, particularly since there are now programs that can perform as well as college bound seniors on the SAT verbal analogies test (a heavily 'g' loaded test).
I think what Hutter has shown is that there is a solution which unifies the new data with the old within a new optimum, which is most likely unique. I think it is based on the idea that Kolmogorov complexity is a unique value for any string and is most likely represented by a single optimum program (the "self-extracting archive" of the string).
The distinction between compressed experience and rules is an illusion. Rules _are_ compressed experience in the same sense that "x+y=z" is a compressed representation of the table of all ordered pairs (x,y) of numbers and their sum (z). If one has 2**32 distinct rows in such a table then one can physically represent it in a very small program. Even if one has only a small sample of the rows, one may still quite profitably compress those rows with the program.
The contest for the Hutter Prize requires the compressed corpus to be a self-extracting archive -- or failing that to add the size of the compressor to the compressed corpus.
There is no allowance for lossy compression. The requirement of lossless compression is there for precisely the reason you state.
This is precisely the assumption of Hutter's theory.
Chapter 2 of his book "Simplicity & Uncertainty" deals with this in more detail but the link provided does do an adequate job of stating:
An excerpt from a bit longer essay I wrote:
For more background see the Slashdot interview with Chuck Moore.
His support was crucial for the passage of the Launch Services Purchase Act of 1990 which, although largely resisted by NASA at the time became a bellweather for future launch service policy.
PS: I do regret not having mentioned Dr. Van Allen's support during my Congressional testimony.
See Matt Mahoney's description of Marcus Hutter's proof that compression is equivalent to general intelligence.
If you lived outside Silicon Valley and were an engineer, you are probably in very bad shape now unless you were able to find one of those rare "cradle to grave" jobs that were so typical of the GI generation.
But now, the bubble may be bursting finally.
No joy. Try gmail. I'm "jabowery".
FleaPlus, please contact me via email.
Real world compression of large text (gigabytes) files will come down to hundreds of megabytes. Virtually all real world emulators are megabytes or less, not hundreds of megabytes nor even tens of megabytes. The ratio isn't changed much in the real world.
Rubic's cube isn't really a very good example of the kinds of problems people solve in the real world. It has too many "group symmetries". I like symmetry as much as the next guy -- believe me, more actually -- but be reasonable.
But even in those special instances where one has a Rubic's cube kind of problem, the proof that it is solvable in at most N steps is relevant only if the use to which the cube is being put requires such a small number of steps. If it does then that is the use case and you can't get around it. If there is no constraint on the number of steps (unlikely) then the sole criteria is the simplest specification of a universal solution.
First, the goal of the C-Prize isn't necessarily a compressor -- although that would be nice. The goal is an optimal compression of a relatively broad knowledge base. This optimal compression has its own value -- think about an ontology that optimally codes a broad range of knowledge and you can see the means of compression is really secondary to the value.
Second, if you followed the links you'll see that I did discuss using the multi-lingual aspect of Wikipedia to discover language independent properties of the knowledge base -- although this is likely to happen even without using multi-lingual samples from Wikipedia.
As for the relevance to web services -- I guess you consider it irrelevant that the new architect for Microsoft said "complexity kills" as part of his manifesto on reorienting Microsoft toward services.
It turns out that the choice of universal machine is not very important to the compression ratio. It is different by a constant -- which you can see by virtue of the fact that U1 can emulate universal machine U2 by a fixed sized program.
A time/space constraint on the C-Prize is a given due to the halting problem. So what?
My statement about the quality of knowledge being defined by its approximation to the Kolmogorov complexity of the world it represents holds.
A program specification is executable "in theory" (I did put it precisely that way for a reason). Programmers still have jobs to do and the closer they come to the Kolmogorov complexity of the specification within the computation constraints, the better quality their implementation.
PS: A.C. snipers are typically suffering from low self-esteem and go on about "trolls" mainly as a method of projecting their own problems with real discussions. How about just coming out of the shadows and providing your identity to show you aren't afraid what you say is horseshit?
Typically, the Anonymous Coward's contentless sarcasm betrays his shallow grasp of reality. The relevance is clear: When you design your service suite and do not minimize complexity, you aren't just asking for trouble, you are, by definition, producing a low quality suite. You can, in fact, produce a compression of a natural language knowledge base without even using a compression program and have that be an important human accomplishment. Epistemology is virtually defined by such advances. So the fact that the problem is computationally hard is neither here nor there to first order. The important thing is quality of knowledge.