Domain: hutter1.net
Stories and comments across the archive that link to hutter1.net.
Comments · 34
-
Re:Of course it didn't work
> Information theory stablishes what is really possible
Actually, information theory states the opposite. Determining the entropy of an unknown source is an intractable problem, and you generally can't state the amount of entropy in a piece of data unless you know beforehand that it is quantum pink noise; all we know is that the better the compressor, the closer you get. That's why one-time pads use truly random codebooks, not a PRNG (PRNG output has very little entropy - that of the PRNG seed).
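As a rough illustration of the PRNG point, the sketch below (Python, standard library only) generates bytes from a seeded PRNG and runs them through zlib. The seed, the sample size and the choice of zlib are assumptions made for the example, not anything taken from the comment. A general-purpose compressor treats the stream as incompressible even though the whole stream is determined by a small seed plus the generator's code, which is why compressed size is only ever an upper bound on entropy.

```python
import random
import zlib


def prng_bytes(seed: int, n: int) -> bytes:
    """Return n bytes from Python's Mersenne Twister, fully determined by seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


def compressed_fraction(data: bytes) -> float:
    """Compressed size divided by original size (close to 1.0 means no gain)."""
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    seed = 42  # hypothetical seed, chosen only for the example
    stream = prng_bytes(seed, 100_000)
    print(f"zlib output is {compressed_fraction(stream):.3f}x the input size")
    # The real description of the stream is tiny: the seed plus the generator.
    print(f"the seed itself fits in {seed.bit_length()} bits")
```

The same seed always reproduces the same stream, so an attacker who learns the seed learns everything; zlib simply has no way to discover that structure.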
While extremely important as an output filter, just an entropy encoder doesn't compression make.
-
Re:How does it sound?
Look at the codec diagram - if you ignore the entropy coder, it largely resembles the input filters of voicerecog systems - before feeding the NN input terminals, the signal is decimated to extremely low bandwidth vectors with only the psychoacoustic essentials of human voice - quantized to very few dominating tones and their attack/release values. The NN model does the final step of "compressing" the result only by a factor of around 100 into text. It is popularly conjectured that compression is, in fact, an ML problem.
The same is done in computer vision: before matching for features, the frequency space is filtered into a narrow band where the interesting stuff can still be observed.
-
Artificial Intelligence-Based Education
I keep coming back to natural language compression prizes. The best hope we have of ameliorating human stupidity and ignorance is computer based education starting with a _neutral_ electronic genius with astronomical verbal intelligence. Verbal intelligence entails the ability to assess the verbal and cognitive character of your audience and modify your speech acts accordingly. The cost of electricity -- about 10 cents per kilowatt hour -- would be vastly lower than the cost of transferring benevolent _natural_ geniuses with high verbal intelligence into educational roles. Moreover, the exponential character of Moore's Law, combined with the history of bad general artificial intelligence theory that is finally giving way to mathematical rigor, offers an enormous potential for computer aided education in the near future -- if natural language compression is seen as the critical metric for "friendly AI" it is under such rigor. http://prize.hutter1.net/
-
Rigorous Criterion for AI Prize
Have you considered the utility of a compression-based AI prize for not only advancing machine learning, but also redressing information sabotage? Since Google DeepMind cofounder, Shane Legg, demonstrated the utility of a mathematically rigorous measure of problem-solving intelligence, which is based on Hutter's provably "optimal agent", Universal Algorithmic Intelligence, it seems time for an update of The Hutter Prize for Lossless Compression of Human Knowledge in two ways: 1) a much larger knowledge base and 2) a correspondingly much larger prize endowment. As such a prize pays only in proportion to rigorously measurable progress, and that progress is made public in the form of the refinement of knowledge, it would be a low risk public good appropriate for public sector as well as NGO endowment.
-
Expand the Hutter Prize
Expand the Hutter Prize for Lossless Compression of Human Knowledge to include the entire edit history of Wikipedia as well as the entirety of Wikipedia's current contents.
Why?
Because it solves the artificial intelligence problem and does so in a way that optimally enables natural language communication of the accumulated knowledge of humanity.
What I mean by "optimally enables natural language communication" is what every professional writer uses as the first rule of composition:
Write to your audience.
In other words, let's say you are attempting to write an article about quantum mechanics and your audience is a 12 year old from New Jersey, raised without a father in an impoverished, crime-ridden neighborhood. This is a very different composition task than communicating quantum mechanics to a college educated liberal arts graduate from Iowa who is considering a career in accounting. Indeed, it is the essence of pedagogy -- universalized.
By including the entire edit history of Wikipedia, the worldviews, perspectives, biases and agendas of a large number of editors will provide insight into the cognitive as well as social structure of a wide array of humans.
Moreover, while Google and companies like it are increasingly casting their role as "publishers" with the "right" to "editorialize" their search results, the Hutter Prize has a mathematical objective function that is simply not subject to editorialization: Kolmogorov Complexity. KC is a rigorous definition of Ockham's Razor that is mathematically sound and provably an optimal measure of mastery of knowledge.
-
System Development Foundation
It's "System Development Foundation" not "System Development Corporation" and Charlie's full name is Charles Sinclair Smith. He's semi-retired now and living the next county over from me in southeast Iowa where we've been collaborating on a couple of projects -- one of which is to photosynthesize all of the CO2 effluent from US fossil fuel power plants (as Charlie got his start co-founding the Energy Information Administration of the DoE under Carter).
It's ironic that in the 80s I was living in La Jolla, which was an epicenter of the neural net revival at UCSD, had taken neural net courses from Robert Hecht-Nielsen and by 1990 had prototyped the highest performance neural network image processing system (as Neural Engines Corporation) -- but I then later worked with Charlie for almost 15 years before discovering he had played such a key role in the revival of neural nets. Even more ironic is that, circa 2005, I came up with the idea for the Hutter Prize for Lossless Compression of Human Knowledge -- based on Hutter's entirely different, top down mathematics approach to AI -- and Shane Legg, founder of Deep Mind, which is largely identified with deep learning neural nets, actually studied under Hutter and achieved Deep Mind's famous ability to learn to play video games using Hutter's approach, but everyone thinks that capability is uniquely attributable to deep neural net learning alone.
-
Deep Mind's IQ Test Works
A rigorous definition of general intelligence now exists and has been applied by the Deep Mind folks. See this video lecture by Deep Mind's Shane Legg at Singularity Summit 2010 on a new metric for measuring machine intelligence.
If you want something more accessible to the general public, The Hutter Prize for Lossless Compression of Human Knowledge has the same theoretic basis as the test used by Deep Mind and has the virtue that it uses a natural language criterion, in the form of a Wikipedia snapshot. If the 100M snapshot of Wikipedia used by the Hutter Prize is no longer challenging enough, then substitute Matt Mahoney's Large Text Compression Benchmark which is basically just the Hutter Prize enlarged by an order of magnitude.
-
Re:I can answer that, Alex!
Okian Warrior asserts: "There is no formal definition of intelligence, and no roadmap for what to study"
Yes there is. It's defined by a field called "Universal Artificial Intelligence" and the roadmap says what to study.
-
It's not winning the Hutter Prize
The claim of "winning both industrial and academic data competitions with minimal effort" might be more impressive if it included the only provably rigorous test of general intelligence:
The Hutter Prize for Lossless Compression of Human Knowledge
The last time anyone improved on that benchmark was 2009.
-
The Hutter Prize
The Hutter Prize for Lossless Compression of Human Knowledge, if sufficiently funded, will get adolescent competitive hormones kicking in to teach them about programming, artificial intelligence and the nature of knowledge itself.
-
Re:Show me the runny
No, he knows and has explicitly stated in a few places that it's uncomputable, in much the same way that Kolmogorov Complexity is uncomputable, but an interesting and potentially useful theoretical construct, nonetheless.
This vein of Schmidhuber's work is more or less descended from Solomonoff's work on induction and Chaitin's Algorithmic Information Theory stuff (the line of descent is less explicit with the latter), and a bunch of Schmidhuber's descendants, most prominently his student Marcus Hutter and *his* student Shane Legg, have taken this ball and run with it in interesting ways.
-
Re:Try the Hutter Prize model
The point of the Turing test is to model human intelligence. That is not the point of the Hutter Prize. The point of the Hutter Prize is to model optimal intelligence. Human intelligence is not optimal. The target of this intelligence is chosen as human knowledge as represented in Wikipedia. Optimal, or universal, intelligence is a field of pure mathematics: The goal is to mathematically define a unique model superior to any other model in any environment. From a presentation by Marcus Hutter:
- The (optimal) AI model is unique in the sense that it has no parameters which could be adjusted to the actual environment in which it is used.
- In this first step toward a universal theory we are not interested in computational aspects.
- Nevertheless, we are interested in maximizing a utility function, which means to learn in as minimal number of cycles as possible. The interaction cycle is the basic unit, not the computation time per unit.
Some confusion arises due to the fact that optimal compression algorithms rely on optimal prior knowledge of the nature of the input. Optimally compressed prior knowledge is a better "ontology" for predicting, hence compressing, further information coming in from the same environment.
Universal intelligence is not computable, although there is an order 2^N approximation.
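The point about prior knowledge can be illustrated on a very small scale with zlib's preset-dictionary feature. This is only a sketch: zlib is a crude stand-in for the optimal models discussed above, and the `PRIOR` and `MESSAGE` strings are made up for the example. A compressor primed with text from the same "environment" needs fewer bytes for the next message than one starting from nothing.

```python
import zlib

# Hypothetical "prior knowledge" and a new message from the same environment.
PRIOR = (
    b"The Hutter Prize for Lossless Compression of Human Knowledge "
    b"uses a snapshot of Wikipedia as its corpus. "
)
MESSAGE = b"The Hutter Prize uses a 100M snapshot of Wikipedia as its corpus."


def compress_with_prior(message: bytes, prior: bytes = b"") -> bytes:
    """Deflate message, optionally priming the compressor with prior text."""
    if prior:
        comp = zlib.compressobj(level=9, zdict=prior)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(message) + comp.flush()


def decompress_with_prior(blob: bytes, prior: bytes = b"") -> bytes:
    """Inverse of compress_with_prior; the same prior must be supplied."""
    decomp = zlib.decompressobj(zdict=prior) if prior else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    cold = compress_with_prior(MESSAGE)
    primed = compress_with_prior(MESSAGE, PRIOR)
    print(f"no prior: {len(cold)} bytes, with prior: {len(primed)} bytes")
```

Note that the decompressor needs the same prior, so in a contest setting the prior's size has to be counted as part of the entry.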
-
Kernel Compression Prize Competition
Set up a prize competition for kernel compression similar to the Hutter Prize for Lossless Compression of Human Knowledge, except the objective is to produce an executable binary of minimum size that expands into a fully functional kernel.
The goal of this competition would be to obtain the optimal factoring of the kernel architecture.
-
Try the Hutter Prize model
The Hutter Prize's scheme of incremental awards for progress, itself modeled on the M-Prize, is a superior way of awarding prize money. There is continual reward for teams that contribute substantially and no one team takes everything based on a technicality.
-
Re:He's too close.
Well put, and I agree. I would add that what we actually want out of artificial systems is some kind of combination of survivability and intelligence, and we don't want to go too far in either direction.
"Too much survivability" would be where we can't shut the system down when it's not doing what we want it to, or being destructive. Too little survivability would be where the resources to keep it going exceed the benefit of the output it gives us.
Now, how can you get too much intelligence? Well, if you take intelligence to mean "extracting the most knowledge from the least data", then an optimally intelligent system would be the one that updates its "probability distribution" over the world exactly as its limited observations suggest. However, this would needlessly discard all of the knowledge we already have embedded in our bodies as a result of our long evolutionary history. Many things that we do to survive rely on such implicit knowledge.
In other words, we make good guesses that can't be justified based on what we consciously know, but "happen" to be right for this planet and this universe -- the very things a merely "intelligent" system would try to avoid. An example of a superintelligent system is Marcus Hutter's AIXI, which makes provably optimal inferences, but which takes way too long to do anything useful, because it has to re-learn everything starting from nothing but Occam's Razor.
-
Anyone for a General AI prize?
Matt Mahoney to Hutter:
I have uploaded a mirror of Alexander Ratushnyak's new submission to the Hutter prize to http://cs.fit.edu/~mmahoney/compression/text.html#1323 It is in the paq8hp12 section. Scroll down to the bottom of the list of versions just above the table. The submission is decomp8.zip which contains 2 files, decomp8.exe and archive8.bin, the decompressor and compressed file. There is no compressor. To decompress:
decomp8 archive8.bin enwik8
The direct link is http://cs.fit.edu/~mmahoney/compression/decomp8.zip Decompression took about 2 hours on my computer and used a little over 924 MB memory. The total size of the 2 files is 15,986,677 which passes the 3% threshold improvement from his previous submission of 16,481,655 bytes on May 14, 2007.
The submission was Mar. 23. The 30 day comment period before awarding the prize ends Apr. 22, 2009.
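The 3% check can be reproduced from the two totals quoted in the message. The sketch below is only arithmetic on those figures; the function and constant names are made up for the example.

```python
PREVIOUS_TOTAL = 16_481_655  # bytes, previous submission (May 14, 2007)
NEW_TOTAL = 15_986_677       # bytes, decomp8.exe plus archive8.bin
THRESHOLD = 0.03             # minimum relative improvement quoted above


def relative_improvement(previous: int, new: int) -> float:
    """Fraction by which the new total undercuts the previous one."""
    return 1.0 - new / previous


if __name__ == "__main__":
    gain = relative_improvement(PREVIOUS_TOTAL, NEW_TOTAL)
    saved = PREVIOUS_TOTAL - NEW_TOTAL
    print(f"saved {saved} bytes, improvement {gain:.4%}")
    print("passes threshold" if gain >= THRESHOLD else "below threshold")
```

The margin is thin: the submission clears the threshold by only a few hundred bytes.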
-
Kolmogorov Programming
If I were in Ray Ozzie's shoes I would apply something like the Hutter Prize for Lossless Compression of Human Knowledge to the entirety of MS's software suite. This, of course, requires making a rigorous spec for testing purposes.
Make the engine, upon which the winning succinct byte code runs, a new W3C standard browser programming language (or at least virtual machine) and reduce the Microsoft OS CD to those components required to create a web-delivered application platform using the winning engine. Such an engine would, of course, have some features that dynamically encached expansions (and/or "memoizations") similar to the Hotspot optimization technology that originated with the Self programming language (and was later adopted by Sun's Java Virtual Machine). Hence it would make sense to have the OS CD contain a partially pre-expanded/optimized code base.
Then, for delivery of software services to pre-existing platforms, create a legacy port of the services code to pre-existing W3C standards like XForms implemented in a downloadable ECMAScript Client/SOA library in a manner similar to the way TIBET(tm) does. The idea is to go "Live", ie: web-delivered, with a fundamentally new W3C base (whatever engine won the prize) but support legacy W3C environments for migration.
Again, this prize-oriented strategy would, of course, require a rigorous specification of the software services so the testing could be largely automated.
This approach addresses Microsoft's 2 biggest problems deriving from the same fundamental reality: Everyone has needed their OS to interoperate with the bulk of the information industry.
The first problem is ethical and really goes beyond the scope of my professional opinions to my public opinions about the support of property rights. Suffice to say, I have no trouble with someone who goes after a natural monopoly position and succeeds. I have a problem with someone who then refuses to use that position of success to fix the bug in the society that made them inordinately rich and their technology inordinately influential.
The second problem is technical, which is what my argument here is really all about.
Basically Microsoft's code bloat problem derives from its monopoly position. This may seem like a truism since all of the software "profession" suffers from code bloat, but only Microsoft can take this to monopolistic proportions -- proportions that make Ma Bell's monopolistic complexities of yore look Spartan.
So Microsoft has this problem and it has many programmers (contributing to the code-bloat problem). It also has mountains of cash.
So how can Microsoft bust its own monopoly position turning its many programmers (many newly laid off!) and mountains of cash into succinct code?
Monetary Incentives for the Programmers. For example, the original idea for the Hutter Prize was:
S = size of uncompressed corpus
P = size of program outputting the uncompressed corpus
R = S/P (the compression ratio).
Award monies in a manner similar to the M-Prize:
Previous record ratio: R0
New record ratio: R1=R0+X
Fund contains: $Z at the time of the new record
Winner receives: $Z * (X/(R0+X))
Something similar can be done with the size of the binary that passes the entire suite of tests for Microsoft's software suite.
What happens very rapidly is the programmers first apply their skills to maximally refactoring. What falls out is a series of legacy API layers written atop a tight core.
They'd have to spend more money on code testing to verify the compressed code-bases of the competing teams actually worked to spec but the results should be quite gratifying.
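The award rule above, Winner receives $Z * (X/(R0+X)), can be written out as a small function. This is a sketch of the formula as stated in the comment; the function names and the sample figures in the usage section are hypothetical.

```python
def compression_ratio(corpus_size: int, program_size: int) -> float:
    """R = S / P, where P is the size of the program that outputs the corpus."""
    return corpus_size / program_size


def payout(fund: float, previous_ratio: float, new_ratio: float) -> float:
    """Award Z * X / (R0 + X), where X = R1 - R0 is the gain in ratio.

    Returns 0.0 when the new ratio does not beat the previous record.
    """
    gain = new_ratio - previous_ratio
    if gain <= 0:
        return 0.0
    return fund * gain / (previous_ratio + gain)


if __name__ == "__main__":
    # Hypothetical figures: a 100 MB corpus, record program shrinking 20 MB -> 18 MB.
    r0 = compression_ratio(100_000_000, 20_000_000)
    r1 = compression_ratio(100_000_000, 18_000_000)
    print(f"R0={r0:.3f}, R1={r1:.3f}, payout={payout(50_000, r0, r1):.2f}")
```

Since R0 + X = R1, the payout equals Z * (1 - R0/R1), so each new record takes a share of the fund proportional to the relative shrinkage of the program and the fund is never exhausted by a single winner.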
-
Microsoft's Problem
If I were in Ray Ozzie's shoes I would apply something like the Hutter Prize for Lossless Compression of Human Knowledge to the entirety of MS's software services suite. This, of course, requires making a rigorous spec for testing purposes.
Make the engine, upon which the winning succinct byte code runs, a new W3C standard browser programming language (or at least virtual machine) and reduce the Microsoft OS CD to those components required to create a web-delivered application platform using the winning engine. Such an engine would, of course, have some features that dynamically encached expansions, memoizations, tablings and/or materialized views similar to the Hotspot optimization technology that originated with the Self programming language (and was later adopted by Sun's Java Virtual Machine). Hence it would make sense to have the OS CD contain a partially pre-expanded hence time-optimized code base.
Then, for delivery of software services to pre-existing platforms, create a legacy port of the services code to pre-existing W3C standards like XForms implemented in a downloadable ECMAScript Client/SOA library in a manner similar to the way TIBET(tm) does. The idea is to go "Live", ie: web-delivered, with a fundamentally new W3C base (whatever engine won the prize) but support legacy W3C environments for migration.
Again, this prize-oriented strategy would, of course, require a rigorous specification of the software services so the testing could be largely automated.
This approach addresses Microsoft's 2 biggest problems deriving from the same fundamental reality: Everyone has needed their OS to interoperate with the bulk of the information industry.
The first problem is ethical and really goes beyond the scope of my professional opinions to my public opinions about the support of property rights. Suffice to say, I have no trouble with someone who goes after a natural monopoly position and succeeds. I have a problem with someone who then refuses to use that position of success to fix the bug in the society that made them inordinately rich and their technology inordinately influential.
The second problem is technical, which is what my argument here is really all about.
Basically Microsoft's code bloat problem derives from its monopoly position. This may seem like a truism since all of the software "profession" suffers from code bloat, but only Microsoft can take this to monopolistic proportions -- proportions that make Ma Bell's monopolistic complexities of yore look Spartan.
So Microsoft has this problem and it has many programmers (contributing to the code-bloat problem). It also has mountains of cash.
So how can Microsoft bust its own monopoly position turning its many programmers and mountains of cash into succinct code?
Monetary Incentives for the Programmers, ala the Hutter Prize:
S = size of uncompressed code-base
P = size of program outputting the uncompressed code-base
R = S/P (the compression ratio).
Award monies in a manner similar to the M-Prize:
Previous record ratio: R0
New record ratio: R1=R0+X
Fund contains: $Z at the time of the new record
Winner receives: $Z * (X/(R0+X))
It may turn out that, due to the incomputability of Kolmogorov complexity, the growth of reward may ultimately need to go exponential, but the principle remains true.
What happens very rapidly is the programmers first apply their skills to maximally refactoring. What falls out is a series of legacy API layers written atop a tight core.
They'd have to spend more money on code testing to verify the compressed code-bases of the competing teams actually worked to spec but the results should be quite gratifying.
-
Kolmogorov complexity not tractable - compression?
Indeed - Kolmogorov complexity is nice to play with, but can't be calculated.
A useful approximation is to use "compressed size". An ideal lossless compressor would effectively be computing the Kolmogorov complexity. For instance, in the 123456789012345678901234567890 sequence example, any self-respecting compressor such as Zip would create something like "1234567890 times 3", which is pretty close to the shortest program which generates the sequence.
Indeed, really-good compression is close to AI. To say the same thing in progressively shorter ways, you need to find deeper patterns. Check out this page relating AI to compression: the Hutter prize
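A quick way to see "compressed size" acting as a stand-in for Kolmogorov complexity is to compare a patterned sequence with noise of the same length. The sketch below uses Python's zlib rather than Zip, and a longer run of the 1234567890 pattern so that the fixed header overhead doesn't dominate; both choices are assumptions made for the example.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Upper bound on description length, in bytes, as seen by zlib."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    patterned = b"1234567890" * 1000      # "1234567890 times 1000"
    noise = os.urandom(len(patterned))    # no pattern for zlib to find
    print(f"patterned: {len(patterned)} -> {compressed_size(patterned)} bytes")
    print(f"noise:     {len(noise)} -> {compressed_size(noise)} bytes")
```

The number is only ever an upper bound: a smarter compressor may find structure that zlib misses, and no compressor can certify that it has found the shortest description.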
-
Compliance vs Compression
This algorithm is measuring compliance with the Wikipedia dispute processing norms -- not "trustworthiness". A better measure of "trustworthiness" of a passage is its consistency with the rest of the body of human knowledge -- which is most strictly measured by the degree to which it is not a special case within a compressed representation of that knowledge. This is the basis of the Hutter Prize for Lossless Compression of Human Knowledge. The Hutter Prize is currently using a 100M sample from Wikipedia as its corpus.
-
Hutter Prize rules
I heard the decompression binary is around 100.1MB....
Poor joke. The Hutter Prize rules include the size of the decompressor in the size of the entry. Decompressors may depend only on stock libc of Windows or GNU/Linux operating systems. In practice, they'll need to run on a net-disconnected machine with a fresh OS install.
-
Re:Program size is 1.02 MB!
Actually, the size of the program (decompressor) binary is 99,696 bytes, and it is the binary size that is included in the prize calculation.
-
It may be too late for Microsoft now but...
A long time before MIX'07's announcement of Silverlight, I posted an approach I thought Microsoft should take to going "live" with their applications suite as software services. The approach still applies to others who might like to go "live" with software turned to "web" services. Translate from "Ray Ozzie" to "Linus", etc. and it applies to the present issue -- but with a big problem remaining of how to raise money for the prize.
Here's what I wrote back when there was still hope for Microsoft:
If I were in Ray Ozzie's shoes I would apply something like the Hutter Prize for Lossless Compression of Human Knowledge to the entirety of MS's software services suite. This, of course, requires making a rigorous spec for testing purposes.
Make the engine, upon which the winning succinct byte code runs, a new W3C standard browser programming language (or at least virtual machine) and reduce the Microsoft OS CD to those components required to create a web-delivered application platform using the winning engine. Such an engine would, of course, have some features that dynamically encached expansions (and/or "memoizations") similar to the Hotspot optimization technology that originated with the Self programming language (and was later adopted by Sun's Java Virtual Machine). Hence it would make sense to have the OS CD contain a partially pre-expanded/optimized code base.
Then, for delivery of software services to pre-existing platforms, create a legacy port of the services code to pre-existing W3C standards like XForms implemented in a downloadable ECMAScript Client/SOA library in a manner similar to the way TIBET(tm) does. The idea is to go "Live", ie: web-delivered, with a fundamentally new W3C base (whatever engine won the prize) but support legacy W3C environments for migration.
Again, this prize-oriented strategy would, of course, require a rigorous specification of the software services so the testing could be largely automated.
This approach addresses Microsoft's 2 biggest problems deriving from the same fundamental reality: Everyone has needed their OS to interoperate with the bulk of the information industry.
The first problem is ethical and really goes beyond the scope of my professional opinions to my public opinions about the support of property rights. Suffice to say, I have no trouble with someone who goes after a natural monopoly position and succeeds. I have a problem with someone who then refuses to use that position of success to fix the bug in the society that made them inordinately rich and their technology inordinately influential.
The second problem is technical, which is what my argument here is really all about.
Basically Microsoft's code bloat problem derives from its monopoly position. This may seem like a truism since all of the software "profession" suffers from code bloat, but only Microsoft can take this to monopolistic proportions -- proportions that make Ma Bell's monopolistic complexities of yore look Spartan.
So Microsoft has this problem and it has many programmers (contributing to the code-bloat problem). It also has mountains of cash.
So how can Microsoft bust its own monopoly position turning its many programmers and mountains of cash into succinct code?
Monetary Incentives for the Programmers, ala the Hutter Prize:
S = size of uncompressed code-base
P = size of program outputting the uncompressed code-base
R = S/P (the compression ratio).
Award monies in a manner similar to the M-Prize:
Previous record ratio: R0
New record ratio: R1=R0+X
Fund contains: $Z at the time of the new record
Winner receives: $Z * (X/(R0+X))
What happens very rapidly is the programmers first apply their skills to maximally refactoring
-
Universes and Universal Turing Machines
A hypothesized (meta)algorithm running our universe has been proposed in "The New AI: General & Sound & Relevant for Physics" by Jürgen Schmidhuber of Dalle Molle Institute for Artificial Intelligence:
"Systematically create and execute all programs for a universal computer, such as a Turing machine or a CA; the first program is run for one instruction every second step on average, the next for one instruction every second of the remaining steps on average, and so on."
This actually computes all universes -- not just ours. It also computes what might be thought of as nested universes, giving rise to the idea promoted by Smolin that some universes might be more prolific than others. Among the consequences of this hypothesis is:
"Large scale quantum computation will not work well, essentially because it would require too many exponentially growing computational resources in interfering 'parallel universes'".
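The dovetailing schedule in the quoted algorithm can be sketched as follows. This shows only the time-sharing rule, not the enumeration or execution of programs, and it uses one particular deterministic schedule (the count of trailing zero bits of the step number) that meets the stated fractions exactly, whereas the quote only asks for them "on average". The function names are made up for the example.

```python
from collections import Counter
from typing import List


def program_for_step(t: int) -> int:
    """Index of the program that gets to run one instruction at step t (t >= 1).

    Program 0 runs on every second step, program 1 on every second of the
    remaining steps, and so on. This equals the number of trailing zero
    bits in t.
    """
    if t < 1:
        raise ValueError("steps are numbered from 1")
    return (t & -t).bit_length() - 1


def schedule(steps: int) -> List[int]:
    """Program indices for global steps 1..steps."""
    return [program_for_step(t) for t in range(1, steps + 1)]


if __name__ == "__main__":
    print(schedule(16))
    shares = Counter(schedule(1024))
    for index in range(4):
        print(f"program {index}: {shares[index]} of 1024 steps")
```

Every program in the enumeration eventually receives unboundedly many steps, but program k runs about 2^(k+1) times slower than it would alone.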
Prof. Schmidhuber's post-doc student, Marcus Hutter, of Hutter Prize for Lossless Compression of Human Knowledge fame came up with some of the key breakthroughs in "The New AI" upon which Schmidhuber's hypothesis is based.
-
Hutter Prize
As has been previously reported on Slashdot, The Hutter Prize for Lossless Compression of Human Knowledge uses a snapshot of Wikipedia for rigorously benchmarking AI (and it has already had its first payout).
The rigor of the benchmark is the key. The Turing Test really only benchmarks human mimicry -- not intelligence per se. The new theoretic basis of universal intelligence allows a mathematically rigorous approach to AI that is reviving the field after nearly 50 years of drifting in a stagnant pool of inadequate concepts.
-
For all the people laughing at this contest
-
Re:Done on 9/25/06?
>According to http://prize.hutter1.net/ this happened on Sept 25 of 2006.
There is a waiting period for public comment/verification etc...
-
Done on 9/25/06?
According to http://prize.hutter1.net/ this happened on Sept 25 of 2006.
The site also gives some of the requirements..
Create a compressed version (self-extracting archive) of the 100MB file enwik8 of less than 17MB. More precisely:
* Create a Linux or Windows executable of size S < L := 17'073'018 = previous record.
* If run, it produces (without input from other sources) a 10^8 byte file that is identical to enwik8.
* If we can verify your claim, you are eligible for a prize of 50'000×(1-S/L). Minimum claim is 500.
* Restrictions: Must run in 10 hours on a 2GHz P4 with 1GB RAM and 10GB free HD. -
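The award rule quoted from the site, 50'000×(1-S/L) with a minimum claim of 500, can be sketched as below. Two things are assumptions here: that an amount under the minimum pays nothing (the quoted text only says "Minimum claim is 500"), and the sample sizes in the usage section. The currency unit is left out because the quoted text does not give one.

```python
PREVIOUS_RECORD = 17_073_018  # L, in bytes, from the quoted rules
PRIZE_POOL = 50_000
MINIMUM_CLAIM = 500


def award(size: int, record: int = PREVIOUS_RECORD) -> float:
    """Award for an entry of `size` bytes (decompressor included).

    Assumption: amounts below MINIMUM_CLAIM are treated as no award.
    """
    if size >= record:
        return 0.0
    amount = PRIZE_POOL * (1 - size / record)
    return amount if amount >= MINIMUM_CLAIM else 0.0


if __name__ == "__main__":
    for size in (17_073_018, 17_000_000, 16_000_000):  # hypothetical entries
        print(f"S={size}: award {award(size):.2f}")
```

With this pool and minimum, an entry has to beat the record by at least 1% before it can claim anything.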
Microsoft's Problem
If I were in Ray Ozzie's shoes I would apply something like the Hutter Prize for Lossless Compression of Human Knowledge to the entirety of MS's software services suite. This, of course, requires making a rigorous spec for testing purposes.
Make the engine, upon which the winning succinct byte code runs, a new W3C standard browser programming language (or at least virtual machine) and reduce the Microsoft OS CD to those components required to create a web-delivered application platform using the winning engine. Such an engine would, of course, have some features that dynamically encached expansions (and/or "memoizations") similar to the Hotspot optimization technology that originated with the Self programming language (and was later adopted by Sun's Java Virtual Machine). Hence it would make sense to have the OS CD contain a partially pre-expanded/optimized code base.
Then, for delivery of software services to pre-existing platforms, create a legacy port of the services code to pre-existing W3C standards like XForms implemented in a downloadable ECMAScript Client/SOA library in a manner similar to the way TIBET(tm) does. The idea is to go "Live", ie: web-delivered, with a fundamentally new W3C base (whatever engine won the prize) but support legacy W3C environments for migration.
Again, this prize-oriented strategy would, of course, require a rigorous specification of the software services so the testing could be largely automated.
This approach addresses Microsoft's 2 biggest problems deriving from the same fundamental reality: Everyone has needed their OS to interoperate with the bulk of the information industry.
The first problem is ethical and really goes beyond the scope of my professional opinions to my public opinions about the support of property rights. Suffice to say, I have no trouble with someone who goes after a natural monopoly position and succeeds. I have a problem with someone who then refuses to use that position of success to fix the bug in the society that made them inordinately rich and their technology inordinately influential.
The second problem is technical, which is what my argument here is really all about.
Basically Microsoft's code bloat problem derives from its monopoly position. This may seem like a truism since all of the software "profession" suffers from code bloat, but only Microsoft can take this to monopolistic proportions -- proportions that make Ma Bell's monopolistic complexities of yore look Spartan.
So Microsoft has this problem and it has many programmers (contributing to the code-bloat problem). It also has mountains of cash.
So how can Microsoft bust its own monopoly position, turning its many programmers and mountains of cash into succinct code?
Monetary Incentives for the Programmers, à la the Hutter Prize:
S = size of uncompressed code-base
P = size of program outputting the uncompressed code-base
R = S/P (the compression ratio).
Award monies in a manner similar to the M-Prize:
Previous record ratio: R0
New record ratio: R1=R0+X
Fund contains: $Z at the time of the new record
Winner receives: $Z * (X/(R0+X))
What happens very rapidly is that the programmers first apply their skills to maximal refactoring. What falls out is a series of legacy API layers written atop a tight core.
They'd have to spend more money on code testing to verify the compressed code-bases of the competing teams actually worked to spec but the results should be quite gratifying.
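To make the payout rule above concrete, here is a minimal Python sketch. The function names and the example numbers are made up for illustration, and paying nothing for a submission that fails to beat the record is my assumption; the formula itself only covers the record-setting case.

```python
def compression_ratio(uncompressed_size: float, program_size: float) -> float:
    """R = S/P: size of the code base over the size of the program that outputs it."""
    return uncompressed_size / program_size


def award(fund: float, previous_ratio: float, new_ratio: float) -> float:
    """Payout of $Z * (X / (R0 + X)), where X = R1 - R0.

    Assumption (not stated in the comment): a submission that does not
    beat the previous record ratio receives nothing.
    """
    improvement = new_ratio - previous_ratio  # X
    if improvement <= 0:
        return 0.0
    return fund * improvement / new_ratio  # R0 + X equals R1


# Hypothetical numbers: a 1000 MB code base, previously reproduced by a
# 100 MB program, now reproduced by an 80 MB program, with $50,000 in the fund.
r0 = compression_ratio(1000, 100)  # previous record ratio
r1 = compression_ratio(1000, 80)   # new record ratio
payout = award(50_000, r0, r1)
print(f"R0={r0}, R1={r1}, payout=${payout:,.2f}")
```

Since the winner only takes the fraction X/(R0+X) of the fund, the rest stays in the pot for later records, which is what keeps the incentive alive after the easy refactoring wins are gone.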
-
Comparison of Wordnet to Current Hutter Prize
I suspect database mining algorithms for Wikipedia Neologisms could also help refactor Wordnet's definitions to be more succinct and hence provide a better basis for modeling other natural language corpora.
paq8hp3 is the current Hutter Prize lead contender and has compressed the first 100M of Wikipedia to just over 17M. Wordnet's .exe file is just over 17M. One wonders what would happen if the "cream" of Wordnet's vocabulary were compressed using paq8hp3 and then incorporated into paq8hp3 to make it a better compressor by inferring which words are more likely than others to appear near various combinations of words. You wouldn't have to go very deep to generate a large temporary file of word associations. Identifying the "cream" of Wordnet would take more than just frequency of usage. Some refactoring of the definitions may be in order to find which words are the most powerful descriptors of other words. How much of that sort of work has been done?
-
Natural Language
There's a good chance for natural language interfaces to computers given recent theoretical and practical breakthroughs.
Until recently there was no rigorous metric for the power of a natural language understanding system but that has changed with The Hutter Prize for Lossless Compression of Human Knowledge. Since the introduction of the Hutter Prize here at Slashdot there has already been as much progress as ordinarily occurs in a year (actually a bit more since an average year progresses 3% in compression of natural language and the current contestants may have already achieved 4% improvement since the /. announcement).
The theory is simple enough and the mathematical proof has been done: If you can sufficiently compress a large, general knowledge natural language corpus like Wikipedia, you can competently articulate and understand natural language.
It's a hard problem but with the metric and the prize competition driving progress there's a good chance human-level understanding of natural language will start to emerge within the next few years.
BTW: This revolutionizes software development in more ways than one. Think about it like this: When Alan Kay first dreamt of Smalltalk, he was dreaming of a system anyone could program. Well, if you can just say what you want and the system is good enough at comprehending you, program specification just became very natural -- natural enough that your child could perform feats of programming not practical with corporate teams of software developers before.
-
Lenat should fund the Hutter Prize
If the Cyc knowledge base actually models human "common sense" then the first thing Lenat should do is donate to the Hutter Prize for Lossless Compression of Human Knowledge or at least compete for the existing 50,000 euro prize.
See Matt Mahoney's description of Marcus Hutter's proof that compression is equivalent to general intelligence.