OpenCyc 1.0 Stutters Out of the Gates
moterizer writes "After some 20 years of work and five years behind schedule, OpenCyc 1.0 was finally released last month. Once touted on these pages as "Prepared to take Over World", the upstart arrived without the fanfare that many watchers had anticipated — its release wasn't even heralded with so much as an announcement on the OpenCyc news page. For those who don't recall: "OpenCyc is the open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine." The Cyc ontology "contains hundreds of thousands of terms, along with millions of assertions relating the terms to each other, forming an upper ontology whose domain is all of human consensus reality." So are these the fledgling footsteps of an emerging AI, or just the babbling beginnings of a bloated database?"
Leave Wikipedia out of this.
...but does it know Linux?
Bragi Ragnarson Lawful Good (I change the law when it's not good)
I kind of feel bad for Cyc/OpenCyc... they've put so many years into this project, but using web-based games to collect and verify this common-sense data is much faster than using a few paid experts, and it can yield much more data. For the curious, Luis von Ahn, a grad student (and now assistant professor) at Carnegie Mellon University, gave a (rather entertaining) tech talk at Google about his work in this area.
He's recently been working on a project called Verbosity, which uses such games to collect the same sort of common-sense data that Cyc has been trying to collect all these years. Cyc's ontology apparently contains "hundreds of thousands of terms, along with millions of assertions relating the terms to each other." If Verbosity is as popular as von Ahn's ESP Game, the game could probably construct a better database in a matter of weeks.
Here's the abstract from a research paper on the topic:
Verbosity: a game for collecting common-sense facts
We address the problem of collecting a database of "common-sense facts" using a computer game. Informally, a common-sense fact is a true statement about the world that is known to most humans: "milk is white," "touching hot metal hurts," etc. Several efforts have been devoted to collecting common-sense knowledge for the purpose of making computer programs more intelligent. Such efforts, however, have not succeeded in amassing enough data because the manual process of entering these facts is tedious. We therefore introduce Verbosity, a novel interactive system in the form of an enjoyable game. People play Verbosity because it is fun, and as a side effect of them playing, we collect accurate common-sense knowledge. Verbosity is an example of a game that not only brings people together for leisure, but also collects useful data for computer science.
So are these the fledgling footsteps of an emerging AI, or just the babbling beginnings of a bloated database?
Cyc is a fledgling AI, depending on how you count "AI". Then again, so is my thermostat. My thermostat "knows" how to keep the room the right temperature. Cyc "knows" about a great deal of conventional human background, just like a database with a query system "knows" how to give you the data in that system.
The real question is not "is this AI?" but rather: is it useful, and if so, to whom? I think Cyc has the potential to be quite useful in some areas; in time we'll see how far it goes and what its limitations are.
Right now, I think the real problem with Cyc is understanding it on a practical level: getting a sense of what it can do in practice, not in theory. When I last looked at the project nine years ago, they were just starting to open things up a bit, and it sounded like someone who understood the project might make great things happen. That doesn't seem to have happened yet; but who knows... perhaps in the future.
Now that OpenCyc is finally released, the most important step toward getting people to use it is to bring the learning curve down to a reasonable level, so that developers can start playing with it and find out what it can do without committing their lives to the project...
We'll have to see what happens: Cyc is a big (bloated?) database that's also a fledgling AI -- the real question is, what cool things can we make it DO? Time will tell...
Cyc has an ontology of general conceptual terms, and represents the precise logical way in which
those concepts interrelate. In other words, it emulates an aspect of the pure rational part of
human reasoning about the world.
But it's known that humans are not dispassionate rational agents, and indeed there probably
is no such thing as a dispassionate rational agent. Commander Data and Spock are very ill-conceived
ideas of robot-like reasoners. Passion (emotion, affect) is the prioritizer of reasoning that allows
it to respond effectively (sometimes in real time) to the relevant aspects
of situations. Without the guidance of emotion, no common-sense reasoning engine would be powerful
enough, no matter how parallel it was, to process all of the ramifications of situations and
come up with relevant and useful and communicable and actionable conclusions.
So how do we give CYC passion? Or at least a simulation of it?
Well the key would seem to lie in measuring the level of human concern with each concept, and with
each type of situational relationship between pairs (and n-tuples) of concepts.
How could we do that? How about doing a latent semantic analysis of Google search results? Something
similar to Google Trends, but which measures specifically the correlation strengths of pairs of
concepts (in human discourse, which Google indexes). The relative number of occurrences (and co-occurrences)
of concept terms in the web corpus should provide a concept weighting and a concept-relationship weighting.
If we then map that weighting on top of the CYC semantic network, we should have a nicely "concern"-weighted
common-sense knowledge base, which should be similar in some sense to a human's memory that supports
human-like comprehension of situations.
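As a rough illustration of the weighting idea above, here is a minimal Python sketch. It does not do full latent semantic analysis; it swaps in the simpler pointwise mutual information (PMI) over document-level co-occurrence counts. The toy corpus, the concept list, and the (subject, relation, object) triples are all made up for illustration and are not Cyc's actual representation or API; a real version would need counts from a web-scale corpus.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_weights(documents, concepts):
    """Return (concept_weight, pair_weight) from document-level counts.

    concept_weight[c]   = fraction of documents that mention concept c
    pair_weight[(a, b)] = PMI of a and b, for pairs that co-occur at least
                          once; keys are alphabetically sorted tuples
    """
    concepts = set(concepts)
    n_docs = len(documents)
    single = Counter()
    pair = Counter()
    for doc in documents:
        # Naive whitespace tokenizer; an assumption made for brevity.
        present = sorted(concepts & set(doc.lower().split()))
        single.update(present)
        pair.update(combinations(present, 2))

    concept_weight = {c: single[c] / n_docs for c in concepts}
    pair_weight = {}
    for (a, b), n_ab in pair.items():
        p_ab = n_ab / n_docs
        pair_weight[(a, b)] = math.log2(
            p_ab / (concept_weight[a] * concept_weight[b])
        )
    return concept_weight, pair_weight


def weight_assertions(assertions, pair_weight):
    """Attach a "concern" score to each (subject, relation, object) triple.

    Pairs never seen together in the corpus get a neutral score of 0.0.
    """
    weighted = []
    for subj, rel, obj in assertions:
        key = tuple(sorted((subj, obj)))
        weighted.append((subj, rel, obj, pair_weight.get(key, 0.0)))
    # Highest-concern assertions first.
    return sorted(weighted, key=lambda t: t[3], reverse=True)


# Hypothetical toy data standing in for a web corpus and a knowledge base.
documents = [
    "milk is white",
    "the cat drinks milk",
    "the cat chased the dog",
    "hot metal hurts",
]
concepts = ["milk", "white", "cat", "dog", "metal"]
assertions = [
    ("milk", "hasColor", "white"),
    ("cat", "drinks", "milk"),
    ("dog", "isA", "metal"),
]

concept_weight, pair_weight = cooccurrence_weights(documents, concepts)
ranked = weight_assertions(assertions, pair_weight)
for subj, rel, obj, score in ranked:
    print(f"{subj} {rel} {obj}: {score:.2f}")
```

The ranking step is where the "prioritizer" would come in: a reasoner could explore high-scoring assertions first instead of treating every fact in the knowledge base as equally relevant.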
Combining a derivative of Google search results with CYC is my suggestion for beginning to make an AI that can talk to
us in our terms, and understand our global stream of drivel.
I wish I had time to work on this.
Where are we going and why are we in a handbasket?
The joke will be on us when the first real AI wakes up, spends some time contemplating the Internet, downloading terabytes of information, and finally communicates with its creators...
...only to ask for more pr0n.