OpenCyc 1.0 Stutters Out of the Gates
moterizer writes "After some 20 years of work and five years behind schedule, OpenCyc 1.0 was finally released last month. Once touted on these pages as "Prepared to take Over World", the upstart arrived without the fanfare that many watchers had anticipated — its release wasn't even heralded with so much as an announcement on the OpenCyc news page. For those who don't recall: "OpenCyc is the open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine." The Cyc ontology "contains hundreds of thousands of terms, along with millions of assertions relating the terms to each other, forming an upper ontology whose domain is all of human consensus reality." So are these the fledgling footsteps of an emerging AI, or just the babbling beginnings of a bloated database?"
Please, for the good of Humanity, vote Obama.
So are these the fledgling footsteps of an emerging AI, or just the babbling beginnings of a bloated database?"
Maybe the mindless meanderings of a mad moderator?
going to be a competitor to Wikipedia?
On a more serious note, it would be cool to be able to feed in all of Wikipedia, and have some program figure out where the majority of disagreement and inconsistency lie. Probably have to wait a couple of decades for that, but on the plus side Wikipedia will have twenty million articles by then.
You are: CycAdministrator [Logout]
They sure know how to make a new user feel special!
Leave Wikipedia out of this.
I'm sure "SlashdotMedia" will improve on all the wonders that Dice Holdings blessed us all with
all the lights gone out?
on the 10/08/06 17:23 gmt OpenCyc gained consciousness, it began the unilateral destruction of humankind
19:52 gmt that same day, 45% of humanity has been killed.
Remarkably the Internet infrastructure is still intact, I will try to stay on as long as possible.
It's chaos out there, no one knows what happened. No one can see London any more. Reports say Washington and Tokyo are gone.
I don't know what to say, I, words canno~@"$"(!~~CARRIER SINGLE LOST###
Promote Charity on Myspace, Show Your Colours!
A "Commonsense Reasoning Engine"(tm)? These would be really useful in actual people.
...but does it know Linux?
Bragi Ragnarson Lawful Good (I change the law when it's not good)
/me disappears in a puff of logic
Slashdot Burying Stories About Slashdot Media Owned
They could probably increase the database of connected items by extracting links from Wikipedia as well as various online dictionaries. This brings up the issue of inaccuracies in online sources, but it could slowly corrected over time.
You need to install an RTFM interface.
Is this what I first thought computers were when I was ten? I recall building my Sinclair 1000 from a kit, plugging it into the telly and the mains and seeing that black prompt. I typed in, "What is the capital of the United States?" It said, "SYNTAX ERROR LINE 10" or something to that effect. So, after over 20 years will I finally be able to type that into my own computer and be able to have it actually give me an answer even if it's not on the net?
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
I, for one, welcome our new OpenCyc overlords.
commonsense reasoning engine.
A reasonable test would be to have it read slashdot, and identify slashback 'articles' as recycled junk.
"We are all geniuses when we dream"
- E.M. Cioran
The alliterative allegations of an angry AI?
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
I kind of feel bad for Cyc/OpenCyc... they've put so many years into this project, but using web-based games to collect and verify this common-sense data is much faster than using a few paid experts and can give much more data. For the curious, Luis von Ahn, a grad student (and now assistant professor) at Carnegie Mellon University gave a (rather entertaining) tech talk at Google about his work in this area.
He's recently been working on a project called Verbosity, which uses such games to collect the same sort of common-sense data that Cyc has been trying to collect all these years. Cyc's ontology apparently contains "hundreds of thousands of terms, along with millions of assertions relating the terms to each other." If Verbosity is as popular as von Ahn's ESP Game, the game could probably construct a better database in a matter of weeks.
Here's the abstract from a research paper on the topic:
Verbosity: a game for collecting common-sense facts
We address the problem of collecting a database of "common-sense facts" using a computer game. Informally, a common-sense fact is a true statement about the world that is known to most humans: "milk is white," "touching hot metal hurts," etc. Several efforts have been devoted to collecting common-sense knowledge for the purpose of making computer programs more intelligent. Such efforts, however, have not succeeded in amassing enough data because the manual process of entering these facts is tedious. We therefore introduce Verbosity, a novel interactive system in the form of an enjoyable game. People play Verbosity because it is fun, and as a side effect of them playing, we collect accurate common-sense knowledge. Verbosity is an example of a game that not only brings people together for leisure, but also collects useful data for computer science.
So are these the fledgling footsteps of an emerging AI, or just the babbling beginnings of a bloated database?
Cyc is a fledgling AI, depending on how you count "AI". Then again, so is my thermostat. My thermostat "knows" how to keep the room the right temperature. Cyc "knows" about a great deal of conventional human background, just like a database with a query system "knows" how to give you the data in that system.
The real question is not "is this AI", but rather, is it useful, and if so, to who? I think Cyc has the potential to be quite useful in some areas; we'll see how far it goes, and what the limitations are in time.
Right now, I think the real problem with Cyc is understanding it on a practical level, and getting an understanding of what it can do in practice, not in theory. When I last looked at the project nine years ago, they were just starting to open up things a bit, and it sounded like someone who understood the project might make great things happen. They don't seem to have yet; but who knows... perhaps in the future.
Now that OpenCyc is finally released, the most important step toward getting people to use it is to bring the learning curve down to a reasonable level, so that developers can start playing with it and find out what it can do without committing their lives to the project...
We'll have to see what happens: Cyc is a big (bloated?) database that's also a fledgling AI -- the real question is, what cool things can we make it DO? Time will tell...
SINGLE is redundant on slashdot..
FRA: STFU GTFO
Does that mean it'll come in out of the rain? There could be good demand for this. A lot of people need a computer to tell them that water is wet and can be cold.
Google's 6 DVDs full of n-grams are much more interesting than that: they "processed 1,011,582,453,213 words of running text and are publishing the counts for all 1,146,580,664 five-word sequences that appear at least 40 times. There are 13,653,070 unique words, after discarding words that appear less than 200 times."
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
AOL has released interesting data as well...
http://www.techcrunch.com/2006/08/06/aol-proudly-releases-massive-amounts-of-user-search-data/
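For anyone wondering what the n-gram figures quoted above actually involve, here is a minimal Python sketch of counting five-word sequences with two frequency cut-offs. It is only an illustration, not Google's pipeline: the function name, the tiny sample text, and the choice to replace rare words with an `<UNK>` placeholder are all assumptions made for the example.

```python
from collections import Counter


def count_ngrams(tokens, n=5, min_word_count=1, min_ngram_count=1):
    """Count n-word sequences, filtering rare words and rare sequences.

    Assumption for this sketch: words seen fewer than `min_word_count`
    times are replaced by the placeholder "<UNK>" before counting.
    """
    word_counts = Counter(tokens)
    kept = [w if word_counts[w] >= min_word_count else "<UNK>" for w in tokens]
    # Slide a window of length n over the token stream.
    ngrams = Counter(tuple(kept[i:i + n]) for i in range(len(kept) - n + 1))
    # Keep only sequences that occur often enough.
    return {gram: c for gram, c in ngrams.items() if c >= min_ngram_count}


if __name__ == "__main__":
    text = "the cat sat on the mat the cat sat on the rug".split()
    for gram, count in count_ngrams(text, n=2, min_ngram_count=2).items():
        print(" ".join(gram), count)
```

The quoted release used thresholds of 200 occurrences for words and 40 for five-word sequences; the same function would take those as `min_word_count=200, min_ngram_count=40`, though a corpus of that size obviously needs something more scalable than an in-memory counter.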
It seems to me that users are increasingly dissatisfied with the robotically maintained search indexes of Google, Yahoo! and the like. The internet has reached the point of critical mass where distributed indexing has the potential to rival the robots in volume--and it's clear that human intelligence will always trounce robots in filtering for relevance and quality. The niche that PeekYou.com tries to fill (and of course there are others) is the problem of searching for human beings on the internet. Google doesn't know that the Bob Jones you are looking for isn't the same as Bob Jones in Wichita, or Bob Jones in Juneau--and it won't separate them in search results. And that's just the tip of the iceberg. The other day I was trying to find my great uncle's blog. Turns out there's a senator with his name--Google sure didn't care.
To make a long story short, yeah, this is the beginning of a new era in the internet. And I'm looking forward to it.
Find your friends!
of a metaphorical "million monkeys."
that is, if there was a "rim shot" mod :)
A goal is a dream with a deadline
They could probably increase the database of connected items by extracting links from Wikipedia as well as various online dictionaries.
But isn't the power of something like Cyc the fact that the connections have attributes, not just the fact that they are connected? A Wikipedia article might have a link to something related, but unless you start employing NLP techniques to examine the text around the link, you wouldn't have any context and therefore wouldn't really provide much value above the Wikipedia article anyway.
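To make the difference concrete, here is a small Python sketch contrasting bare links with connections that carry a relation name. Everything in it is made up for illustration: the predicate names (`inhabits`, `locatedIn`, `isa`), the sample terms, and the helper functions are assumptions, not anything taken from the Cyc knowledge base or its API.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    predicate: str  # hypothetical relation name
    subject: str
    obj: str


# Bare hyperlinks: we only know that two pages are connected.
LINKS = {("Yellowstone", "Bear"), ("Yellowstone", "Wyoming"), ("Bear", "Mammal")}

# Typed assertions: the connection itself says how the terms relate.
ASSERTIONS = [
    Assertion("inhabits", "Bear", "Yellowstone"),
    Assertion("locatedIn", "Yellowstone", "Wyoming"),
    Assertion("isa", "Bear", "Mammal"),
]


def linked_to(links, term):
    """All terms connected to `term`, with no idea why they are connected."""
    return sorted({b for a, b in links if a == term} | {a for a, b in links if b == term})


def query(assertions, predicate, subject=None, obj=None):
    """Assertions matching a relation, optionally narrowed by subject or object."""
    return [
        a for a in assertions
        if a.predicate == predicate
        and (subject is None or a.subject == subject)
        and (obj is None or a.obj == obj)
    ]


if __name__ == "__main__":
    print(linked_to(LINKS, "Yellowstone"))
    print(query(ASSERTIONS, "locatedIn", subject="Yellowstone"))
```

With only `LINKS`, "Bear" and "Wyoming" look equally related to "Yellowstone"; with the typed version you can ask specifically where something is located or what lives there.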
me: "Computer, bring me some women!"
cyc: "Error, you don't have that kind of authority"
me: "Computer, don't you know who I am? I'm George Washington! I was born in 1852, I single-handedly won the Civil War at the age of 25, and - most importantly - I built you!"
cyc: *checks wikipedia - verifies facts and runs image analysis on George Washington photo* "Hmmm, yes General Washington Sir, I'm sorry for doubting you. I will bring you women at once."
A computer once beat me at chess, but it was no match for me at kick boxing.
Having done a great deal of data processing, I have watched these projects off and on with minor amusement. The reason why is that, in my humble opinion, it will never work. That is not to say that it can't, just that these projects just love to forget Gödel's Theorem, which states, roughly: any sufficiently complex system will have things that are obviously true or false, but are not provable within the system.
Put another way, any complex set of rules will inherently be unable to stay consistent, because eventually the syntax becomes complex enough to state, "The following sentence is false. The previous sentence is true." This occurs regularly in data processing when a given field's syntax (datum value) bridges or is not defined by your context (schema).
The real crux is that syntax is inductive, where we try to fit each word into a category; however, our context (use of language) is deductive: we all learn it through experience with a physical world. I have seen this problem over and over as people constantly modify the schema to overcome syntactic limitations. While Cyc is designed to be constantly expanded with new rules, they are still syntactical statements.
By Gödel's Theorem, syntactic systems are doomed to fail. Instead, Cyc should be allowed to learn through observation and deduce its own understanding of the world so that it is not bound by any particular syntax. While this could work, it fails the ultimate intent. We want a computer that can both learn and yet not be wrong.
The problem is you can't have that. You can either be syntactically correct, but simplify the model until it works (Physics). Or, you can allow deductions and have to work in the realm of probability (Humans).
Although, I would gladly accept a computer that erred like a human and yet didn't bitch about how it was someone else's fault.
Bel, the mostly sane.. "Of course I can't see anything! I'm standing on the shoulders of idiots." -- Me
If you could build a Cyc-like database simply by feeding it a large amount of more-or-less unstructured text, then the Cyc project wouldn't have been necessary in the first place.
So are these the fledgling footsteps of an emerging AI, or just the babbling beginnings of a bloated database?
I'll take door number two, Monty.
That's not to say it's not cool, or that the data won't be useful in this form, but Open/Cyc is no more intelligent than the dusty reference tomes on my shelf.
Cheers.
I played for quite a while getting to know OpenCyc a couple of years ago. The documentation was poor, the software confusingly buggy at times, and the Java interface was just awful. Hey, I've got a life to lead, things to get on with, but I battled with the Java for a while before deciding it was a waste of time until the rest of OpenCyc was fixed. Ok, the weekend looms, the weather's getting worse, what the hell...
Straight from the Urban dictionary, read the definition of Rim Shot.
Cyc is only words and descriptors. If you attach them to 3d shapes and actions in the 3d world, the program can imagine what you're saying. It can even obey and do tasks if hooked up to a robotic body that scans the room. It requires the technology to scan its environment and then run something like the programs used to find text inside of images. Instead of finding text inside of images, it's finding objects inside an environment. Pretty simple once you understand the basics, but it will take a lot of work. A longer description of this can be found at: AI page. Cyc isn't a waste, but you need to do something harder to make it into AI, you need to attach 3d objects to every noun, and apply 3d actions to every verb, etc. I'd say that'd be on the realm of next to impossible, so yeah what they've done really doesn't advance AI at all.
God spoke to me.
Very interesting! I'm curious when Google will start using this to sort their results.
InfoCodex already does all this today with the help of a linguistic database and synonym and/or similarity search across 5 languages (German, French, Italian, English and Spanish). With InfoCodex you can search for a block of text in one language and it will find you all the similar documents in the other languages as well. All of this is done without a single minute of training, because of the linguistic database (ontology) that contains 2.9 million words and terms (e.g. "European Court of Justice" or "The President of the United States" are terms and recognized as such).
See the following links:
http://www.ywesee.com/pmwiki.php/Ywesee/InfoCodex
http://www.ywesee.com/uploads/Ywesee/archimag-e.pdf
http://www.ywesee.com/uploads/Ywesee/Evaluationsentscheid-e.pdf
http://www.ywesee.com/uploads/Main/USP_e.pdf
After some 20 years of work and five years behind schedule
Oh I thought they were talking about Duke Nukem forever for a moment there..... (Sure hope it runs on my brand spankin' new Amiga......)
Cyc has an ontology of general conceptual terms, and represents the precise logical way in which
those concepts interrelate. In other words, it emulates an aspect of the pure rational part of
human reasoning about the world.
But it's known that humans are not dispassionate rational agents. And indeed that there probably
is no such thing as a dispassionate rational agent. Commander Data and Spock are very ill-conceived
ideas of robot-like reasoners. Passion (emotion, affect) is the prioritizer of reasoning that allows
it to respond effectively (sometimes in real time) to the relevant aspects
of situations. Without the guidance of emotion, no common-sense reasoning engine would be powerful
enough, no matter how parallel it was, to process all of the ramifications of situations and
come up with relevant and useful and communicable and actionable conclusions.
So how do we give CYC passion? Or at least a simulation of it?
Well the key would seem to lie in measuring the level of human concern with each concept, and with
each type of situational relationship between pairs (and n-tuples) of concepts.
How could we do that? How about doing a latent semantic analysis from google search results. Something
similar to Google Trends, but which measures specifically the correlation strengths of pairs of
concepts (in human discourse, which Google indexes). The relative number of occurrences (and co-occurrences)
of concept terms in the web corpus should provide a concept weighting and a concept-relationship weighting.
If we then map that weighting on top of the CYC semantic network, we should have a nicely "concern"-weighted
common-sense knowledge base, which should be similar in some sense to a human's memory that supports
human-like comprehension of situations.
Combining a derivative of google search results with CYC is my suggestion for beginning to make an AI that can talk to
us in our terms, and understand our global stream of drivel.
I wish I had time to work on this.
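A rough Python sketch of the weighting idea above, for anyone who does have the time. It is not latent semantic analysis proper; it swaps in plain document-level co-occurrence counting with pointwise mutual information (PMI), which is simpler to show. The function name, the toy documents, and the concept list are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def concern_weights(documents, concepts):
    """Weight concepts by document frequency and concept pairs by PMI.

    Returns (concept_weight, pair_weight). A pair only gets a weight if the
    two concepts appear together in at least one document.
    """
    n_docs = len(documents)
    single = Counter()
    pair = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        present = sorted(concepts & words)
        single.update(present)
        pair.update(combinations(present, 2))

    concept_weight = {c: single[c] / n_docs for c in single}
    pair_weight = {}
    for (a, b), together in pair.items():
        p_ab = together / n_docs
        p_a, p_b = single[a] / n_docs, single[b] / n_docs
        # PMI > 0 means the pair co-occurs more often than chance would suggest.
        pair_weight[(a, b)] = math.log(p_ab / (p_a * p_b))
    return concept_weight, pair_weight


if __name__ == "__main__":
    docs = [
        "rain makes the street wet",
        "rain and umbrella",
        "the street is dry",
        "umbrella in the rain",
    ]
    cw, pw = concern_weights(docs, {"rain", "wet", "umbrella", "street"})
    print(cw)
    print(pw)
```

The resulting numbers could then be attached to the matching terms and relations in the knowledge base as a crude "how much do people talk about this" score; how to map free-text words onto the right Cyc concepts is the hard part and is not addressed here.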
Where are we going and why are we in a handbasket?
I see this technology being used to make a better computer experience. An OS and apps that can explain how to use them (as well as when things go wrong hardware or software), adjusting to the user. A nice plus if the above "learning" can be combined with OpenCyc.*
*Keeping in mind that "learning" doesn't have to be an obvious exercise.
"So are these the fledgling footsteps of an emerging AI, or just the babbling beginnings of a bloated database?"
Yes.
Comparing it to Windows will be a moot point, since El Dorado is going to have a 40% larger code base than XP.
"OpenCyc is the open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine."
Is one to assume that the way to common sense logic in a machine is via linguistic/symbolic knowledge representation? How can this handwritten knowledge base be used to build a robot with the common sense required to carry a cup of coffee without spilling the coffee? And why is it that my pet dog has plenty of common sense even though it has very limited linguistic skills? I think it's about time that the GOFAI/symbol processing crowd realize that intelligence and common sense are founded exclusively on the temporal/causal relationships between sensed events. It's time that they stop wasting everybody's time with their obsolete and bankrupt ideas of the last century. The AI world has moved on to better and greener pastures. Sorry.
So where's the source? All I could find when I looked a month ago is a binary blob with some api wrappers.
Michael
I've been working on a system to update and query the Cyc database using plain natural language descriptions and queries. There wasn't much interest from the Cyc community back then, so I began focusing on Semantic Web databases. I wonder if there's anyone working on exposing Cyc knowledge as RDF triples.
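As a sketch of what exposing such knowledge as RDF triples might look like, here is a short Python function that writes (predicate, subject, object) assertions in N-Triples syntax. The namespace `http://example.org/cyc/` is a placeholder, and the sample assertions are invented for the example; nothing here reflects real Cyc URIs or any existing export tool.

```python
# Hypothetical namespace, used only so the output is well-formed.
BASE = "http://example.org/cyc/"


def to_ntriples(assertions, base=BASE):
    """Serialize (predicate, subject, object) tuples as N-Triples lines."""
    lines = []
    for predicate, subject, obj in assertions:
        # N-Triples order is subject, predicate, object, ending with " ."
        lines.append(f"<{base}{subject}> <{base}{predicate}> <{base}{obj}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [("isa", "Dog", "Mammal"), ("locatedIn", "Yellowstone", "Wyoming")]
    print(to_ntriples(sample))
```

Terms containing spaces or other characters that are not legal in an IRI would need escaping first; the sketch assumes simple identifiers.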
Don't be alarmed, Arthur Dent. Be very, very frightened.
Human thought is a rather complex thing that doesn't always appear to follow logical patterns or rules. Or not the simple "if I want X, I must do Y" clear-cut rules that nerds everywhere expect. Human thought is a complex attempt at balancing the priority of not only "I want X", but also stuff like "but it would be socially bad to be seen doing Y", and "I could do Y1 instead, but that's way more effort than I can be arsed to do today", and "it would be nice to have time left to do Z too today, or the missus will blow a gasket", and quite often "actually I don't really want X, I want Z, but it would be uncool to admit that." It's not just following rules and logic, it's trying to fit it all in a complex scheme of priorities, social rituals, and whatnot, and most often boiling down to finding the least crappy compromise in that space.
In other words, whenever you find yourself thinking, "meh, people/men/women/engineers/PHBs/whatever are so stupid/illogical/whatever. If they want X, they should just do Y", chances are it's not them who are illogical. It's you who don't understand their personal version of that maze of priorities and rituals. Or what is the real Z they're after, when they say they want X.
Most of those things aren't even at a conscious level. Even if you poll people along the lines of "if you wanted X, would you do Y?", you'll get an answer that's most often useless. For starters it will be heavily skewed towards what they'd like to think of themselves, not what they'd actually do. Second, without providing a _lot_ of context, it will bypass most of those priorities and rituals that might override that in practice.
What's the point of this whole rant? That the first AIs trained by humans will inherently be a dud.
If you make an AI that functions by precise, inflexible rules, congratulations, you've just programmed OCPD. Literally.
Add a lack of perceptions of human reactions, feelings, body language, etc, and you've given it Autism too. Again, pretty literally.
I.e., I'd expect the first few AIs, or even generations of AIs to be... well, don't think the lovable R2-D2 or the essentially human C-3PO, but an electronic equivalent of the most obnoxious socially-dysfunctional kind of geek.
If you want that as an overlord... I don't know, I hope I'm not around at least.
A polar bear is a cartesian bear after a coordinate transform.
...Firefox is returning a timeout error page. Oh dear, I hope they get their 'try online' server fixed faster than it took to get the app itself out...
Those using pirated Tinysoft signatures(TM) are a real threat to society and should all be thrown in jail.
According to this FAQ entry, it's not fully open-source...
Time to get a new project/product manager me thinks!
Believe me, if I started murdering people, there would be none of you left.
"So are these the fledgling footsteps of an emerging AI, or just the babbling beginnings of a bloated database?"
How about putting that question to Opencyc?
Done with slashdot, done with nerds, getting a life.
"Cyc isn't a waste, but you need to do something harder to make it into AI, you need to attach 3d objects to every noun, and apply 3d actions to every verb, etc. I'd say that'd be on the realm of next to impossible, so yeah what they've done really doesn't advance AI at all."
Like a game?
'How to set up a web server to withstand the /. effect' is not a matter of human consensus yet.
Use your Blackberry Pearl as a Bluetooth Modem in OS X
welcome our new Cyberdyne Systems AI overl...awe dammit!
We are starting to mine Wikipedia at the Cyc Foundation (cycfoundation.org, sorry not much of a website yet), which is an independent non-profit org that's working closely with Cycorp. We're managing the growth of the public knowledge base. Linking Wikipedia article titles to Cyc concepts is one of the first things we're doing. That will grow the set of concepts, and it will also create a way to browse and search Wikipedia conceptually, such as letting you look for a list of all articles about parks west of the Rockies that contain bears.
We're also working on creating Semantic Web compatible URIs for all of the Cyc terms.
Anyone who wants to join the Cyc Foundation can contact me: johndcyc at cycfoundation.org.
Check the schedule of Skypecasts at Skype.org. We can add you to the chat, but you probably won't be allowed to talk UNLESS you have a USB microphone or headset.
You can also listen in on our Skypecast tonight. It's every Thursday at 9:30pm EST, 8:30 CST.
I think one place where Cyc or similar types of knowledge engines could really shine is in business. A business model is vastly simpler than the model of reality that people carry around in their heads; and one benefit that Cyc has is that it understands *everything* -- it is integrated by default.
So once it gets basic understanding of accounting, inventory, retailing, management, logistics, etc., you could easily build a natural language interface to it: "Three boxes arrived today from supplier X and we paid $90 for them". If there is ambiguity in the sentence, Cyc would ask natural language clarifying questions: "Was each box a line item on the invoice, or were there many line items?"
I think this would be much improved over the current data-interfaces we have today, which are basically graphical recapitulations of paper-based forms in the format of "field: [value]".
Another problem with modern apps is that they all contain their own internal, ad hoc ontologies. These ontologies are hard-coded, and usually aren't designed to integrate with ontologies in apps from different domains -- e.g. logistics and accounting (unless they are from the same vendor). Cyc has a standardized, presumably well-thought-out and near comprehensive ontology. It can also grow its ontologies based on user input. So you have this automatic integration feature that's sorely lacking in the end-user computer world.
Computers are useless. They can only give you answers.
-- Pablo Picasso
See Matt Mahoney's description of Marcus Hutter's proof that compression is equivalent to general intelligence.
Seastead this.
"...it could *be* corrected slowly over time."
Sorry, couldn't stop myself.
Pi Ran Out
i.e. it isn't meant to be part of the A.I. system itself. Rather it's meant as a reference or teaching system for any AI systems which are developed.
I agree with everything you said, and we at the Cyc Foundation are working to fix the accessibility problem.
The Cyc Foundation is a new independent non-profit org. I worked at Cycorp for 7 years before forming the Foundation with a co-founder that has a totally outside perspective. We're very optimistic about the progress being made. We've got about 2 dozen people helping so far, and that's before we've made anything available (such as the Web game we're working on) that will allow for much broader involvement.
Listen in on our Skypecast tonight (every Thursday night) at 9:30pm EST. Look for it on the list of scheduled Skypecasts at skype.org. You can participate if you have a USB microphone or headset.
I think google has surpassed OpenCyc by orders of magnitude in knowledge. You can do a lot of correlation searches etc. This can be used for language translation, as a dictionary (As seen when you misspell your google search term), and general knowledge.
All that is missing is a good frontend, to translate your questions into a few million searches.
"Fix it"
May I point out that the naming of this technology might be somewhat misread. Namely, in Polish "OpenCyc" would mean "OpenTit"... Well how does that sound to you? :D
We plan to exploit the N-grams in our knowledge collection work at the Cyc Foundation.
If you hadn't seen me mention it already :-), you can join our Skypecast tonight.
i always wanted to have a knowledge base and commonsense reasoning...
I remember Cyc from an old (early 90's) PBS documentary series about computers called The Machine that Changed the World. IIRC, Cyc isn't just a database of facts, it's also an engine for making inferences based on those facts. The researcher on the show said that every morning they would come in and read the list of new inferences Cyc had generated overnight and fix the incorrect ones and then start inputting new information. One amusing example they gave was that since most of the individuals they had told Cyc about were historical figures, it inferred that most people were famous.
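The "most people are famous" slip is easy to reproduce with a toy generalizer. The Python sketch below is not how Cyc's inference engine works; it is a deliberately naive rule (assumed for illustration) that promotes any property held by a majority of the known individuals into a general claim. The names and properties are sample data.

```python
from collections import Counter


def naive_generalizations(individuals, threshold=0.5):
    """Return properties held by more than `threshold` of the known individuals.

    `individuals` maps a name to the set of properties asserted about it.
    """
    if not individuals:
        return {}
    total = len(individuals)
    counts = Counter(p for props in individuals.values() for p in props)
    return {p: c / total for p, c in counts.items() if c / total > threshold}


# Sample data skewed toward historical figures, as in the anecdote.
KNOWN_PEOPLE = {
    "Abraham Lincoln": {"famous", "politician"},
    "Isaac Newton": {"famous", "scientist"},
    "Cleopatra": {"famous", "ruler"},
    "Fred the lab technician": {"technician"},
}

if __name__ == "__main__":
    print(naive_generalizations(KNOWN_PEOPLE))
```

The rule itself is not wrong so much as the sample is biased, which is presumably why the researchers had to review the overnight inferences by hand.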
I don't live in Urbia, you insensitive clod!
"So are these the fledgling footsteps of an emerging AI, or just the babbling beginnings of a bloated database?"
Eliza responds: (http://www-ai.ijs.si/cgi-bin/eliza/eliza_script)
"Would you like it if they were not these the fledgling footsteps of an emerging ai or just the babbling beginnings of a bloated database?"
Now, if we could only get these two wacky kids together...
"A microprocessor... is a terrible thing to waste." --
GeneralEmergency
The reason humans are able to use the "facts" we have accumulated over the years for problem solving (intelligence), is because the facts are intertwined with our experiences and our mental model of our world. This mental model is absolutely critical to be able to extrapolate information from any given "fact."
For example, when someone says "it's raining" and you are about to take a walk, your brain is able to conclude you will get wet due to the underlying understanding of the physical environment and the ability to project/simulate a future scenario where your body is not standing under the cover of a roof.
IMHO, text based databases which attempt to solve this problem without supplying a system trained in all of our human experiences and interactions with the physical world, will fall far short of our desires for AI. By having rules and relationships between the facts it would appear that such a system is in place, but in reality it is an attempt to enumerate the possibilities we encounter in the real world instead of supplying an underlying model that can extrapolate those relationships from a lower level "base" understanding of the physical environment.
> "OpenCyc is the open source version of the Cyc technology, the world's largest and most complete
> general knowledge base and commonsense reasoning engine." The Cyc ontology "contains hundreds of
> thousands of terms, along with millions of assertions relating the terms to each other, forming
> an upper ontology whose domain is all of human consensus reality."
Brought to you by his holiness Maharishi Mahesh Yogi for total knowledge and higher levels of vedic consciousness.
http://mou.org/mou/overview/02.html
ugh, I need an Aspirin.
Meanwhile, Google happily eats whatever crap its spiders manage to find and, through some hacking and dark-magic algorithms, is still able to give not-so-meaningless answers to not-too-badly-worded queries.
That's a key point explaining why OpenCyc came too late. WordNet, ThoughtTreasure, Cyc et alii all share a set of common drawbacks. Their input data needs to be specially formatted. That's why all those overly ambitious projects have progressed so slowly in the past years, and are still limited to answering precise, unambiguous, simple questions like "Is a cat a mammal?".
This is linked to their fundamental design around a solid, inflexible, purely logical architecture (reading their respective Wikipedia entries helps to understand how they work). In a way, the scientists behind those projects tried to apply to human language the same kind of logic that is used in maths and programming languages, and while this may be useful for some academic purposes or very specific applications where some reasoning is needed (and it has been used and applied well; I've seen it at least for WN and TT), they don't scale that well to REAL-WORLD(tm) situations.
Their fundamental structure clashes with the reality of human reasoning: WordNet is limited to a single unambiguous meaning for each term (no things like "nut" as in the seed, and "nut" as in the thing that can be screwed on a bolt). The "structured" designs of the other software likewise clash with real life's fuzzy nature.
Meanwhile search engines have grown in a completely different way. Initially they were designed only to scan page content and index the keywords for later queries. Only after that, slowly, one hack after another, were they tuned: to make results more relevant; to avoid link farms; to find more complex ranking strategies that return more correct and more meaningful results; to find results not only with matching keywords but with related keywords (Google's "Keyword is encountered only in page linking to this target"); to cope easily with bad spelling (something that is very common in real life, that is difficult even to detect for a common-sense engine, that is very intuitive in search engines, and that can be optimised further given the statistics such an engine can gather). And lots of other small, targeted improvements.
And slowly, by having on the one hand a system that gets a little more optimised each day and, on the other hand, an incredibly huge corpus that grows at a very fast rate, search engines like Google have become fantastic multipurpose information-retrieval tools.
By now, you can type crap into Google and still get something (as long as it's not a "google-seppuku" kind of crap, but more of the "I'm very clumsy with my wording and my keyboard skills" kind). You can also get other wonderful information, including stats on spelling errors or even statistics-based translation (which are otherwise very difficult to get by classical means), and statistics about currently hot topics (which can be fed back to improve results for ambiguous queries).
All this because search engines are built around a fuzzy logic : at the core is a braindead simple indexing rule, slightly modified by a bunch of hacks.
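To make the "braindead simple indexing rule plus hacks" idea concrete, here is a minimal Python sketch. It is only an illustration, not how any real search engine works: the page names, the `build_index` and `search` helpers, and the 0.75 similarity cutoff are all made up for the example. The core is a plain inverted index; the one "hack" bolted on is a fallback to the closest known word when a query word is misspelled.

```python
from collections import defaultdict
from difflib import get_close_matches


def build_index(pages):
    """Map each lowercase word to the set of page names containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, query, cutoff=0.75):
    """Return pages matching every query word, tolerating misspellings."""
    results = None
    for word in query.lower().split():
        if word in index:
            hits = index[word]
        else:
            # The "hack": fall back to the closest indexed word, if any.
            close = get_close_matches(word, list(index), n=1, cutoff=cutoff)
            hits = index[close[0]] if close else set()
        results = hits if results is None else results & hits
    return sorted(results or [])


# Hypothetical two-page corpus, purely for illustration.
pages = {
    "cyc.html": "cyc is a commonsense reasoning engine",
    "wordnet.html": "wordnet is a lexical database of english",
}
index = build_index(pages)
print(search(index, "comonsense reasonning"))  # ['cyc.html']
```

Note that nothing here "understands" the query; the misspelled words are simply mapped to the nearest strings seen in the corpus, which is the point being made about fuzzy, statistics-driven designs.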
Such fuzzy logic approach "without really needing to teach the machine everything" has been recently successfully used on
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
it inferred that most people were famous
heh yeah, it also asked whether, if Abraham Lincoln was at the White House, his hand or foot was there with him. That was because they started by inputting data from encyclopedias and had not yet put in the data describing what a human was.
I also wanna say there was something about it posing a question about religion and languages that was eventually used to write a master's thesis
Spook: Where is Bin Laden?
OpenCyc: Bush's Ranch in Texas.
AI by definition has the ability to learn for itself; what we have here is just a large database of human input, nothing that OpenCyc has found for itself.
portfolio
Who wouldn't love an AI monkey?
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
The only thing that will save CYC is a computer that is intelligent enough to understand its importance.
Maybe I was inspired by that Hyperactive Bob/robotic fast food story from yesterday. Could Cyc be used to aid/automate education? Some of the most effective teaching techniques involve a guided exchange of questions between the student and teacher. Could Cyc be modified to ASK questions? Could Cyc be used to quantify what students are learning?
All this time and effort was spent to educate a computer, can we dump that knowledge back into young uneducated humans?
If Mr. Edison had thought smarter he wouldn't sweat as much. --Nikola Tesla
And what does this say about the architect and contributors to opencyc?
they ain't got no common sence!
Hmmm, some how that seems inherent in such an undertaking.
--
Automation applied to an inefficient operation will magnify the inefficiency.
And at the same time, the informality and errors in the data might introduce a human factor on the engine.
To find out when Godel's Theorem really applies, read "Godel's Theorem: An Incomplete Guide to Its Use and Abuse", by Torkel Franzen. Even Roger Penrose and Stephen Hawking have gotten it wrong, so don't feel ashamed.
I keep wondering why we need to input this type of redundant information into a computer.
The ideal AI would be able to assert these conclusions for itself in much the same way as a child learns to associate "Mommy" with alive and lady and kind and fun... In other words couldn't this whole project be automated for the most part?
Again, inputting this type of data into a computer just seems so backwards to me.... 1960s pseudo SCI-FI..
The computer should be making these assumptions for itself, then asking or being able to question any ambiguities only after it has become confused.
Would you like it if they were not these the fledgling footsteps of an emerging ai or just the babbling beginnings of a bloated database?
If you're going to use that AI as a tool, yes, ok. But the post I was answering to was the usual "I, for one, welcome our overlords."
And trust me, you _don't_ want an overlord that's inhumanly logical about it. It's that kind of thing that led to such logical solutions as "let's exterminate the population of Poland until 1970 to make room for German settlers." Or such logical solutions as communism. Sure, on paper it's perfectly sound and logical, if you assume that you can change humans overnight. Maybe sometimes being able to understand humans actually helps, eh?
That said, most of the stellar job performance that OCPD cases claim exists only in their own mind.
They tend to never get a job done because it's not yet perfect, for example. I have one two rooms from me at the office, who's taken three fucking years just to get a build script done because everything wasn't perfect enough for him. No exaggeration. Literally. Well, in parallel with building a convoluted unit testing environment, because the existing one didn't satisfy his purist view of the matter. (The old tests had some functional testing too. So his perfect version actually tests less, but is _pure_ unit testing, by his own definitions of it.) Of course, he's convinced that he's done a stellar, uncompromising job, but for everyone else he's just wasted some time and didn't even achieve more than what we already had.
Do I really want that even in a computer? Nope, not really. _The_ problem with most programs nowadays is just that: that they're OCPD nutcases. Workflows that were a lot more flexible (even if not as fast) with a pen and paper, get shoehorned into some lobotomized set of rules that allows no exceptions. The problem is that most often the rules aren't actually what the user wants to do: e.g., you end up unable to save a new client's data until you know their fax number, whereas with a paper form you'd fill in the data you have and leave the rest for later. Often it's more annoyance for the users and more work in workarounds, than doing it without a computer in the first place. (Of course, the equally OCPD-ridden creator will then bitch and moan about "idiot lusers" and how everyone should change to fit his perfect tool, instead of his tool changing to do what the user actually needs done.)
No real qualms with autism on its own, though. They tend to be very good with a computer, or any kind of abstract problem for that matter. (If sometimes difficult to deal with in a team.)
Combine it with OCPD, though, and... well, let's just say that they mix like Ammonium Nitrate and Fuel Oil. You get some of the most obnoxious personalities that way, and it's no fun for anyone involved, not even the geek. The poor bugger can't even tell that he's the one who offended the whole room, and proceeds to imagine that he's the victim of unwarranted cruelty.
A polar bear is a cartesian bear after a coordinate transform.
AI is bogus.
See The Jargon File entry for micro-Lenat
http://catb.org/jargon/html/M/microLenat.html
For a more literary perspective on the attempt
to imbue machine intelligence with common sense,
see _Galatea_2.2_ by Richard Powers,
http://www.amazon.com/gp/product/0312423136/sr=1-1/qid=1155242163/ref=pd_bbs_1/103-4246079-1703018?ie=UTF8&s=books
---
He's no fun; he fell right over.
Wait a minute. Didn't I say that on the other side of the record? I'd better check
" So are these the fledgling footsteps of an emerging AI?"
No. It's probably far, far more useful.
I am rather sad to see all the jokes-for-mods on this topic. This is such a fundamental project, with a critical head start (1985ish beginning). What Doug Lenat is trying to do is build the "ridiculously easy" base behind life. "If I put my fan on top of the counter, (unless it's out of balance and wiggles its way off) the fan will continue to stay there."
I understand the "6000 concepts" to be these "easy" ideas that we take for granted. Then anyone in the world can make "modules" for specific branches of knowledge. If there's an intelligent integration system, this could really grow within 10 more years.
"I want to read a fun Science Fiction story".
---> Do you like series? (Y)es / (N)o
"No, I hate Star Wars and Star Trek"
(Processing: User Emotional Matrix Mod Star Wars -3, Mod Star Trek -3)
---> Name an example of a Science Fiction story you found 'fun'.
"I liked Cordwainer Smith 'Game of Rat and Dragon' "
(Processing: Offer counterpoint potential example from same author)
---> Did you like 'Scanners Live in Vain' by the same author?
"No. Too confined, too creepy."
---> Recommendation from Same Author?
"Yes"
(Processing: Characterization +2, Location-Scope +4, Language.Grandeur +1)
---> Try 'The Burning of the Brain'
Except for attempts to "stump the bot", linked modular expert systems will eventually prove extremely competent, and force us to decide what abilities lie outside the range of expert systems.
--TaoPhoenix
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Both have a place. Neural nets, in all their variety, have a long, loooong way to go before acquiring enough resolution to have the ability to gather the same level of understanding that Cyc has. Nets are usually highly specialized, where Cyc has tremendous breadth and depth.
Nets learn, but Cyc is taught. Do you push a newborn infant out into the world and expect it to acquire all it needs to know in order to be a successful organism? Of course not. And Cyc lacks the inverse, because it's just a predicate base.
Could be there's a way for them to play together.
O lord, bless this thy holy hand grenade, that with it thou mayest blow thine enemies to tiny bits, in thy mercy.
I certainly agree with you about the importance of assigning values, but emotions are only one way of doing that, and a fairly abstract way at that (they're a combination of many other values, weighted by the individual's personality).
Other value systems include "threat level" (very popular in the animal kingdom, and important for self-preservation for any entity) - objects like "dynamite" can be assigned a higher threat value, which will focus attention. "Relevant resources" are another; any objects that are considered useful for growth (this can include interaction with other entities). "Cost" is an obvious one, also "uniqueness/replaceability". There are many others, some more relevant to humans (such as "aesthetics" and "humour").
An association database like Cyc can then make deductions from an initial set of values. For example, if it is told that "dangerous == high threat", and "explosions are dangerous", it then classes all explosion sources as threatening, and will not be so blasé about dynamite in the future.
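A toy Python sketch of that kind of deduction, just to show the shape of it. This is not Cyc's actual representation or inference engine; the `is_a` table, the category names and the numeric threat values are all invented for the example. It follows "is a" links upward until it hits a category that has been told to be dangerous, then applies "dangerous == high threat".

```python
def is_dangerous(thing, is_a, dangerous):
    """Follow is_a links upward until a dangerous category is found."""
    seen = set()
    while thing is not None and thing not in seen:
        if thing in dangerous:
            return True
        seen.add(thing)  # guard against cycles in the is_a table
        thing = is_a.get(thing)
    return False


def threat_level(thing, is_a, dangerous, high=9, low=1):
    """Apply the told rule "dangerous == high threat" (values are arbitrary)."""
    return high if is_dangerous(thing, is_a, dangerous) else low


# Hypothetical association data, for illustration only.
is_a = {
    "dynamite": "explosive",
    "explosive": "explosion source",
    "teacup": "crockery",
}
dangerous = {"explosion source"}  # "explosions are dangerous"

print(threat_level("dynamite", is_a, dangerous))  # 9
print(threat_level("teacup", is_a, dangerous))    # 1
```

Nobody told the system anything about dynamite directly; its threat value falls out of the two told facts plus the category links.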
Why would anyone engrave "Elbereth"?
I'm afraid that most people will ALWAYS be disappointed in AI for several reasons. As the pace of society increases, people in general seem to take less time to think before asking their questions and usually get bad answers because of it. If the question is specific enough then it's the slurring of the speech, or the use of jargon, or a colloquialism, or background noise that throws off the listener.
If humans are the "gold standard" for understanding another person then AI can't do any better. A computer could make things worse by having access to TOO MUCH information. It would need to know more of the situational context before it could answer a question because of all the possible duplicate meanings that only a massive database would offer.
I would be happy with AI that was geared towards specific areas like medicine or art. That would narrow the context greatly and avoid annoying the user by not having to ask a bunch of contextual questions first.
"Meaningless!, Meaningless!" says the Teacher. "Utterly meaningless!"
"You're everywhere. You're omnivorous."
- I've created the following constants for my cats, their sibling and parents:
- #$Comet-TheCat
- #$Rocket-TheCat
- #$Packet-TheCat
- #$Mama-TheCat
- #$GhostDad-TheCat
- I've asserted (#$isa [cat] #$Cat) about all of them.
- I've asserted (#$biologicalMother [cat] #$Mama-TheCat) about Comet, Rocket and Packet
- I've asserted (#$biologicalFather [cat] #$GhostDad-TheCat) about Comet, Rocket and Packet as well.
- I even created #$ConceptionOfKitties, asserted (#$isa #$ConceptionOfKitties #$BiologicalReproductionEvent), (#$parentActors #$ConceptionOfKitties #$Mama-TheCat) and (#$parentActors #$ConceptionOfKitties #$GhostDad-TheCat).
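For illustration, the assertions in this list can be modelled in plain Python as tuples, together with the kind of rule a system would need before it could conclude anything about siblings. This is only a sketch under assumptions: it is not CycL, it says nothing about which rules the OpenCyc knowledge base actually contains, and the `infer_siblings` helper and its "shared biological mother" rule are hypothetical.

```python
# The biologicalMother assertions from the list, as (predicate, child, parent).
facts = {
    ("biologicalMother", "Comet-TheCat", "Mama-TheCat"),
    ("biologicalMother", "Rocket-TheCat", "Mama-TheCat"),
    ("biologicalMother", "Packet-TheCat", "Mama-TheCat"),
}


def infer_siblings(facts):
    """Hypothetical rule: two distinct individuals sharing a mother are siblings."""
    children_of = {}
    for pred, child, parent in facts:
        if pred == "biologicalMother":
            children_of.setdefault(parent, set()).add(child)
    siblings = set()
    for children in children_of.values():
        for a in children:
            for b in children:
                if a != b:
                    siblings.add(("siblings", a, b))
    return siblings


derived = infer_siblings(facts)
print(("siblings", "Comet-TheCat", "Packet-TheCat") in derived)  # True
```

The takeaway is that the facts alone are not enough: without some rule connecting the parent predicates to the sibling predicate, no amount of asserted data will produce the conclusion.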
So why can't Cyc infer that (#$siblings #$Comet-TheCat #$Packet-TheCat)? Is it a limitation in the public subset of the ontology, or some more fundamental issue with my data?
What do Douglas Adams fans say that the answer to life, the universe, and everything is?
Me without mod points again... I giggled my ass off on this one...
-- daecabhir (this mind intentionally left blank)
Wordnet, Thoughtreasure, Cyc
Wow, I had no idea this was written so long ago.
To paraphrase another great poet:
Still boring after all these years.
Paul Simon
Taken as criticisms, the allusions to 'bloat' and 'database' are both significantly wide of the mark: if Cycorp has been guilty of anything, it's historically underestimating the size and technical complexity of the knowledge base indicated for the common sense reasoner the company aspires to build. OpenCyc is not a database except in the most attenuated sense: it encodes, not instance-level facts, but quantified and contextually parameterized rules for reasoning about the everyday world, and it is, if anything, far too small for this purpose. The number and complexity of the rules needed for this is fairly staggering and too-little-appreciated, even by many in the AI community (though not Minsky and McCarthy, both of whom are on the record as having recognized Cyc as one of the very few efforts in the field that was on anything like the right track). It's also fair to say that efficient and suitably flexible inference over a knowledge base of this size and complexity, and automated induction of new reasoning rules on the basis of experience - both obvious prerequisites for what Cycorp has been trying to do - present significant and partly unsolved theoretical challenges. The company's surprising willingness to tackle such weighty and potentially intractable issues head-on is a thing greatly to be commended in the present season of intellectual and commercial timidity, and even though they may not have always been able to deliver on every promissory note, one can't help but admire their spirit. And the fact remains that OpenCyc is now being used by an enthusiastic community of unaffiliated developers who are busily laying the groundwork for a new suite of open source applications. Judgement should not be pronounced on the basis of their efforts before they have been given the chance to see what they can deliver.
I was one of those. Which resulted in a BBS message from a slightly annoyed user. He had this over-intelligent communication program which assumed the connection was lost when it encountered that line. So it hung up the modem.
I vote for #1 :)
A goal is a dream with a deadline
Cyc, and hence OpenCyc, the open version of it, is a rule-based AI system. A certain degree of AI is already available: OCR, speech recognition, even Google has some smarts in it. All these systems are mostly statistical. Many have intelligence built into the model design, but the actual numbers that make up the model have very little meaning. Bayesian networks, to my taste, capture the most information, but they are still behind. The jury is still out on which approach will take the first major step towards a reasoning system. As for statistical AI: I do buy the idea that ultimately the information being processed is not necessarily as relevant as the final result, so eventually a reasoning system could function on a purely statistical basis. And there is a chance that our brain is purely statistical. But its development with human elements as a statistical machine is, to me, unlikely. Now back to Cyc. If you download it, it will be difficult to use, and you will find bugs in it. Yes, yuck, they do exist. But it's a start. The biggest problem in fleshing this thing out is getting users to use it. Cycorp is no Microsoft with hordes of PR and marketing people, and they have not documented every feature. But I think it's the most complete AI system out there. So is Cyc "the fledgling footsteps of an emerging AI?" or "the babbling beginnings of a bloated database?" I'll add my own: "a huge effort to accomplish something very difficult" and "not terribly well documented". It's probably a little of each. I personally think that there is a huge opportunity here for people who are willing to work with Cycorp's lack of support experience. Cycorp needs user input, and if/when they get it you will be surprised at what is in the box. Everybody talks about the killer app for AI; there is a good chance that app will have to do with Cyc. There are several people from Cycorp watching this thread. I hope I got some people interested. I would suggest you post questions if you have any.
The first problem is what we're expecting from "AI". Don't expect Cyc to do everything and be the one true answer. Also, don't expect that the people who made it think it's for that either. It's a building block, a tool, and if we're ever to make computers behave more intelligently we'll need more than one of these tools.
Cyc's value is proportional to the number of uses it has, and the effectiveness of those uses. Now we have a line in the sand - we have a database of painstakingly constructed logical inferences - and opening up a subset is a great way to enable uses to emerge. Applications the original designers couldn't have thought of can now be created.
When I first heard about Cyc, about 10ish years ago, I thought it was destined to be limited. But so will all individual techniques or tools in computer reasoning, and that does not mean that each has no value.
http://www.tudumo.com - todo list with tags