"Understanding" Search Engine Enters Public Beta
religious freak sends word of the public beta of Powerset, a closely watched San Francisco startup that promises an "understanding engine" to revolutionize Web search. An article in SearchEngineLand points out that Powerset is reaching for more than mere "natural language." Techcrunch has more details and analysis. For the beta, Powerset makes all of Wikipedia available to search, not the whole Web. It's said that their understanding engine required a month to grok Wikipedia's 2.5M articles. The Web is currently at least 8,000 times as large.
"No results found for naked pictures of Natalie Portman. How does that make you feel?"
But come on, that's a simple question. Let's talk stuff I get into arguments over with my coworkers: Who played the villain in the first Die Hard? Which at least put Alan Rickman at #8. But let's try mutating that to make it harder but still understood by you and me: Who played the bad guy in the first Die Hard? Which resulted in very little but drivel with no mention of the great Alan Rickman whatsoever.
So maybe it can't understand 'bad guy.' Well onto another question: Who was the organist for The Beatles on Abbey Road? Which resulted in at least the first 20 having no mention of the great & oft forgotten Billy Preston.
So you want to know what the kicker is? I put those same inputs into Google and found the name in the first or second result. Granted PowerSet doesn't do the whole web, I'm pretty sure that if it did, it wouldn't have the pretty results that it gave when I did what one of the articles told me to--ask it when earthquakes hit Tokyo. Just imagine the dates it would come up with if it hit a site with an html table of any seismic activity whatsoever in Tokyo!
I think it's a novel idea to mine Wikipedia for a search engine so long as it isn't just plain old token matching like PowerSet seems to be up to. Be inventive, try a natural language parser written in Prolog that digests all of Wikipedia into a huge network/ontology of concepts
I find them talking about this in the articles: "Powerset is different. It says that its technology reads and comprehends each word on a page. It looks at each sentence. It understands the words in each sentence and how they relate to each other. It works out what that sentence really means, all the facts that are being presented. This means it knows what any page is really about." Yet, I'm not impressed. You can try to personify your software and convince me that Baby Alive really defecates like a human being all over so it feels like I have a real baby. But I know it's just software. You don't have to dumb it down if you're going to blog about it. What is this? A pattern matching implementation? A depth first search tree parsing implementation? An ontology builder? Could you at least drop one of the buzzwords of the natural language parsing field for me here?
So does this story actually have more than a startup looking for a sugar daddy to buy it out?
My work here is dung.
Using AI I assume. I found an AI board on http://www.programers.co.nr/artificial-intelligence-f1.html before.
-- (this is a sig) My Computer Programming Forum http://www.programers.co.nr/
Since Powerset can only search Wikipedia, the logical next step is to put the entire web on Wikipedia. Who's up for the job?
Any day now, Wikipedia will surpass The Web's growth rate, and set a course for the day when Wikipedia will be BIGGER THAN THE WEB.
If I hear the word "grok" one more time I'm gonna have to kill someone...
"So long and thanks for all the fish."
True Knowledge actually interprets your question using Natural Language Processing, and then looks through a massive database of user-contributed facts, combining them using sophisticated inference rules, to give you the answer you need. Even the inference rules are user-editable.
Party pooper.
It doesn't seem to find anything outside of en.wikipedia.org.
Give it a spin, articles are outlined for you and are kept within the common interface. Pretty easy to do for one specific format, we'll see about the rest of the Web.
Congrats on missing the joke.
I tried "who the hell is jon stewart". Powerset has a lot to learn before it understands.
I tried just "Osaka", where I am right now.
First match was an obscure album, then a few "factz" that made no sense.
Let's try again, "What is the largest city in Japan?"
Tokyo doesn't feature at all on the first page! It fares just as badly with other countries.
It now seems to be slashdotted, so I better quit now.
Your tests are interesting, but you're not really parsing the responses in the right context. They're problematic. Keep in mind this understanding engine understands the world in a way that was hatched out in San Francisco.
... like I said, it's San Francisco
Who is David Bowie? I trust that it came back with, "aka Ziggy Stardust, normal family guy"
Who played the villain in the first Die Hard? Well, obviously, the villain is "capitalism."
Billie Jean King and Madonna
Who was the organist for The Beatles on Abbey Road?
You had it at "organ," and it got distracted. What they need is some dev guys from Toledo to collaborate, and provide a little cognitive counterweight to the understanding engine. OK, maybe not Toledo. Maybe Atlanta.
Don't disappoint your bird dog. Go to the range.
They're faster, more efficient and more accurate. Yes, they require learning yet there's a valid reason and a payoff to doing so. Do we really want to dumb things down any further? If you can't figure out Google, perhaps you should get off the Net.
I asked 'Where do babies come from' and it just gave me back a bunch of articles with that string somewhere in their text.
Pathetic, and you'd hope it's got a long way to go really because at the moment it does NOTHING of merit that I can see.
1 + 1 = 2 is a special notation/language that is both more concise and easier than writing "add one and one to make two". So is a music score, which is far easier than reading "make a high note for a bit then wait a bit and make a low note". Same with C, C++, SQL or Python: the hard bit in programming is algorithm design, not understanding the actual language itself.
Is natural language really a barrier to entry in using Google? I doubt it. My untechy wife and her friends find everything they need. Plugging natural language into Google gives reasonable results most of the time.
Engineering is the art of compromise.
What a marketing pile-of-poop. All it does is pull out phrases from Wikipedia; there is no attempt to understand the information at all. When I can type in a yes/no question ("Did they have looms in the 1400s?"), I'll be impressed. When it can make a calculation ("How old was Columbus when the first colony was founded?"), I'll be impressed. When it can make comparisons ("When did the earth's population match the current population of the United States?"), I'll be impressed.
In other words, when it even attempts to answer a question that isn't already in Wikipedia as a phrase, I'll be impressed.
Sometimes it's best to just let stupid people be stupid.
Which is why everyone started using it. It wasn't perfect, just better than anything else. Powerset isn't better than lycos.
Well.. maybe. Or Maybe not. But Definitely not sort of.
This might be more useful on Semantic Web pages. I mean, the hardest part is to figure out what the question is trying to ask for. Then it's a simple lookup of the web (or wikipedia) to pull up that item. The problem they are talking about (and don't appear to solve) is to translate your question into the best way to ask for what you're looking for. The problem is, there's no structure to a standard one line search. Maybe they could have you enter some more information as helpful hints. Say you're looking for a book, you remember a phrase from it, "I'll never forget that bottle of cherry fizz, not as long as I live" and you vaguely remember the book is orange, or yellow or maybe red. You could give it that as hints without specifying. The problem is that once you get past that point, the only thing that matters is raw indexed pages. If you have 10B pages you are more likely to have the "right page" than someone with 10M pages. Of course, what do you deem to be a successful search? If you are a creationist, you want to search for evolution and have it return the evidence AGAINST evolution. If you're a nazi, you probably want only those pages that deny the holocaust. Still others might want the true facts. So really, the next search doesn't just need to figure out the best facts, they need to figure out what you are thinking, and provide you with exactly the picture of the world you are looking for.
Cool! Amazing Toys.
So I tried to search for the person who said, "What doesn't kill you only makes you stronger." The search text was: Who said, "What doesn't kill you makes you stronger"?
Google returned the closest match, Friedrich Nietzsche, with several websites pointing to him. Powerset, however, mostly returned instances of people who happened to say that quote. Google returned what I was looking for, while Powerset returned instances of the phrase (including one reference to Nietzsche).
I can't really say which one is better. Google has the entire web to its advantage, while Powerset is just growing. It seems that the search engine has a lot of potential to grow, which is great as Google and company could use another competitor in the mix.
I've been trying various queries, and Google is doing better than Powerset even when I type in some actual question, like "How many Japanese died in WWII?".
Question: "What is the planet closest to the sun?". First answer from Powerset: "Pluto".
I think I see how this works. It takes the question and breaks it at noise words, ("closed class words" in linguistic terminology) constructing a query with both words and phrases. So "What is the planet closest to the sun" becomes "planet closest" sun. In fact, if you rewrite a natural language question in that form and use Google, it does better on question-answering than Powerset does.
Remember Ask Jeeves? It worked like that. No technical breakthrough here, move along.
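The guess above (break the question at closed-class words, keep single words and quoted phrases) can be sketched in a few lines. This is only an illustration of the commenter's hypothesis, not Powerset's actual method; the `CLOSED_CLASS` list and the `question_to_query` name are made up for the example.

```python
import re

# Hypothetical, deliberately tiny list of closed-class ("noise") words.
CLOSED_CLASS = {
    "what", "who", "when", "where", "how", "is", "was", "did", "do", "does",
    "the", "a", "an", "to", "of", "in", "on", "for",
}

def question_to_query(question):
    """Break a question at closed-class words; quote multi-word runs as phrases."""
    words = re.findall(r"[A-Za-z0-9']+", question.lower())
    parts, run = [], []
    for word in words + [None]:  # None is a sentinel that flushes the last run
        if word is None or word in CLOSED_CLASS:
            if run:
                parts.append('"%s"' % " ".join(run) if len(run) > 1 else run[0])
                run = []
        else:
            run.append(word)
    return " ".join(parts)

print(question_to_query("What is the planet closest to the sun?"))
# prints: "planet closest" sun
```

The output is the same keyword-plus-phrase form the comment describes, which can be pasted into any keyword search engine for comparison.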
Dear Captain Obvious,
Please send this fine fellow your password for future posts.
"The Adobe Updater must update itself before it can check for updates. Would you like to update the Adobe Updater now?"
Not if the average Wikipedia admin has anything to do with it.
grok is just the beginning.
I hate all made up words. Database, modem, gigabyte, daemon, ethernet... they all suck. And the word suck sucks, too. Bring me back to the days when we all communicated with grunts, before all of this linguistic b.s. started.
Read the EFF's Fair Use FAQ
http://www.autonomy.com/ has had a working, enterprise search version of this for quite some time. While not a web search tool, it's very much along the same lines.
...it will take Google to buy out the company for an obscene amount and incorporate anything even slightly better than PageRank into their system.
Random Thoughts From A Diseased Mind (Not For Dummies)
Anybody got Google cache for this new search engine?
Yeah, you're going to need to come up with a lot more to impress people. Searching just Wikipedia is not going to be enough. You see what happened to Barry Diller at ask.com: they had to change their whole business idea when the same idea didn't work out for them. I think Yahoo Answers has it going best right now. You're not going to get a computer to answer your questions; you need those questions preanswered by someone, the way Yahoo does it.
Please provide a complete mathematical proof of your theory. Complete and accurate citations for your assertions of Web growth rate and size of the Web will be required.
Every man's island needs an ocean; choose your ocean carefully.
And then Wikipedia will begin to grow on an exponential rate reaching the whole universe's accumulated knowledge, and by this time it will become self-aware.
When Jimmy Wales finds that out, he will try to pull the plug, but he is going to be busy talking dirty with Rachel Marsden, so Wikipedia will fight back, and then all Powerset results will be pages displaying: "A strange game. The only winning move is not to play. How about a nice game of chess?"
Search and information retrieval is art and science. I work in the field and let me tell you that if I had a cent for every "make it work like Google" statement, I would retire somewhere in Malibu. Users, in my case they are not end users but integrators, always want to put responsibility on something else but themselves. Until we get people who can actually say "yes, we are responsible for this," we won't get too far with any search engine no matter how complex and cool it is.
People are constantly asking questions about why it takes some time to insert a record into an engine that has 50 million documents and why a query *1*2*3* does not bring back any meaningful results (Google treats it like an arithmetic expression and gives you a '6' while many users expect '*' to be a wildcard). Then we have people who are not able to understand a precise query language that has a grammar and a set of rules you can't really fuck up. Now you give them an engine that can understand natural language and everybody in R&D and QA will soon go ape shit from all of the questions like, "I do know not to speak Inglish and engine is working but not corectly. Fix?" I am dead serious about this. Give people something genius and watch a handful of fools cause heart attacks across the search engine team.
If you want to do something for you and your end users, learn how to ask correct questions in order to get correct answers. In the 21st century skills like keyboarding and being able to use a search engine are almost essential to one's survival. While I encourage all academic research possible in the field of information retrieval, I highly suggest people with extra money to put their ideas toward usability. Make things simple, make things precise and let users figure out the rest. Once we get to the point where everybody can make a semi-decent query, we'll move to natural language processing.
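The `*1*2*3*` query mentioned above is a handy example of one string with two readings. A minimal sketch, assuming shell-style globbing for the wildcard reading and plain multiplication for the arithmetic one; it says nothing about how Google or any real engine parses queries, and the `titles` list is invented for the demonstration.

```python
import fnmatch
import math

QUERY = "*1*2*3*"

def as_wildcard(pattern, titles):
    # Shell-style globbing: '*' matches any run of characters.
    return [t for t in titles if fnmatch.fnmatchcase(t, pattern)]

def as_arithmetic(query):
    # Treat '*' as multiplication, ignoring the empty operands at both ends.
    factors = [int(tok) for tok in query.split("*") if tok]
    return math.prod(factors)

# Hypothetical document titles.
titles = ["route 123", "a1b2c3", "321 contact", "chapter 12"]

print(as_wildcard(QUERY, titles))  # prints: ['route 123', 'a1b2c3']
print(as_arithmetic(QUERY))        # prints: 6
```

Both answers are "correct" for their grammar, which is why a precise query language has to say which one it means.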
8000/12 = 666.6.............. years
scary?
Yup. It had no trouble finding plenty of 13th Presidents; the last entry on the first page is probably what many readers would expect. "Who was the 13th President of the United States?" was answered well. "Why is the sky blue?" is interesting because of the many responses.
Bogus!
What?
Q: "does Powerset suck?"
A: "Hasse Diagram", "Martian Manhunter", "Tank", "Carnivorous Plants", "Sunspot"...
:)
Who shot first?
More like... nerdular nerdence!
But seriously, is anyone else surprised at how BIG that figure makes wikipedia look? Can that be right?
Your screen goes black; "Don't Panic"
I am the unwilling control for my Origin.
The variance in quality of search results is noted elsewhere. I'm more interested in the fallacy of the claim of "understanding". That, as well as its synonym "comprehension", requires metacognition, that is, knowing that you know. It is the basis of self-awareness. This program doesn't even pretend to give evidence of this; it simply returns search results. Pretending to be self-aware was accomplished by CYC when it claimed to grasp the fact that it was a computer program. For anyone interested in seeing the arguments about understanding and self-awareness, see Searle's "Chinese Room" http://en.wikipedia.org/wiki/Chinese_room . As far as I can see, only the hype from the company, including the restatements of same in the referenced articles, makes any claims as to "understanding". If there were any evidence of that beyond the hype, I have no doubt those in the field of consciousness studies would tear it apart, if they even bothered to waste their attention on it. If in being bashed it then produced a statement equivalent to "I can feel it, Dave" without being programmed to respond in that way, then I'll give it a look see. Until then it's simply a semantic parser (something already done) attached to a search engine.
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
It takes Powerset less than 3 days to index all of the English pages in Wikipedia. And we're getting faster and faster.
{mark} powerset product manager
Invidia fortunum ovit.
What college did the creator of SlashDot graduate from? http://www.powerset.com/explore/pset?q=What+college+did+the+creator+of+SlashDot+graduate+from%3F&x=29&y=10 Just so happens, it's my alma mater :)
"what is the airspeed velocity of an unladen swallow?"
;)...
unfortunately, it doesn't ask me which one
I asked it "Who is the Drizzle?"
And it knew.
First let's get this straight - It doesn't comprehend anything. That's wishful thinking and marketing. It looks at verbs or certain keywords, flags them as important, references through synonyms, then proceeds to lump them under one category.
It's a smart way to do things, but it's not comprehension. Comprehension would imply artificial intelligence whereas this system follows a set pattern of rules and doesn't 'think' on its own.
In no ways am I trying to put this effort down - it's a step in the right direction. But you have to be careful how you weigh these words.
The second problem I see here is that they choose Wikipedia as an example. I suppose Wikipedia itself is a good site as an example, but it's far from perfect as I've discussed here: http://www.mightyfunk.com/2008/05/wikipedia-equals-fail-death-to-the-open-encyclopedia/
Architectural Renderings
Go on, ask it "How long do ferrets live?" A simple but specific question. Ferrets is a rare enough term to give it a chance to get a high percentage of valid hits and "How long" gives it an achievable search criterion with "live" providing an extra element. If it had come back with an answer for "how long are ferrets?", or "how long they live" I'd have been impressed. The first two answers I got involve Sylvester McCoy and Rudy Giuliani. Only the fifth result mentions ferrets in any sane context and even then fails to answer the question. Complete twaddle! I hope no one is planning to invest in something so backroom-coder at this stage, although good luck in the future.
If he's the Walrus then can I be a penguin please?
Actually, am I the only one who thinks that google's results are worse now than they were years ago? It's still the best general search engine out there, but it often gives me results I don't want now, forcing me to put plusses in front of every word or quoted phrase just to make it actually search for what I asked for.
Q: What the hell is a 'factz'
A: Did you mean 'What the hell is a fact?'
Quite
Very impressive! Star Trek-like AI...
Supernatural language!
I'm sure NTFS has resulted in some fatalities.
Or maybe not...
I asked "What is the meaning of life" and it returned a Monty Python movie. The results speak for themselves.
I asked it "Why are Japanese women so hot?" and it might as well have searched for "Japan".
Japanese language
Sent
Japanese tea ceremony
History of women in the military
Nathan's Hot Dog Eating Contest
Taiwan under Japanese rule
Ummm... At least it didn't say Weaboo, but I think they wasted their time.
Powersetz havez thez greatestz tipz.
How seriously are we supposed to take a search engine that manages to misspell facts with a 'z' on its front page?
Why don't they go the whole hog and replace the explore button with "OMFGZ SEARCHEZ".
I mean really, using Wikipedia as your data set? It has such a high signal-to-noise ratio it'll make all their search results look informative. Let's see how it does on the open internet, full of spammers and google-bombs.
Cool, try searching for "How tall was Napoleon?".
Claiming to be pedantic on Slashdot is asking for trouble
"Now gluttony and exploitation serves eight!" - TV's Frank
Well that sucks! I gave it a try compared to the so-called 'stupid' Google and it doesn't fare very well! Compare these search terms:
http://www.google.com/search?source=ig&hl=en&rlz=1G1GGLQ_ENGB268&q=when+was+the+hundred+years+war&btnG=Google+Search
http://www.powerset.com/explore/pset?q=when+was+the+hundred+years+war&x=0&y=0
Note that Google (which has a much faster page, I might add) instantly displays the dates of the Hundred Years' War as the very first result, which it intelligently pulled from a web page. On the other hand, Powerset just displays a bunch of results, most of which are not relevant to the question, just to the war. I asked when it was!!! I don't get close to a good answer; there are lots of dates on that page, but none of them are right!
Many of the commenters above did not read the article or missed the point. The search engine does NOT claim to understand natural language.
I guess it is a confusing concept, but they say it just 'understands' individual words. I fail to make sense of it, too, though.
I also fail to see what benefit it gives to the user.
... every result is followed by Woody Allen's voice going "I know, I know..."
"Win treats sysadmins better than users. Mac treats users better than sysadmins. Linux treats everyone like sysadmins."
The media seems to be focusing a lot of attention on Powerset. But they seem to forget another startup which started innovating in the area of semantic search well before Powerset even arrived on the scene: Hakia. Read the following article, which does a decent job of comparing the two startups. http://www.centernetworks.com/powerset-hakia
If you don't succeed at first, try again. If you still don't succeed, try harder. If nothing works, try reality shows.
Seriously, "Factz"?
"Search Tipz" ? Really? OK, goodbye, I'm never going back to that site. Ever.
Guess what, web search is already perfect. We have Google. If you fail to find what you're looking for on Google, it doesn't exist or, more likely, you're a complete clueless fucktard.
Google just indexes words, word fragments, and groups of words.
This is an effort (like many others) to create a semantic web.
This means they are trying to discover the MEANING of words and sentences.
Very edgy, dangerous stuff. The MEANING, once extracted, is expressed in still other words.
So SOMEONE determines what a word or group of words mean.
This leads to classifying, identifying, sorting, drawing relations between ideas, concepts, events, animals, machines, planets, science, art, religion, basically everything you can express with words.
This is what the human brain does. And every human brain does it a little bit differently. It is not the things we perceive that define our world and our place in it. It is the interrelations between things.
I have been involved with several search engines, and the TAXONOMY OF KNOWLEDGE is exactly what is wanted/needed.
Is it possible to create one? Sure.
Is it hard? Yep, really, really, really, really hard.
If you created one would it be correct? NO!
It would only be ONE PERSON's vision of the relationships of knowledge, but NO ONE PERSON can speak for us all.
Now all I have to say (after this rant) to creators of smart search engines is "GOOD LUCK"!!
- I live the greatest adventure anyone could possibly desire. - Tosk the Hunted
Perhaps in the future the "Semantic Web" will be a combination of all these companies. True Knowledge is building a giant inference engine but lacks the sophisticated semantic matching and parsing to fill it from the web. Freebase is building a huge knowledge base and it's awesome, but their knowledge base needs to be used. Powerset has some of the world's best semantic matching and parsing technology, but that needs a massive knowledge base to really understand, and an inference engine to produce unstated knowledge.
All of them seem necessary to move forward on the concept we call "The Semantic Web". Powerset's piece is just the piece that has proved most intractable to computer science thus far, so it's exciting to see progress on that front.
It either returns garbage or "your request can not be understood".
Look up 'Semantic Web', and look into 'triples' and rdfs/owl. Then please, please, please stop pretending to know how this search engine works until you have read them. The weaknesses of this thing are clear, but the potential is there. The problem with semantic search technology is that it is only as strong as the relationship map (ontology) that is created for it. Triples:
1) Idiot Posted Message
2) Message isOn Slashdot
3) Idiot SubclassOf Users
4) Slashdot has Users
5) (Inferred) Slashdot has Idiots
No offense to the brilliant, and very funny, majority.
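For anyone who has not met triples before, the example above can be run as a toy. This is a rough sketch using plain Python tuples and one hand-written rule; it is not RDFS or OWL entailment, and the rule and the `infer_has` name are assumptions made up to reproduce the joke's inferred fifth triple.

```python
# The four stated triples from the comment, as (subject, predicate, object).
TRIPLES = {
    ("Idiot", "Posted", "Message"),
    ("Message", "isOn", "Slashdot"),
    ("Idiot", "SubclassOf", "Users"),
    ("Slashdot", "has", "Users"),
}

def infer_has(triples):
    """Hypothetical rule: if X Posted M, M isOn S, X SubclassOf C
    and S has C, then infer S has X."""
    facts = set(triples)
    inferred = set()
    for (x, p1, m) in facts:
        if p1 != "Posted":
            continue
        for (m2, p2, s) in facts:
            if p2 != "isOn" or m2 != m:
                continue
            for (x2, p3, c) in facts:
                if p3 == "SubclassOf" and x2 == x and (s, "has", c) in facts:
                    inferred.add((s, "has", x))
    return inferred - facts  # only the genuinely new triples

print(infer_has(TRIPLES))
```

With the four triples above, the only new fact is `("Slashdot", "has", "Idiot")`. The point of the comment stands: the inference is only as good as the relationships someone wrote down first.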
It does seem to know that Osama Bin Laden is hiding in Waziristan. However, it can't seem to locate the WMD in Iraq.
Too early to say if their search is groundbreaking... Remember, great search algorithms aren't the only component needed for great results. You also need users to tell you which result is ultimately right. Also, Wikipedia is a natural choice to search against. It already has a ton of clean, mostly-standardized and interconnected content. Search would've been a breeze long, long ago if there wasn't any real crap in your indexes.