"Understanding" Search Engine Enters Public Beta

Posted by kdawson on Monday May 12, 2008 @02:53PM from the do-what-i-mean dept.

religious freak sends word of the public beta of Powerset, a closely watched San Francisco startup that promises an "understanding engine" to revolutionize Web search. An article in SearchEngineLand points out that Powerset is reaching for more than mere "natural language." Techcrunch has more details and analysis. For the beta, Powerset makes all of Wikipedia available to search, not all the Web. It's said that their understanding engine required a month to grok Wikipedia's 2.5M articles. The Web is currently at least 8,000 times as large.

My first search by Anonymous Coward · 2008-05-12 14:54 · Score: 4, Funny

"No results found for naked pictures of Natalie Portman. How does that make you feel?"
1. Re:My first search by i_liek_turtles · 2008-05-12 15:55 · Score: 4, Funny
  
  I don't know, but I did pull up a Southern cooking website!
I'm Unimpressed by eldavojohn · 2008-05-12 14:55 · Score: 5, Interesting

Ok, so I like these new search engine ideas but I am grossly underwhelmed here. I tried the input:
"Who is David Bowie?" Which it handled quite nicely. Biography, additional links and all that Wikipedia jazz.

But come on, that's a simple question. Let's talk stuff I get into arguments over with my coworkers:
"Who played the villain in the first Die Hard?" Which at least put Alan Rickman at #8. But let's try mutating that to make it harder but still understood by you and me:
"Who played the bad guy in the first Die Hard?" Which resulted in very little but drivel with no mention of the great Alan Rickman whatsoever ... although it did put Billie Jean King and Madonna in there for some hilarious reason.

So maybe it can't understand 'bad guy.' Well onto another question:
"Who was the organist for The Beatles on Abbey Road?" Which resulted in at least the first 20 having no mention of the great & oft forgotten Billy Preston.

So you want to know what the kicker is? I put those same inputs into Google and found the name in the first or second result. Granted PowerSet doesn't do the whole web, I'm pretty sure that if it did, it wouldn't have the pretty results that it gave when I did what one of the articles told me to--ask it when earthquakes hit Tokyo. Just imagine the dates it would come up with if it hit a site with an html table of any seismic activity whatsoever in Tokyo!

I think it's a novel idea to mine Wikipedia for a search engine so long as it isn't just plain old token matching like PowerSet seems to be up to. Be inventive, try a natural language parser written in Prolog that digests all of Wikipedia into a huge network/ontology of concepts ... no matter how flawed it might be.

I find them talking about this in the articles:
Powerset is different. It says that its technology reads and comprehends each word on a page. It looks at each sentence. It understand the words in each sentence and how they related to each other. It works out what that sentence really means, all the facts that are being presented. This means it knows what any page is really about.

Yet, I'm not impressed. You can try to personify your software and convince me that Baby Alive really defecates like a human being all over so it feels like I have a real baby. But I know it's just software. You don't have to dumb it down if you're going to blog about it. What is this? A pattern matching implementation? A depth first search tree parsing implementation? An ontology builder? Could you at least drop one of the buzzwords of the natural language parsing field for me here?

So does this story actually have more than a startup looking for a sugar daddy to buy it out?

--
My work here is dung.
1. Re:I'm Unimpressed by bluefoxlucid · 2008-05-12 15:05 · Score: 4, Informative
  
  Use site:en.wikipedia.org to have Google ask all of Wikipedia (English)
  
  --
  Support my political activism on Patreon.
2. Re:I'm Unimpressed by WaltBusterkeys · 2008-05-12 15:10 · Score: 4, Interesting
  
  Yet, I'm not impressed.
  
  Powerset is not an instant solution, it's a step in the right direction. Early Google wasn't perfect, but it got a lot better over time as the Pagerank algorithm was refined. Hopefully Powerset will show similar improvement over time.
  
  Heck, if Powerset is just watching what links people click on more often (Google does) then even that can help provide a training set for its algorithm. Using that kind of training set would make it vastly easier to figure out whether a change in the algorithm would be an improvement or not. That's priceless data and I hope they'll use it wisely.
  
  But, really, just remember that this is the first in a new breed of search engines. It won't be the last, by any means:
  
  -Search 0.9 was using the meta and description tags on a page to index (see Altavista). It broke when spammers figured out the algorithms.
  
  -Search 1.0 was using the text of inbound links to index (see Google). It doesn't know what the text means, it just knows that it has a bunch of keywords. It's breaking as people start to game their Google search results.
  
  -Search 2.0 will try to find meaning in the web and understand what a page is really saying (see Powerset).
  
  I don't know yet what Search 3.0 will be, but we're still a long way from getting Search 2.0 to work right. But we're still making progress. Just because Powerset isn't perfect doesn't mean we should give up on the whole venture.
3. Re:I'm Unimpressed by MillionthMonkey · 2008-05-12 15:12 · Score: 4, Funny
  
  I asked it "who won the election in 2004?" and it understood the question, in a way:
  
  The current mayor is Jardir Silva Vidal who won the election in 2004 against Reino Martins de Oliveira
4. Re:I'm Unimpressed by Reality+Master+101 · 2008-05-12 16:13 · Score: 4, Interesting
  
  I don't know yet what Search 3.0 will be, but we're still a long way from getting Search 2.0 to work right. But we're still making progress.
  
  Actually, we aren't making progress -- *at all*. What these guys are trying to do is a subset of artificial intelligence. A subject people have been banging their heads against since the 1940s, and we've made *zero* progress since then. We simply don't know how humans process information. We don't even have reasonable theories. We're at the equivalent of the "four elements make up the world" version of physics.
  
  AI researchers always get defensive when I say this, but it's simply true. All we have are better brute-force algorithms that sort-of simulate some of the things that humans do (i.e., voice recognition, character recognition, and other yawner tricks). There is no science of AI. Any sort of human-level understanding of information is far, far away in the future.
  
  --
  Sometimes it's best to just let stupid people be stupid.
5. Re:I'm Unimpressed by WaltBusterkeys · 2008-05-12 16:24 · Score: 4, Interesting
  
  Wait, you're saying that the MIT summer vision project wasn't as easy as people thought?
  
  (Background: In 1966, some MIT computer science faculty thought AI was so easy that computer vision could be solved in one summer's worth of work; it probably took 35 years to reach the milestones identified in the research abstract).
6. Re:I'm Unimpressed by Threnody · 2008-05-12 18:04 · Score: 5, Informative
  
  Thanks for testing us out with some real queries -- it's the best way to get the Powerset experience. But, if you only ask NL questions then you don't get to see all of Powerset's features.
  
  Powerset is not token matching. In fact, we read every sentence from every page in Wikipedia that we index. For examples of how we understand syntax, check out queries like "who did texaco acquire" vs. "who acquired texaco". Note that Powerset understands the difference between being acquired by and acquiring, that "buying" is equivalent to "acquiring", and that we are often able to highlight the actual answer to your question. Traditional search engines can do none of these things. Powerset is trying to match the meaning of your query to the meaning of a sentence in Wikipedia.
  
  However, Powerset is very aware that: 1) Users shouldn't be expected to use natural language and 2) We only search Wikipedia and 3) Our algorithms aren't perfect yet. Powerset's release isn't intended to replace your regular keyword search engine. But, we do hope that you come back to Powerset when you have a question that might be answered in Wikipedia.
  
  So, try some topical queries in Powerset, like "kurt godel." In the Factz section, Powerset knows that Kurt Godel proved theorems. If you click on "theorems," you'll see all the sentences in Wikipedia from which we derived that fact (be sure to click on "more"). Note that none of these Factz come from the Kurt Godel page. Powerset's ability to aggregate Factz from across Wikipedia is unique to our technology.
  
  Now try a search for the Presidency of Bill Clinton and click through to the enhanced Wikipedia page (http://www.powerset.com/explore/semhtml/Presidency_of_Bill_Clinton?query=presidency+of+bill+clinton). Note that we also have Factz in the article outline, which helps to summarize long articles. Check out the second term during the Lewinsky affair: the Factz are an amazingly accurate description of the situation.
  
  Sorry to be a bit lengthy, but I wanted to make it clear that Powerset isn't just about asking questions. We've got a video that identifies all of the features: http://vimeo.com/994819
  
  {mark} powerset product manager
  
  --
  Invidia fortunum ovit.
7. Re:I'm Unimpressed by mwvdlee · 2008-05-12 21:18 · Score: 4, Insightful
  
  A search engine with a broader world view than just the US?
  Terrorists!
  
  --
  Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Next step.... by Anonymous Coward · 2008-05-12 15:00 · Score: 5, Funny

Since Powerset can only search Wikipedia, the logical next step is to put the entire web on Wikipedia. Who's up for the job?
1. Re:Next step.... by maglor_83 · 2008-05-12 15:46 · Score: 5, Funny
  
  I think we need to split it up by domain name. I'll start with wikipedia.org
The Web is currently at least 8,000 times as large by nog_lorp · 2008-05-12 15:00 · Score: 5, Funny

Any day now, Wikipedia will surpass The Web's growth rate, and set a course for the day when Wikipedia will be BIGGER THAN THE WEB.
Jargon pisses me off... by KGIII · 2008-05-12 15:03 · Score: 4, Funny

If I hear the word "grok" one more time I'm gonna have to kill someone...

--
"So long and thanks for all the fish."
1. Re:Jargon pisses me off... by east+coast · 2008-05-12 15:06 · Score: 4, Insightful
  
  Would you like some Grok-amole on your taco?
  
  --
  Dedicated Cthulhu Cultist since 4523 BC.
2. Re:Jargon pisses me off... by invader_vim · 2008-05-12 15:31 · Score: 5, Informative
  
  "Grok" isn't jargon. It's a perfectly cromulent word. (Albeit one coined by Heinlein in 'Stranger In A Strange Land'.)
3. Re:Jargon pisses me off... by Molochi · 2008-05-12 16:23 · Score: 4, Funny
  
  Grok.
  
  Grokgrokgrok.
  
  Pics or it didn't happen.
  
  --
  "The Adobe Updater must update itself before it can check for updates. Would you like to update the Adobe Updater now?"
Yawn. Here is something really impressive... by Sanity · 2008-05-12 15:04 · Score: 5, Interesting

True Knowledge actually interprets your question using Natural Language Processing, and then looks through a massive database of user-contributed facts, combining them using sophisticated inference rules, to give you the answer you need. Even the inference rules are user-editable.
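
To make "facts plus inference rules" concrete, here is a toy sketch. It is not True Knowledge's actual rule language or data model, which this thread doesn't describe; the triples, the `is_in` relation and the function names are all made up for illustration. It stores facts as (subject, relation, object) triples and forward-chains a single transitivity rule until nothing new turns up.

```python
# Toy fact store of (subject, relation, object) triples.
# The facts and the "is_in" relation are illustrative assumptions.
FACTS = {
    ("Osaka", "is_in", "Japan"),
    ("Japan", "is_in", "Asia"),
}


def infer_transitive(facts, relation):
    """Apply one rule repeatedly: A rel B and B rel C implies A rel C."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(known):
            for (c, r2, d) in list(known):
                if r1 == relation and r2 == relation and b == c:
                    new_fact = (a, relation, d)
                    if new_fact not in known:
                        known.add(new_fact)
                        changed = True
    return known


def answer(facts, subject, relation, obj):
    """Answer a yes/no question against the stored plus inferred facts."""
    return (subject, relation, obj) in infer_transitive(facts, relation)


if __name__ == "__main__":
    # Nobody entered "Osaka is_in Asia" directly; it follows from the rule.
    print(answer(FACTS, "Osaka", "is_in", "Asia"))
```

The point of the sketch is only that the answer is derived rather than looked up: no stored triple says Osaka is in Asia.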
2 out of 10 by KNicolson · 2008-05-12 15:24 · Score: 5, Informative

I tried just "Osaka", where I am right now.

First match was an obscure album, then a few "factz" that made no sense.

Let's try again, "What is the largest city in Japan?"

Tokyo doesn't feature at all on the first page! It fares just as badly with other countries.

It now seems to be slashdotted, so I better quit now.
Obviously still buggy. by ScentCone · 2008-05-12 15:29 · Score: 4, Funny

Your tests are interesting, but you're not really parsing the responses in the right context. They're problematic. Keep in mind this understanding engine understands the world in a way that was hatched out in San Francisco.

Who is David Bowie? I trust that it came back with, "aka Ziggy Stardust, normal family guy"

Who played the villain in the first Die Hard? Well, obviously, the villain is "capitalism."

Billie Jean King and Madonna ... like I said, it's San Francisco

Who was the organist for The Beatles on Abbey Road?

You had it at "organ," and it got distracted. What they need is some dev guys from Toledo to collaborate, and provide a little cognitive counterweight to the understanding engine. OK, maybe not Toledo. Maybe Atlanta.

--
Don't disappoint your bird dog. Go to the range.
There is a reason query languages exist. by rindeee · 2008-05-12 15:36 · Score: 4, Insightful

They're faster, more efficient and more accurate. Yes, they require learning yet there's a valid reason and a payoff to doing so. Do we really want to dumb things down any further? If you can't figure out Google, perhaps you should get off the Net.
But it doesn't give results any differently by spoco2 · 2008-05-12 15:47 · Score: 5, Interesting

I asked 'Where do babies come from' and it just gave me back a bunch of articles with that string somewhere in their text.

Pathetic, and you'd hope it's got a long way to go really because at the moment it does NOTHING of merit that I can see.
1. Re:But it doesn't give results any differently by mrbluze · 2008-05-12 16:32 · Score: 4, Funny
  
  I asked 'Where do babies come from' and it just gave me back a bunch of articles with that string somewhere in their text.
  Funny, when I was a boy I asked my father the same thing and he gave me a few articles with pictures of women wearing string. My conclusion: It's amazing what can be done with just a few bits of string.
  
  --
  Do it yourself, because no one else will do it yourself. [beta blockade 10-17 Feb]
2. Re:But it doesn't give results any differently by The+Great+Pretender · 2008-05-12 16:36 · Score: 4, Funny
  
  I tried "What are anal warts". Google gave me a response faster than I could start and stop my timer, with the second answer being Wikipedia. Powerset is still, hold on..., yep still hung up on the question.
  
  --
  A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort.
3. Re:But it doesn't give results any differently by glittalogik · 2008-05-12 17:20 · Score: 5, Funny
  
  I asked it a question I got in a trivia contest - what countries have four-letter names? (There are 10, and google's first link is to a list of 'em)
  
  Powerset's first response? "Fuck."
  
  Funny, that was my response too, but at least I got 5 or 6 of them first...
Natural languages are not a help. by EmbeddedJanitor · 2008-05-12 16:00 · Score: 4, Interesting

There is a fallacy that putting a natural language on something will make it easy. There are many specialised languages that people use every day.
1 + 1 = 2 is a special notation/language that is both more concise and easier than writing "add one and one to make two". So is a music score, which is far easier than reading "make a high note for a bit then wait a bit and make a low note". Same with C, C++, SQL or Python: the hard bit in programming is algorithm design, not understanding the actual language itself.
Is natural language really a barrier to entry in using Google? I doubt it. My untechy wife and her friends find everything they need. Plugging natural language into Google gives reasonable results most of the time.

--
Engineering is the art of compromise.
Yeah right by Reality+Master+101 · 2008-05-12 16:01 · Score: 5, Insightful

What a marketing pile-of-poop. All it does is pull out phrases from Wikipedia; there is no attempt to understand the information at all. When I can type in a yes/no question ("Did they have looms in the 1400s?"), I'll be impressed. When it can make a calculation ("How old was columbus when the first colony was founded?"), I'll be impressed. When it can make comparisons ("when did the earth's population match the current population of the united states?"), I'll be impressed.

In other words, when it even attempts to answer a question that isn't already in Wikipedia as a phrase, I'll be impressed.

--
Sometimes it's best to just let stupid people be stupid.
No, early Google was better than anything else. by Bill,+Shooter+of+Bul · 2008-05-12 16:06 · Score: 5, Insightful

Which is why everyone started using it. It wasn't perfect, just better than anything else. Powerset isn't better than lycos.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.
It's about as good as Ask Jeeves. Maybe worse. by Animats · 2008-05-12 16:19 · Score: 4, Interesting

I've been trying various queries, and Google is doing better than Powerset even when I type in some actual question, like "How many Japanese died in WWII?".
Question: "What is the planet closest to the sun?". First answer from Powerset: "Pluto".
I think I see how this works. It takes the question and breaks it at noise words, ("closed class words" in linguistic terminology) constructing a query with both words and phrases. So "What is the planet closest to the sun" becomes "planet closest" sun. In fact, if you rewrite a natural language question in that form and use Google, it does better on question-answering than Powerset does.
Remember Ask Jeeves? It worked like that. No technical breakthrough here, move along.
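
For the curious, the guess above can be sketched in a few lines. This is only an illustration of the hypothesis, not Powerset's (or Ask Jeeves's) actual code: the `CLOSED_CLASS` word list and both function names are assumptions invented here. The question is cut at closed-class words, and each run of leftover words becomes a bare keyword or a quoted phrase.

```python
# Hypothetical closed-class ("noise") word list; a real one would be longer.
CLOSED_CLASS = {
    "what", "who", "how", "when", "where", "did", "is", "was", "are",
    "the", "a", "an", "to", "of", "in", "on", "for",
}


def split_at_noise_words(question):
    """Return runs of content words, cut wherever a closed-class word occurs."""
    words = question.lower().strip("?!. ").split()
    runs, current = [], []
    for word in words:
        if word in CLOSED_CLASS:
            if current:
                runs.append(current)
                current = []
        else:
            current.append(word)
    if current:
        runs.append(current)
    return runs


def to_keyword_query(question):
    """Quote multi-word runs as phrases; leave single words bare."""
    parts = []
    for run in split_at_noise_words(question):
        text = " ".join(run)
        parts.append('"' + text + '"' if len(run) > 1 else text)
    return " ".join(parts)


if __name__ == "__main__":
    print(to_keyword_query("What is the planet closest to the sun?"))
```

With this word list the example question comes out as "planet closest" sun, the same rewritten query described above. Nothing in it looks at grammar or meaning, which is the whole complaint.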
Oh man. it's down. by redtuxrising · 2008-05-12 16:36 · Score: 4, Funny

Anybody got Google cache for this new search engine?
Re:I wonder how long... by DavidD_CA · 2008-05-12 16:43 · Score: 4, Interesting

I believe you've stumbled upon this startup's business plan.

--
-David
Thoughtpuckey by DynaSoar · 2008-05-12 18:22 · Score: 4, Insightful

The variance in quality of search results is noted elsewhere. I'm more interested in the fallacy of the claim of "understanding". That, as well as its synonym "comprehension", requires metacognition, that is, knowing that you know. It is the basis of self-awareness. This program doesn't even pretend to give evidence of this; it simply returns search results. Pretending to be self-aware was accomplished by CYC when it claimed to grasp the fact that it was a computer program. For anyone interested in seeing the arguments about understanding and self-awareness, see Searle's "Chinese Room" http://en.wikipedia.org/wiki/Chinese_room . As far as I can see, only the hype from the company, including the restatements of same in the referenced articles, makes any claims as to "understanding". If there were any evidence of that beyond the hype, I have no doubt those in the field of consciousness studies would tear it apart, if they even bothered to waste their attention on it. If in being bashed it then produced a statement equivalent to "I can feel it, Dave" without being programmed to respond in that way, then I'll give it a look see. Until then it's simply a semantic parser (something already done) attached to a search engine.

--
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
Impressive by mrrudge · 2008-05-12 21:44 · Score: 4, Funny

Q: What the hell is a 'factz'
A: Did you mean 'What the hell is a fact?'

Quite