"Understanding" Search Engine Enters Public Beta
religious freak sends word of the public beta of Powerset, a closely watched San Francisco startup that promises an "understanding engine" to revolutionize Web search. An article in Search Engine Land points out that Powerset is reaching for more than mere "natural language." TechCrunch has more details and analysis. For the beta, Powerset makes all of Wikipedia available to search, not the whole Web. It's said that their understanding engine required a month to grok Wikipedia's 2.5M articles. The Web is currently at least 8,000 times as large.
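For a rough sense of scale, the summary's own numbers can be multiplied out. This is only a back-of-the-envelope sketch: it assumes indexing time grows linearly with corpus size and that no extra hardware is thrown at the problem. Both assumptions are mine, not Powerset's, and the function name is made up for illustration.

```python
# Figures quoted in the summary above.
WIKIPEDIA_ARTICLES = 2_500_000   # "2.5M articles"
MONTHS_FOR_WIKIPEDIA = 1         # "required a month"
WEB_SCALE_FACTOR = 8_000         # "at least 8,000 times as large"


def months_to_index(months_for_base, scale_factor):
    """Estimate indexing time, ASSUMING it scales linearly with corpus size."""
    return months_for_base * scale_factor


months = months_to_index(MONTHS_FOR_WIKIPEDIA, WEB_SCALE_FACTOR)
years = months / 12
pages = WIKIPEDIA_ARTICLES * WEB_SCALE_FACTOR

print(f"Pages to process: {pages:,}")
print(f"Estimated time on the same setup: {months:,} months (about {years:.0f} years)")
```

In practice the work could be spread across many machines, so the real figure depends entirely on how well the processing parallelizes.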
But come on, that's a simple question. Let's talk about stuff I get into arguments over with my coworkers: Who played the villain in the first Die Hard? That at least put Alan Rickman at #8. But let's try mutating that to make it harder but still understandable to you and me: Who played the bad guy in the first Die Hard? That resulted in little but drivel, with no mention of the great Alan Rickman whatsoever.
So maybe it can't understand 'bad guy.' Well, on to another question: Who was the organist for The Beatles on Abbey Road? That resulted in at least the first 20 results having no mention of the great and oft-forgotten Billy Preston.
So you want to know what the kicker is? I put those same inputs into Google and found the name in the first or second result. Granted, Powerset doesn't do the whole web, but I'm pretty sure that if it did, it wouldn't give the pretty results it gave when I did what one of the articles told me to: ask it when earthquakes hit Tokyo. Just imagine the dates it would come up with if it hit a site with an HTML table of any seismic activity whatsoever in Tokyo!
I think it's a novel idea to mine Wikipedia for a search engine, so long as it isn't just plain old token matching like Powerset seems to be up to. Be inventive: try a natural language parser written in Prolog that digests all of Wikipedia into a huge network/ontology of concepts.
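To make the "plain old token matching" complaint concrete, here is a minimal sketch of how a pure token-overlap scorer treats the two Die Hard queries. It is not a claim about how Powerset works internally; the one-sentence document, the stopword list, and the hand-written synonym table are all assumptions made up for the illustration.

```python
import re

# One illustrative sentence standing in for an indexed article.
DOCUMENT = "Alan Rickman played the villain Hans Gruber in Die Hard."

# Hypothetical hand-written synonym table; a real system would need far more.
SYNONYMS = {"bad guy": "villain"}

# Tiny illustrative stopword list.
STOPWORDS = {"who", "the", "in", "first"}


def tokens(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def token_overlap(query, document):
    """Fraction of the query's tokens that also appear in the document."""
    q = tokens(query)
    return len(q & tokens(document)) / len(q) if q else 0.0


def normalize(query):
    """Rewrite known phrases to a canonical term before matching."""
    q = query.lower()
    for phrase, canonical in SYNONYMS.items():
        q = q.replace(phrase, canonical)
    return q


villain_query = "Who played the villain in the first Die Hard?"
bad_guy_query = "Who played the bad guy in the first Die Hard?"

print(token_overlap(villain_query, DOCUMENT))             # every query token matches
print(token_overlap(bad_guy_query, DOCUMENT))             # "bad" and "guy" match nothing
print(token_overlap(normalize(bad_guy_query), DOCUMENT))  # matches again after rewriting
```

The synonym table only patches this one phrase. An engine that really mapped words to concepts would have to do the equivalent for every paraphrase, which is the hard part.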
I find them talking about this in the articles: "Powerset is different. It says that its technology reads and comprehends each word on a page. It looks at each sentence. It understands the words in each sentence and how they relate to each other. It works out what that sentence really means, all the facts that are being presented. This means it knows what any page is really about." Yet I'm not impressed. You can try to personify your software and convince me that Baby Alive really defecates like a human being so it feels like I have a real baby. But I know it's just software. You don't have to dumb it down if you're going to blog about it. What is this? A pattern matching implementation? A depth-first search tree parsing implementation? An ontology builder? Could you at least drop one of the buzzwords of the natural language parsing field for me here?
So does this story actually have more than a startup looking for a sugar daddy to buy it out?
My work here is dung.
Since Powerset can only search Wikipedia, the logical next step is to put the entire web on Wikipedia. Who's up for the job?
Any day now, Wikipedia will surpass The Web's growth rate, and set a course for the day when Wikipedia will be BIGGER THAN THE WEB.
True Knowledge actually interprets your question using Natural Language Processing, and then looks through a massive database of user-contributed facts, combining them using sophisticated inference rules, to give you the answer you need. Even the inference rules are user-editable.
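As a toy illustration of what "combining facts using inference rules" can mean, the sketch below forward-chains over a couple of hand-entered facts. It is not True Knowledge's actual engine or data model; the triple format, the relation names, and the two rules are assumptions chosen for the example.

```python
# Hand-entered facts as (subject, relation, object) triples.
FACTS = {
    ("Tokyo", "capital_of", "Japan"),
    ("Japan", "located_in", "Asia"),
}


def infer(facts):
    """Apply two illustrative rules repeatedly until no new facts appear.

    Rule 1: capital_of(X, Y)                       => located_in(X, Y)
    Rule 2: located_in(X, Y) and located_in(Y, Z)  => located_in(X, Z)
    """
    known = set(facts)
    while True:
        new = set()
        # Rule 1: a capital is located in its country.
        for (a, rel, b) in known:
            if rel == "capital_of":
                new.add((a, "located_in", b))
        # Rule 2: located_in is transitive.
        for (a, r1, b) in known:
            for (c, r2, d) in known:
                if r1 == r2 == "located_in" and b == c:
                    new.add((a, "located_in", d))
        if new <= known:  # nothing new was derived, so stop
            return known
        known |= new


derived = infer(FACTS)
# Neither input fact says Tokyo is in Asia; it only follows from the rules.
print(("Tokyo", "located_in", "Asia") in derived)
```

The answer to "Is Tokyo in Asia?" is stored nowhere in the input; it only exists after the rules are applied.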
I tried just "Osaka", where I am right now.
First match was an obscure album, then a few "factz" that made no sense.
Let's try again, "What is the largest city in Japan?"
Tokyo doesn't feature at all on the first page! It fares just as badly with other countries.
It now seems to be slashdotted, so I'd better quit now.
"Grok" isn't jargon. It's a perfectly cromulent word. (Albeit one coined by Heinlein in 'Stranger In A Strange Land'.)
I asked 'Where do babies come from' and it just gave me back a bunch of articles with that string somewhere in their text.
Pathetic, and you'd hope it's got a long way to go really because at the moment it does NOTHING of merit that I can see.
What a marketing pile-of-poop. All it does is pull out phrases from Wikipedia; there is no attempt to understand the information at all. When I can type in a yes/no question ("Did they have looms in the 1400s?"), I'll be impressed. When it can make a calculation ("How old was Columbus when the first colony was founded?"), I'll be impressed. When it can make comparisons ("When did the Earth's population match the current population of the United States?"), I'll be impressed.
In other words, when it even attempts to answer a question that isn't already in Wikipedia as a phrase, I'll be impressed.
Sometimes it's best to just let stupid people be stupid.
Which is why everyone started using it. It wasn't perfect, just better than anything else. Powerset isn't better than Lycos.
Well.. maybe. Or Maybe not. But Definitely not sort of.