The Future of Google Search and Natural Language Queries
eldavojohn writes "You might know the name Peter Norvig from the classic big green book, 'AI: A Modern Approach.' He's been working for Google since 2001 as Director of Search Quality. An interview with Norvig at MIT's Technology Review has a few interesting insights into the 'search mindset' at the company. It's kind of surprising that he claims they have no intent to allow natural questions. Instead he posits, 'We think what's important about natural language is the mapping of words onto the concepts that users are looking for. But we don't think it's a big advance to be able to type something as a question as opposed to keywords ... understanding how words go together is important ... That's a natural-language aspect that we're focusing on. Most of what we do is at the word and phrase level; we're not concentrating on the sentence.'"
A phrase is part of a sentence. WP
Everything. All languages are natural. In fact, the spoken word is as good a subject to study evolution and 'survival of the fittest' (to a degree) as any biological organism. The way that different languages and dialects have collided over the years and weeded out words, phrases and structures that work or don't work is one of the most complex and interesting topics around. Despite its quirks the English language is as natural as any creole or foreign language out there, simply evolved differently.
art is science made clear. -cocteau
Not at all. I do that kind of question in Google all the time.
Googling for "Why did World War I start" brings up, as the first result, an article titled "The Causes of World War I".
Followed by a few million more hits if that one isn't good enough.
And the question "What does a duck eat" gets many hits as well. The first one has, in the summary:
Ducks in the wild eat a variety of plants, insects, and native foods that will differ from...
I know it's just picking out keywords from the query and matching them to the sites, not trying to parse the natural language, but it works pretty damn well.
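For anyone curious what "just picking out keywords" looks like, here is a minimal sketch of bag-of-words matching. It is not how Google actually ranks anything; the stopword list, the `keywords` and `rank` helpers, and the three sample titles are all made up for illustration.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "what", "why", "how", "did", "does", "do", "is", "of", "i"}

def keywords(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return {tok for tok in re.findall(r"[a-z0-9]+", text.lower()) if tok not in STOPWORDS}

def rank(query, docs):
    """Order document titles by how many query keywords they share.

    Titles with no keyword in common are dropped. No grammar is parsed:
    the query is treated purely as a set of words.
    """
    q = keywords(query)
    scored = [(len(q & keywords(title)), title) for title in docs]
    return [title for score, title in sorted(scored, key=lambda p: -p[0]) if score > 0]

# Made-up "index" of page titles.
docs = [
    "The Causes of World War I",
    "What Ducks Eat in the Wild",
    "A History of Basketball",
]

print(rank("Why did World War I start", docs))
print(rank("What does a duck eat", docs))
```

Note that "duck" and "ducks" do not match here because there is no stemming; the duck query only hits on "eat". A real engine does far more, but the question words themselves contribute nothing in this sketch either.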
If the masses can keep you down, you're not the Ubermensch.
Most linguists currently believe in the existence of something called "universal grammar", which is a set of properties common to all acquirable human languages (that is, languages which can be learned as a native language). If you were able to get a computer to comprehend one language (or probably a few to make sure you have sufficiently generalized your principles), then additional languages and dialects would be relatively easy: just give it enough examples of sentences in that language, and the computer will figure out its grammar. Babies can do it. Google should have enough data for that from the web crawls it already does. Keeping up with language evolution is a nearly trivial problem compared to language understanding.
Of course, getting the computer to understand one language is a monumental task.
Centralization breaks the internet.
If you have the opportunity to look at query logs, you see how dumb most search engine queries are.
First, a big fraction of queries are simply navigational. Many are just URLs. The major search providers recognize these in the front end machines and send back canned answers, without even passing them to the real search engine. If you type "myspace" into Google, very little work is expended returning the canned reply.
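A rough sketch of that front-end shortcut, under my own assumptions: the `CANNED` table, the `URL_LIKE` pattern, and the `full_search` stand-in are hypothetical, and real front ends are certainly more elaborate.

```python
import re

# Hypothetical table of canned navigational answers.
CANNED = {"myspace": "http://www.myspace.com/"}

# Loose, illustrative test for "this query is really a URL":
# optional scheme, dotted host name, optional path, no spaces.
URL_LIKE = re.compile(r"^(https?://)?[\w-]+(\.[\w-]+)+(/\S*)?$", re.IGNORECASE)

def full_search(query):
    """Stand-in for the expensive back-end search."""
    return f"search results for {query!r}"

def handle(query):
    """Answer navigational queries in the front end; pass the rest on."""
    q = query.strip().lower()
    if q in CANNED:
        return ("canned", CANNED[q])
    if URL_LIKE.match(q):
        url = q if q.startswith(("http://", "https://")) else "http://" + q
        return ("canned", url)
    return ("search", full_search(query))

print(handle("myspace"))
print(handle("www.example.com"))
print(handle("what does a duck eat"))
```

Only the last query ever reaches `full_search`; the first two are answered by a dictionary lookup and a regex.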
After that, most queries are one word. Phrase queries are less common.
Few people seem to have noticed, but Google started returning results based on synonyms and homonyms a few weeks ago. There have been some significant algorithm changes recently.
Less than 1% of queries use any operators, like '""' or '-'.
The real problem with natural language queries, though, is that "Ask Jeeves" was a flop. Remember Ask Jeeves? That was a system designed to process queries written as sentences. But it wasn't used that way, and didn't succeed commercially.
Natural language processing is useful when it is well-done. Getting it well-done is the tough part. Don't let Google reps trick you into thinking otherwise just because their R&D in the field isn't where they'd probably like it to be.
Here are some situations where it's useful:
1) Interpreting a question rather than just treating it as a "bag of words." For instance, one can type "how tall is Mt. Everest" in the search bar and Google, rather than searching for documents that contain those 5 (or so) tokens, will interpret that as a query asking for height and also search for documents that contain "Mt.", "Everest", and "height". Take that a step further and it might look for strings that represent height, such as a number followed by "ft" or "meters" or "m".
2) Condensing query chains. Suppose you want to know what sport our 4th president enjoyed playing most. You can ask "what sport did the fourth president of the US like playing?" and the system will give you an answer by first interpreting "fourth president of the US" as Madison, and then searching for what sports Madison enjoyed playing. If not for such interpretation you would either have to run 2 queries (first to find out who the 4th president was, then what sports he liked), or hope that there is a document out there that Google's indexed that contains the words in that initial query.
3) Speech recognition! If you want to run a Q/A session with a computer system that has a speech recognition front end, it is more natural (easier and faster) to ask it "how tall is mt. everest?" than to say "mount everest height" or whatever you would end up typing into Google today. People like to speak using *natural language,* after all. They would gladly do it with computers if the SR systems in them were good enough (some are).
4) More precise query results. What's better, getting back a document that is likely to contain the answer to your query, or getting back the sentence that contains it? Or better yet, getting back the answer and nothing else? The more robust an NLP system the more complicated queries it can interpret and the more elegant its result can be.
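To make item 1 concrete, here is a toy sketch of mapping a question onto an entity plus an attribute, and then scanning text for strings that look like heights. The pattern table and the `interpret` and `extract_heights` helpers are my own illustration, not anything Google has described, and the sample sentence is invented.

```python
import re

# Hypothetical mapping from question openers to the attribute being asked for.
QUESTION_PATTERNS = [
    (re.compile(r"^how tall is (?P<entity>.+?)\??$", re.IGNORECASE), "height"),
    (re.compile(r"^how old is (?P<entity>.+?)\??$", re.IGNORECASE), "age"),
]

# A number followed by a length unit, e.g. "29000 ft" or "8848 m".
HEIGHT = re.compile(r"\b\d[\d,.]*\s*(?:ft|feet|meters|metres|m)\b", re.IGNORECASE)

def interpret(query):
    """Return the entity and attribute a question asks about, or None."""
    for pattern, attribute in QUESTION_PATTERNS:
        m = pattern.match(query.strip())
        if m:
            return {"entity": m.group("entity"), "attribute": attribute}
    return None  # not a recognized question form; fall back to keywords

def extract_heights(text):
    """Find substrings that look like a height measurement."""
    return HEIGHT.findall(text)

print(interpret("how tall is Mt. Everest?"))
print(interpret("mount everest height"))
print(extract_heights("Everest is about 29000 ft high"))
```

A hand-written pattern table like this only covers the phrasings someone thought of in advance, which is exactly why "how tall is Mt. Everest" can work while a slightly different wording does not.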
On that note, Google actually *does perform* NLP on queries despite what from the summary (I didn't RTFA) looks like claims to the contrary. If you ask Google "how tall is Mt. Everest?" it actually DOES interpret that particular sentence and gives you the answer -- 29000ft or thereabouts. And you only get such an elegant result if you type "how tall is Mt. Everest" (without quotes) or "Mt. Everest how tall". Other queries of this nature will not give you quite as precise a response.
I like basketball!!1!
I think you two are confusing each other..
The parent to my post is talking about the Google search box built into Firefox. The GP to my post is talking about the Google search page that has Suggest activated within it. It looks basically like the normal Google search page up to the point you start typing in queries.