Slashdot Mirror


The Future of Google Search and Natural Language Queries

eldavojohn writes "You might know the name Peter Norvig from the classic big green book, 'AI: A Modern Approach.' He's been working for Google since 2001 as Director of Search Quality. An interview with Norvig at MIT's Technology Review has a few interesting insights into the 'search mindset' at the company. It's kind of surprising that he claims they have no intent to allow natural questions. Instead he posits, 'We think what's important about natural language is the mapping of words onto the concepts that users are looking for. But we don't think it's a big advance to be able to type something as a question as opposed to keywords ... understanding how words go together is important ... That's a natural-language aspect that we're focusing on. Most of what we do is at the word and phrase level; we're not concentrating on the sentence.'"

148 comments

  1. This is awesome. by Arancaytar · · Score: 2, Funny

    "I'm sorry Dave, I'm afraid I can't search that."

  2. natural language is an oxymoron by yagu · · Score: 3, Insightful

    I tend to agree with Norvig's focus on keywords and less emphasis on natural language. Trying to even define a natural language on top of a query engine introduces a layer of complexity that is probably unnecessary. Natural language even introduces a level of noise that interferes with defining, as accurately as possible, what the user is asking for.

    Google has done a good job, and with each iteration they get better at figuring out what the user is looking for. I find their suggestion feature not only an effective way to constrain a query; it also provides a way to spell check pre-emptively. If you've not used this, install the Firefox Google toolbar, or use the experimental Google "Suggest". Often Google will provide suggestions in the drop-down menu that refine your search in ways you hadn't considered and that drive to a more direct and accurate representation of your intended query. Of course, if their suggestions don't satisfy, you get to continue typing your keywords to your heart's desire.

    (I have to offer an example of suggestion's effectiveness. I often Google to get to the Chicago Tribune (I don't visit there often enough to have created a bookmark, plus it's easy to do this in anyone's browser). Simply typing the first four letters, "chic", I see the first suggestion is "Chicago Tribune". A simple TAB and RETURN, I'm on the Google page with the first link or so my link to the Tribune (with the added bonus of Google's breakout of sublinks).) Your mileage may vary (Google's ranking system may vary the order and options that appear in the drop-down over time), but I find it an amazingly effective research tool (suggestion, not the Trib).

    Natural language is mostly trying to guess intent with structure and key words (as opposed to keywords), but at the end of the day, if you filter out the natural language, and focus on keywords you're going to end up in close to the same place.
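The idea of filtering out the natural language and keeping the keywords can be sketched roughly as follows. This is only an illustration, not how Google does it: the `STOPWORDS` set and the `extract_keywords` helper are hypothetical names made up for this example, and a real engine would use a far larger, tuned word list.

```python
# Hypothetical stopword list, invented for this sketch.
STOPWORDS = {
    "a", "an", "the", "is", "are", "do", "does", "did",
    "how", "what", "why", "where", "who", "in", "of", "to", "i", "my",
}

def extract_keywords(query):
    """Lowercase the query, strip surrounding punctuation, drop stopwords."""
    keywords = []
    for token in query.lower().split():
        word = token.strip("?!.,;:\"'")
        if word and word not in STOPWORDS:
            keywords.append(word)
    return keywords

# A question and its bare-keyword form reduce to the same list.
print(extract_keywords("What does a duck eat?"))  # ['duck', 'eat']
print(extract_keywords("duck eat"))               # ['duck', 'eat']
```

Under these assumptions the question form and the keyword form end up in the same place, which is the point being made above.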

    1. Re:natural language is an oxymoron by krog · · Score: 1

      No, I'd say the phrase "natural language" is just about perfect at describing what natural language is.

    2. Re:natural language is an oxymoron by Arancaytar · · Score: 1

      Primitive question words like "what is" or "where is" or "how many" would still be nice to have, though - I agree however that trying to understand a full question is needless overkill.

      "What is" is already mapped to "define:" as far as I know. "Who is" works in a similar way.

      "Why did World War I start" or "what does a duck eat" are questions that require too much understanding and explanation of the concepts. But simple definitions, locations or numbers shouldn't be that difficult to spew out. "how many" could pretty much just map to the calculator, with more constants defined.

      If the search engine merely checks for the presence of such a question word, it can already refine the results. Occasionally, the question is a simple one, but one not usually asked about the term you are looking for. In that case, the refining would speed up the answer by saving you the bother of looking through irrelevant results.
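Checking for a leading question word and mapping it to an operator could look something like the sketch below. The `define:` operator is the one mentioned in the comment; the `REWRITES` table and the `rewrite_query` function are hypothetical and only illustrate the idea, not any real search engine's behavior.

```python
# Hypothetical rewrite table: question prefix -> query operator.
REWRITES = [
    ("what is ", "define:"),
    ("who is ", "define:"),
]

def rewrite_query(query):
    """Replace a recognized leading question phrase with an operator."""
    normalized = query.strip().rstrip("?").strip()
    lowered = normalized.lower()
    for prefix, operator in REWRITES:
        if lowered.startswith(prefix):
            return operator + normalized[len(prefix):]
    # No recognized question word: pass the query through unchanged.
    return normalized

print(rewrite_query("What is Lojban?"))       # define:Lojban
print(rewrite_query("what does a duck eat"))  # what does a duck eat
```

Questions that need real understanding fall through untouched and are handled as ordinary keyword queries.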

    3. Re:natural language is an oxymoron by ByOhTek · · Score: 1

      Not to mention, by understanding the root conecpts of word structures, without going whole-sentance, eventually you can get good results from sentances, or questions as an emergent behavior.

      I tried several questions in google, and it performed really well, only having trouble with:
          Why does ask suck and google not?

      That came back with a bunch of results saying google sucked. My other questions seemed to produce very useful results. I think that was the point the dev wanted people to understand - if you do your code write, by understanding the simpler parts, sentance processing won't be necessary.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    4. Re:natural language is an oxymoron by porcupine8 · · Score: 4, Interesting

      I would find the drop-down suggestions a lot more useful if I could read more than the first two words. As I type in, for example "Chicago dog boarding" all I see is a list of "Chicago do... " I'm sure there must be a way to make the search space take up more of the toolbar (I don't really need that much room in the URL space, since most URLs that long are nonsense), but I don't know how and I don't really want my browser window to be the width of my screen.

      --
      Warning: Apple/Nintendo fangirl. Likes her electronics cute & cuddly. May be rabid.
    5. Re:natural language is an oxymoron by yagu · · Score: 1

      If you're using the Google Suggest page, I think the width is sufficient if you have the browser at any reasonable width, so I'm assuming you're talking about the drop down from the toolbar, in which case you're in luck. Type something to invoke the drop down, or click the arrow to look at history. In the lower right, you should see a handle, expand to your heart's content. It's nicely implemented, even pushes the box to the left if your browser's too close to the right side of your screen. Enjoy.

    6. Re:natural language is an oxymoron by UbuntuDupe · · Score: 1

      Exactly what I was thinking. Search engines, especially Google, are great at picking out the important search terms, even if you do type it as a standard question. So being able to specifically parse natural-language questions seems to have a low reward-to-effort ratio. If you're going to do natural language processing, the goal should be to simplify a difficult problem. For example, translating between languages automatically, which takes YEARS of training for an individual to be able to do consistently.

      Saving someone from the HORROR of having to click "did you mean?" or spend five minutes learning how to use a search engine? Don't bother.

    7. Re:natural language is an oxymoron by pluther · · Score: 4, Informative

      "Why did World War I start" or "what does a duck eat" are questions that require too much understanding and explanation of the concepts.

      Not at all. I do that kind of question in Google all the time.

      Googling for "Why did World War I start" brings up, as the first result, an article titled "The Causes of World War I".

      Followed by a few million more hits if that one isn't good enough.

      And the question "What does a duck eat" gets many hits as well. The first one has, in the summary:

      Ducks in the wild eat a variety of plants, insects, and native foods that will differ from...

      I know it's just picking out keywords from the query and matching them to the sites, not trying to parse the natural language, but it works pretty damn well.

      --
      If the masses can keep you down, you're not the Ubermensch.
    8. Re:natural language is an oxymoron by 0100010001010011 · · Score: 3, Funny

      Fine, those were easy. Let's see Google understand this one: Women.

    9. Re:natural language is an oxymoron by porcupine8 · · Score: 1
      Hm, no handle for me. I'm in Firefox 2.0 on OS 10.3.9. Maybe this only works in windows FF.

      It would really help if the right half of the drop-down weren't taken up by the word "Sugges..." on the first line, which for some reason also creates a big blank space on all the lines below it. They couldn't just put that as the first line, if they really need to point out that they're suggesting things to me?

      --
      Warning: Apple/Nintendo fangirl. Likes her electronics cute & cuddly. May be rabid.
    10. Re:natural language is an oxymoron by megaditto · · Score: 1

      Searching for how many female redneck scanks with a doctorate degree in nanobiology would date me made me really sad.

      Those women don't know what they are missing.

      --
      Obama likes poor people so much, he wants to make more of them.
    11. Re:natural language is an oxymoron by encoderer · · Score: 1

      Another way to try Google Suggest would be just to install Firefox itself, sans toolbar, and use the browser's Search Box...

    12. Re:natural language is an oxymoron by Malkin · · Score: 1

      I agree. Being a programmer, I think natural language is good for talking to other human beings, and hopelessly inefficient for anything else. Why recite Dickens to a dishwasher, when it has perfectly good knobs and buttons? Why do we constantly suffer under this mad delusion that computers are somehow meant to act like people? Alas, Turing, why did you steer us off this cliff?

    13. Re:natural language is an oxymoron by Sciros · · Score: 1

      It does do some low-level parsing. Google "how tall is Mt. Hood" for example.

      --
      I like basketball!!1!
    14. Re:natural language is an oxymoron by encoderer · · Score: 2, Informative

      I think you two are confusing each other..

      The parent to my post is talking about the Google Search Box built into Firefox. The GP to my post is talking about the Google search page that has Suggest activated within it. It looks basically like the normal Google search page up to the point you start typing in queries.

    15. Re:natural language is an oxymoron by MrMr · · Score: 1

      This came up as #1

      How tall is Mt. Hood? According the U.S. Geological Survey, Mt. Hood is 3426 Meters (11239 Feet) tall. To learn more about Mt. Hood geology visit ...

    16. Re:natural language is an oxymoron by Sciros · · Score: 1

      Eh? That comes up as #2. This is what comes up as #1:

      Mount Hood -- Elevation: 11,249 feet (3,429 meters)

      --
      I like basketball!!1!
    17. Re:natural language is an oxymoron by Anonymous Coward · · Score: 0

      Not sure if this is exactly what you are looking for, but on Firefox you can customize the width of the search box. One way to do it is through the userChrome.css file. Just add the following lines to it:

      #search-container, #searchbar {
      width: 200px !important;
      }

      width can be adjusted to whatever pixel size your eyes desire.

      The file can be found for windows installs at:
      \Documents and Settings\\Application Data\Mozilla\Firefox\Profiles\.default\chrome

    18. Re:natural language is an oxymoron by Intron · · Score: 1

      I think your post explains the difficulties of natural language processing very well.

      --
      Intron: the portion of DNA which expresses nothing useful.
    19. Re:natural language is an oxymoron by Anonymous Coward · · Score: 0

      Try this: how many rivers are in minnesota?

      http://www.google.com/search?hl=en&q=how+many+rivers+are+in+minnesota%3F

      I think there is a big opportunity for the first company that solves this.
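For reference, the link above is just the question encoded into a query string. A minimal sketch using Python's standard `urllib.parse.urlencode` reproduces it; the `search_url` helper is a hypothetical name, and the `hl` and `q` parameters are taken from the URL as posted.

```python
from urllib.parse import urlencode

def search_url(question):
    """Build a search URL like the one posted above from a plain question."""
    # urlencode turns spaces into '+' and percent-encodes the '?'.
    params = {"hl": "en", "q": question}
    return "http://www.google.com/search?" + urlencode(params)

print(search_url("how many rivers are in minnesota?"))
# http://www.google.com/search?hl=en&q=how+many+rivers+are+in+minnesota%3F
```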

    20. Re:natural language is an oxymoron by Intron · · Score: 2, Funny

      You: What larks, eh, Pip?
      Dishwasher: CHANGE TO MODE POTS_AND_PANS
      You: I ent dun nuffink!
      Dishwasher: CANCEL RINSE CYCLE

      --
      Intron: the portion of DNA which expresses nothing useful.
    21. Re:natural language is an oxymoron by yodleboy · · Score: 1

      Fine, those were easy. Let's see Google understand this one: Women.
      like anyone on /. would know whether the results were accurate or not. sheeeesh.

    22. Re:natural language is an oxymoron by fbartho · · Score: 1

      There's an extension that makes a resizeable firefox search box for all platforms.

      --
      Gravity Sucks
    23. Re:natural language is an oxymoron by pluther · · Score: 1

      Interesting that the two numbers aren't the same.
      When I was growing up, I'd always heard that it was 11,235 feet tall, which I thought was very cool.

      --
      If the masses can keep you down, you're not the Ubermensch.
    24. Re:natural language is an oxymoron by SL+Baur · · Score: 1

      Simply typing the first four letters, "chic", I see the first suggestion is "Chicago Tribune". A simple TAB and RETURN, I'm on the Google page with the first link or so my link to the Tribune (with the added bonus of Google's breakout of sublinks).) Your mileage may vary (Google's ranking system may vary the order and options that appear in the drop-down over time), but I find it an amazingly effective research tool (suggestion, not the Trib).

      I find this unlikely in the extreme. When most people start typing at google and reach "chic", Chicago is not exactly what they're looking for. (Or Hot Chicago pizza for that matter).
    25. Re:natural language is an oxymoron by pluther · · Score: 1

      The second result is for the Wikipedia entry that gives an alphabetical list of the 484 rivers in Minnesota.

      The thing is, Google doesn't have to understand the results. It just has to deliver the correct ones. And for that, keyword searching is good enough. In this case, it delivers several results for "many rivers" and "minnesota", but it also gives results for "rivers" and "minnesota", which is what you're actually looking for.

      --
      If the masses can keep you down, you're not the Ubermensch.
    26. Re:natural language is an oxymoron by risk+one · · Score: 1

      And that's exactly why almost the entire field of information retrieval is focused on these 'statistical' approaches instead of some sort of deep semantics. It works. Semantic analysis is very difficult, highly language dependent and slow as hell. For google to do something like that they would have to not only make it work, but make it work for many different languages and make it fast.

      Let's not forget that information retrieval requires highly optimized algorithms. Linear time (over the size of the document collection) isn't enough. You can't search all documents for a single query, you need to retrieve a subset that belongs to a given keyword in (sort of) constant time. So even if you do semantic analysis, you need to either translate your semantic understanding of the query and the document to some sort of keywords anyway, or restrict your deep semantics to a very small subset that you've retrieved using keyword analysis. Once the academics succeed in getting semantics right, then we can start to think about transferring it to the domain of IR. Currently all the interest is in solutions that work.
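The point about fetching the subset of documents for a keyword without scanning the whole collection is what an inverted index provides. Below is a toy sketch, assuming whitespace tokenization and an in-memory dictionary; `build_index`, `search`, and the sample documents are invented for illustration and say nothing about how any production engine is built.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return set()
    # Each lookup is a dictionary access, independent of collection size.
    postings = [index.get(word, set()) for word in words]
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result

# Made-up sample documents.
docs = {
    1: "list of rivers in minnesota",
    2: "many rivers to cross",
    3: "minnesota lakes",
}
index = build_index(docs)
print(sorted(search(index, "rivers minnesota")))  # [1]
```

Any deeper semantic analysis would then only need to run over the small candidate set that the index returns, which matches the second option described above.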

    27. Re:natural language is an oxymoron by trawg · · Score: 1

      I often do the same, not because I expect Google to be able to magically figure out what I want, but because I figure Google have already indexed a page where someone has asked the exact same question before. I frequently use quotes around it (to search for the whole string), when trying to find something really specific and simple (eg, "how much does the earth weigh"). It's really easy to tweak the question to try alternatives.

    28. Re:natural language is an oxymoron by Anonymous Coward · · Score: 0

      I typed it into google, and hit #3 is titled "Women - Clothing - Shopping.com".

      Next?

    29. Re:natural language is an oxymoron by zopf · · Score: 1
      --
      Did you see the pool? They flipped the bitch!
    30. Re:natural language is an oxymoron by Jugalator · · Score: 1

      For that, the top hit on Google is:

      "The daily destination for women, with horoscopes, health and pregnancy information, message boards and blogs, celebrity gossip, beauty and more."

      I think it's pretty much on the money there too!

      --
      Beware: In C++, your friends can see your privates!
    31. Re:natural language is an oxymoron by Jugalator · · Score: 1

      I know it's just picking out keywords from the query and matching them to the sites, not trying to parse the natural language, but it works pretty damn well.

      This is because Google uses a popularity-dependent algorithm. It's not popular to ask/answer questions like "What does a duck eat?" where duck was in the meaning of ducking, or something like that. Obviously, a natural language processor should use the same mechanism. There'd only be confusion here if two different meanings of the word competed for the top results (i.e. both being popularly asked), *or* if you searched for an unusual meaning of the word, but in a context that made it look like some other question.

      I don't see how or why "Why did WW1 start" and "What does a duck eat" should give a popularity-based algorithm problems. After all, these are fairly common questions in a pretty simple context. Try finding info on what James Bond earns as a secret agent, according to the mythology, with "What does Bond earn?" and you'll have more trouble. It's not a common question to wonder about, but one that may have an answer. But it's far more common to ask what a bond trader earns, so...
      --
      Beware: In C++, your friends can see your privates!
  3. Lojban could help by Besna · · Score: 1

    I wonder if any of these types of translation or recognition engines use Lojban as an intermediary. The unambiguous yet rich grammar of Lojban is ideal for representing different languages. Eventually, it will be used directly.

    1. Re:Lojban could help by CRCulver · · Score: 2, Insightful

      I wonder if any of these types of translation or recognition engines use Lojban as an intermediary.

      No, researchers in this field generally aren't kooks. Mainstream researchers realize that conlangs are not appropriate objects of study.

      Eventually, it will be used directly.

      When even Lojban supporters admit that they have not succeeded in carrying on conversations for much longer than a few minutes in the language, then it doesn't look too likely that the project will take off.

      Furthermore, wasn't the point of Lojban initially to test the Sapir-Whorf hypothesis by teaching a child only Lojban? I'd hope that any parents that embarked upon such a stunt would be prosecuted for child abuse, because you would have to isolate a child from all human society to ensure he doesn't learn the local language.

    2. Re:Lojban could help by Thyamine · · Score: 1

      For anyone else that also has to look it up: http://en.wikipedia.org/wiki/Lojban

      --
      I will shred my adversaries. Pull their eyes out just enough to turn them towards their mewing, mutilated faces. Illyria
  4. The problem with natural language searches... by eln · · Score: 2, Insightful

    The problem with natural language searches is that natural language itself is a moving target. Sure, ten years ago "How do you change the air filter in a Toyota Camry?" would have been a legitimate question to ask a search engine online, but these days it would probably be asked like "lol how do u chng filtr in my pos car? kthxbye :)". I don't know how Google is supposed to keep up with that.

    1. Re:The problem with natural language searches... by Anonymous Coward · · Score: 2, Funny
      u opn hd, tk off top of big rnd blk thng, rmv rnd sqzbx thng, put new sqzbx in, rplc top, cls hd.

      lol easy, looser

      :P

    2. Re:The problem with natural language searches... by Anonymous Coward · · Score: 0

      My eyes are bleeding...

    3. Re:The problem with natural language searches... by AutopsyReport · · Score: 1

      Welcome to Mikita's. How may I serve you?
      - I'd like 'rullers, 'ugar, 'ucks and a Mikita 'cup... And then I think I would like a large... ...with 'eam.
      - And could I please have 'elly donut and... ...raspberry and a 'nge drink?

      What?
      - I'm sorry. And 'eaker 'oken.

      Let me recap the order: A cruller, two sugar pucks, a large coffee with cream, a raspberry jelly doughnut, orange drink, a box of five-holes.

      - Yeah.
      Thank you. Drive around, please.

      --

      For he today that sheds his blood with me shall be my brother.

    4. Re:The problem with natural language searches... by AnyoneEB · · Score: 1, Informative

      Most linguists currently believe in the existence of something called "universal grammar", which is a set of properties common to all acquirable human languages (that is, languages which can be learned as a native language). If you were able to get a computer to comprehend one language (or probably a few to make sure you have sufficiently generalized your principles), then additional languages and dialects would be relatively easy: just give it enough examples of sentences in that language, and the computer will figure out its grammar. Babies can do it. Google should have enough data for that from the web crawls it already does. Keeping up with language evolution is a nearly trivial problem compared to language understanding.

      Of course, getting the computer to understand one language is a monumental task.

      --
      Centralization breaks the internet.
    5. Re:The problem with natural language searches... by vertinox · · Score: 3, Insightful

      Most linguists currently believe in the existence of something called "universal grammar", which is a set of properties common to all acquirable human languages (that is, languages which can be learned as a native language).

      The argument against universal grammar is of course is non-Latin languages like Japanese (and possibly Russian) which don't play by the rules. I'm not really a language expert on either, but I'm tried to learn Japanese and its really tough.

      Everything is relationship based off the speaker and to the person or object he is talking about and then the audience. As in... If I'm talking about a pencil sitting on my desk, it has a different tense than a pencil on your desk and then a difference tense in someone else's hand or a pencil that is sitting at a far off place (-sara or -kara? I can't remember). And we haven't even gotten to issues about ownership like if it was in my hand or your hand.

      Whereas in Latin based languages it is more concerned about action or tense of ownership but not relationship to the speaker or audience. Hence... It is argued universal grammar does not apply in that respect.

      --
      "I am the king of the Romans, and am superior to rules of grammar!"
      -Sigismund, Holy Roman Emperor (1368-1437)
    6. Re:The problem with natural language searches... by kalidasa · · Score: 1

      It's more a matter of principles and parameters that every language chooses from: there's a universal set of principles and a universal set of parameters, and every language is built up on a general structure developed from a combination of several of the first set plus several of the second set. One could imagine that a new language invented ex nihilo would begin by assigning signs (usually phonemes) to signifieds (objects in the real world, concepts, processes, etc.) and relate them to one another by means of these principles and parameters in ways that would generate a system of morphology and syntax. This kind of structure holds for all languages, no matter how far they are from e.g. the IE languages (by the way, English is NOT a latinate language - it is a Germanic language that has acquired a great deal of secondary latinate vocabulary). The problem is working out what all the principles and parameters are (we only know some of them) and seeing how they are instantiated in various natural languages.

    7. Re:The problem with natural language searches... by OWJones · · Score: 2, Funny

      I'm not really a language expert on either, but I'm tried to learn Japanese and its really tough.

      Perhaps you should try and nail down English first. :)

      Cheers,
      -jdm

    8. Re:The problem with natural language searches... by Chapter80 · · Score: 1
      Behind every sentence is an idea (or several). And an idea can be parsed and stored unambiguously. (Allow me to remove the ambiguity in the previous sentence... there can be a machine readable representation of every idea, in which the representation is unambiguous. Even ambiguous ideas can have an unambiguous representation.)

      Check out the language Lojban for just one way to do this.

    9. Re:The problem with natural language searches... by rsborg · · Score: 1

      The argument against universal grammar is of course is non-Latin languages like Japanese (and possibly Russian) which don't play by the rules. I'm not really a language expert on either, but I'm tried to learn Japanese and its really tough.
      You do realize that universal grammar allows different languages to have different rules, right? The universality is about how nouns and verbs exist in sentences, and relate to each other, etc. Even in French vs. English, genderization and possession are wildly different, but the basic universal structure is the same as English (Example: talking about my wife's father, I would say, effectively "his father" which is wrong in English... it's "her father"... this leads to some very weird sounding things between the two language speakers).

      Universality does NOT imply that all the grammar rules will be the same at all.

      --
      Make sure everyone's vote counts: Verified Voting
    10. Re:The problem with natural language searches... by Krelnor · · Score: 1
      Your point is slightly incorrect, and you have technical errors as well.

      First, there could in fact be a universal human grammar which supports both SOV, OSV and various other languages. The existence of variety in language grammar and semantics does not eliminate this possibility, rather it simply excludes very simple solutions.

      Second, Japanese has pronouns used according to the spatial distance or degree of familiarity between the topic, the speaker and the listener (the a/so/ko word group). However, they'd only be ambiguous in the same contexts that 'that' would be ambiguous in English. "I want to find out more about that".

      • kotira wa pen desu
      • sotira wa pen desu
      • atira wa pen desu
      conveys slightly different information than 'this is a pen' or 'that is a pen'. Translating from English to Japanese and having to work out what the relationship is to get correct word choice means that English is more ambiguous than Japanese in that context.

      Isolating Japanese language samples out of the sample's context becomes ambiguous because redundant information is omitted. Japanese can do this because the redundant information is not required as a structural element of the sentence, unlike languages such as English. So, a Japanese natural language processor simply has to process slightly more context, but it is by no means special in this manner.

    11. Re:The problem with natural language searches... by AnyoneEB · · Score: 1

      As the other people who have already replied to you have mentioned, the differences you specify between Japanese and languages more familiar to you have nothing to do with it not following universal grammar, they are simply ways Japanese differs from the other languages you have encountered. There actually are aspects of Japanese, which also appear in other languages, which linguists have trouble explaining (classifiers and double-nominative verbs likely among other features I am not familiar with), but that is true of all natural languages because linguistics in its current form is a relatively young science.

      In fact, Japanese is a rather good example of a clean head-final language. Russian (like Latin) gives X-bar theory a bit more trouble because it has freer word order, but there are theories which explain that and better ones are being developed and discussed in the literature.

      The distinction you mention in Japanese between "this pencil" (kono), "that pencil" (sono), and "that pencil over there" (ano) is not unheard of in other languages, although English only has one type of "that" as opposed to Japanese's two.

      I wish I could point you to some good linguistics sites, but I have not found any. My knowledge is from an undergrad course which unfortunately does not have any online notes. The textbook has a listing on Google Books, but I assume not all of it is available and the professor had a tendency to disagree with it.

      --
      Centralization breaks the internet.
    12. Re:The problem with natural language searches... by Anonymous Coward · · Score: 0

      "I'm not really a language expert on either, but I'm tried to learn Japanese and its really tough."
      This is the only correct sentence in your post, because it demonstrates how little you know about language use. Please learn some Japanese before commenting about how it applies to the existence of Universal Grammar. There are some languages which arguably cause problems for proponents of Universal Grammar (Piraha, for example), but Japanese is not one of them. It easily fits into the theory of UG, but many of the "parameters" (if your theory of UG includes parameters) are set differently from English, which is part of why it's so hard to learn for English speakers.

      So far as I can guess, your pencil example is talking about the difference between "kono"(adj)/"kore"(noun) ("this", which is near me), "sono"(adj)/"sore"(noun) ("that", which is near you), and "ano"(adj)/"are"(noun) ("that", which is not near either of us). Pencils do not have tense - verbs do. Japanese does have relatively complex levels of politeness based on speaker/audience, but that's just vocabulary, and certainly English has different words with different amounts of formality, though it's more subtle. It's not an issue of grammar.

    13. Re:The problem with natural language searches... by Anonymous Coward · · Score: 0

      This post is complete nonsense. First of all, linguists would have to be extraordinarily dim to try to deduce linguistic universals based strictly on "Latin" languages. They aren't. Claims about linguistic universals are based on data from thousands of different languages. (By the way, English is one of those non-"Latin" languages you seem to think are problematic.)

      Your claim about different "tenses" in Japanese makes so little sense I don't even know where to begin--it's hard to say whether you actually have an idea you're trying to express and just have no idea what the word "tense" actually means, or if you're just babbling. If you're talking about the "ano/kono/sono" types of distinctions--expressing different levels of proximity to the speaker--this (1) has absolutely nothing to do with tense, and (2) is expressed in English and every other European language I can think of. (e.g. This dog vs. That dog).

    14. Re:The problem with natural language searches... by Anonymous Coward · · Score: 0

      Perhaps you should try to nail English first.

  5. phrase/sentence? by Scrameustache · · Score: 1

    Most of what we do is at the word and phrase level; we're not concentrating on the sentence.

    What's the difference between a phrase and a sentence?
    --

    You can't take the sky from me...

    1. Re:phrase/sentence? by harmonica · · Score: 4, Informative

      A phrase is part of a sentence. WP

    2. Re:phrase/sentence? by admactanium · · Score: 1

      What's the difference between a phrase and a sentence?
      i'd assume it's something like the difference between "how do i set up my d-link router?" and "d-link router set up". i believe google already parses out "natural language" queries about as well as any other search engine, including ask jeeves, which was supposed to use natural language as its unique selling proposition. google does give different results for both queries but both sets of results seem to be relevant.

      i'm more curious about how the use of keywords in google searches will affect "natural language" as we move forward. it used to be necessary to form coherent sentences to gather information and now it's rather the opposite. i think the generation of kids growing up now probably tend to think in keywords first. we already see tech-savvy people substituting tech phrases for real world phrases. what happens when a vast majority of kids growing up have access to technology and the internet?
    3. Re:phrase/sentence? by Anonymous Coward · · Score: 0

      Phrases are a subpart of sentences, a group of words which form a single unit within the sentence from a syntax point of view.
      "There is a phrase in this sentence" is a sentence - within that "in this sentence" for instance is a phrase.

    4. Re:phrase/sentence? by teslar · · Score: 1

      i'd assume it's something like
      Not to be pedantic, but why assume when Wikipedia is just a click away and can show you that your assumption is wrong?
    5. Re:phrase/sentence? by iabervon · · Score: 1

      The most significant thing in their case is that the phrases they deal with don't have verbs (and the associated syntactic function words). They're talking about noun phrases, which go from just after a word like "the" to the next word that's nested less deeply than "the". For example, "most significant thing" or "associated syntactic function words". You know, like the things you might type into a Google search...

    6. Re:phrase/sentence? by admactanium · · Score: 1

      Not to be pedantic, but why assume when Wikipedia is just a click away and can show you that your assumption is wrong?
      what about that link makes my assumption wrong? "d-link router set up" seems to be a noun phrase. set up is used as a noun quite often.
    7. Re:phrase/sentence? by teslar · · Score: 1

      what about that link makes my assumption wrong?
      To be really pedantic, the lack of "the" or "a" ;) An article would be required to turn your collection of words into something that is a single unit within a sentence, especially since you're using the singular. As you presented it, your assumption seemed to be "a phrase is a collection of words that is not a proper sentence" and that is wrong, at least from a linguist's perspective.
  6. somebody tell me by rossdee · · Score: 1

    What is 'natural' about the English language?

    1. Re:somebody tell me by lstellar · · Score: 2, Informative

      Everything. All languages are natural. In fact, the spoken word is as good a subject to study evolution and 'survival of the fittest' (to a degree) as any biological organism. The way that different languages and dialects have collided over the years and weeded out words, phrases and structures that work or don't work is one of the most complex and interesting topics around. Despite its quirks the English language is as natural as any creole or foreign language out there, simply evolved differently.

      --
      art is science made clear. -cocteau
    2. Re:somebody tell me by the-matt-mobile · · Score: 1

      I hope you're just being rhetorical, because giving it just a bit of thought would have provided the answer. Language is a seemingly advanced skill, yet most humans pick it up as toddlers. If you've never had a 3-year old, let me tell you they can express some pretty complicated thought processes verbally. And it doesn't matter what language it is - the vast majority of toddlers communicate and comprehend others' communications - all when they still have not mastered so many other 'simple' things. Language is very natural - even simpler life forms than humans communicate with one another.

    3. Re:somebody tell me by lstellar · · Score: 1

      Communication != language. Language is far more complex and deep, containing tenses and vocabulary. Humans are the only animal that truly uses language -humans invented language- other animals simply interact. Dolphins clicking at each other are not speaking with language, they are signaling with rudimentary noises.
      http://www.dictionary.com/ and Google: 'Linguistics.'

      --
      art is science made clear. -cocteau
    4. Re:somebody tell me by orcrist · · Score: 1

      Communication != language. Language is far more complex and deep, containing tenses and vocabulary. Humans are the only animal that truly uses language -humans invented language- other animals simply interact. Dolphins clicking at each other are not speaking with language, they are signaling with rudimentary noises.

      You got everything right except for the part I highlighted. Humans "invented" language like we invented walking -- namely not at all. Language evolved.
      --
      San Francisco values: compassion, tolerance, respect, intelligence
  7. Google and Asimov's fictional Multivac by dpbsmith · · Score: 3, Interesting

    Isaac Asimov's fictional Multivac was a huge computer with some near-universal knowledge database that answered natural-language questions, giving Asimov all sorts of opportunities to present philosophical conundrums as entertaining short stories.

    In the 1960s and thereabouts, when I used to hack around on minicomputers, but personal computers weren't well known to the general public, I always found it difficult to explain what computers did. One of their commonest questions was "Well, how does it work, do you type in questions and does it answer them?" Programming in assembly language didn't really fit that description.

    Many technological fantasies seem to remain surprisingly distant. I tried ViaVoice and gave up: it's not a "voice typewriter." Roomba is not a general-purpose housekeeping humanoid-form robot, and neither are the machines that weld automobile chassis.

    However, it seems to me that Google is within striking distance of Asimov's "Multivac" fantasy.

    Incidentally, if you type in queries as complete sentences Google seems to do any worse than if you don't. Sort of the converse of adventure games, where one begins by typing "Walk over to the table on the left and pick up the silver key with your left hand" and quickly learns to use telegraphic style: "Go table. Take key."

    1. Re:Google and Asimov's fictional Multivac by Arancaytar · · Score: 1

      I tried ViaVoice about eight years ago and I heard that signal recognition - both OCR and VR - have come a long way since then. Haven't tried it though, and I don't know how good ViaVoice is now.

      Also, "What is the sine of half pi and a half times the cosine of one quarter plus the answer to life, the universe, and everything?" works correctly. I found this pretty awesome.

    2. Re:Google and Asimov's fictional Multivac by vertinox · · Score: 1

      Google comes close with simple "what" questions like:

      What is one plus one? will give you 2.

      What is the speed of light? will give you 299 792 458 m / s.

      And maybe even something like...

      What is the Capital of Sweden? will give you Stockholm.

      Will give you the answer at the top of the screen.

      Of course if you type

      What is the reason for Napoleon's 1812 defeat?

      It will give you the 1812 overture as the first hit so it has a bit more to go on context.

      --
      "I am the king of the Romans, and am superior to rules of grammar!"
      -Sigismund, Holy Roman Emperor (1368-1437)
    3. Re:Google and Asimov's fictional Multivac by SL+Baur · · Score: 1

      Another example of questions that work well is "What does mean?"

      The search engines have always been good at playing Jeopardy. Typing the exact text of error messages tends to always lead to "What caused this?" and "How do I fix this?".

    4. Re:Google and Asimov's fictional Multivac by Arancaytar · · Score: 1

      What is the reason for Napoleon's 1812 defeat?

      It will give you the 1812 overture as the first hit so it has a bit more to go on context.


      That, or Google believes it is on to a little known historical secret, related to the role of music in warfare. :P
    5. Re:Google and Asimov's fictional Multivac by Arancaytar · · Score: 1

      (No wait: Actually, it just has the causal relationship reversed. After all, Napoleon's defeat was directly responsible for the 1812 overture!)

  8. Great advance by PolarBearFire · · Score: 1

    It would actually be a great advance, but the resources required would not offset its advantages since 99% of the time you can find what you're looking for using keywords and phrases.

  9. You Usually Can Now by Greyfox · · Score: 1

    I tell new users that they should just ask Google a question in plain English. That gives them a more natural context in which to embed their keywords. I know Google is just picking up on the keywords and ignoring the filler words, but it usually gets the correct results and it's a lot easier for people who are just starting out on the Internet.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  10. Oh, for an "edit" button... by dpbsmith · · Score: 1

    I meant that "if you type in queries as complete sentences, Google doesn't seem to do any worse than if you don't." That is, even though it's not an advertised feature, you can use natural language with Google if you like. It just doesn't help you; you might just as well use truncated phrases.

  11. Stop Gaming the System by Anonymous Coward · · Score: 0

    I suggest they focus their efforts on preventing websites from gaming the system.

    How many times have you entered a search term that is a company's name, expecting to see that company's link on the first page only to be shown a bunch of links to dumb ass search sites that have gamed the google search engine?

  12. this is also why by circletimessquare · · Score: 5, Insightful

    text-to-speech or speech-to-text is also useless (unless you're blind/ deaf/ driving a car)

    the idea of interacting with a computer like a human is an artificial hangover from being introduced to the computer the first time. after using it for awhile, you realize that interacting with a computer, in small limited ways, like searching information, is easier NOT using natural language

    for the very simple reason that it takes more thought, and more typing to interact naturally. it is easier to train a human to interact with a computer than it is to train a computer to interact with a human. and for the human, it is more rewarding, because the human realizes he doesn't need to exert so much effort

    "what is the capital of france?"

    versus

    "france capital"

    if you were to shout "france capital" at someone, it would be rude and confusing. but for a computer, it's actually superior

    it is the conservation of communication effort at work here that wins out over natural language in computer interaction

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
    1. Re:this is also why by garcia · · Score: 1

      if you were to shout "france capital" at someone, it would be rude and confusing. but for a computer, it's actually superior

      I know what you're trying to get at but that example wasn't exactly a good one. The search engine could simply strip all the words that are pointless (is, the, and of). I'm sure that if it accepted natural search words like "what" that would automatically be eliminated too.
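
      A minimal sketch of the stripping described above, assuming a hypothetical `STOP_WORDS` list (nothing in this thread says which words any real engine actually drops). It reduces both forms of the query to the same pair of keywords:

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"what", "is", "the", "of", "a", "an", "and"}


def strip_stop_words(query):
    """Lowercase the query, trim punctuation from each word, drop stop words."""
    words = (w.strip("?.,!\"'") for w in query.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


if __name__ == "__main__":
    print(strip_stop_words("What is the capital of France?"))
    print(strip_stop_words("france capital"))
```

      The two results contain the same keywords in a different order, so a purely keyword-based engine would have no way to tell the question from the shouted version.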

      My biggest question is how many searches come from people in a natural way? Since Sunday only two have landed at my site out of 12,206 searches across the various engines:

      1. What does Ba-Tampte mean (yahoo)

      2. What type of mushrooms to put on pizza (google)

      If I'm at such a low percentage for natural language searching, I can only imagine that it's even less for the whole lot. Why bother to fix something that isn't broken?

    2. Re:this is also why by Anonymous Coward · · Score: 1, Insightful

      if you were to shout "france capital" at someone, it would be rude and confusing. but for a computer, it's actually superior
      Not really superior, but sufficient for the systems of today.

      The next evolution in computer search is understanding what documents really satisfy "what is the capital of france" versus returning anything with "france" and "capital" in its text such as "France should always be spelled with a capital letter". Google doesn't attempt to differentiate and they leave you to filter the results manually to find what you want.

      The reason for natural language interfaces is not simply to collect the keywords, it is to understand the context within which you want results and filter out meaningless results. Google uses a pagerank that should bubble the more common meanings of the keywords to the top. But I still find myself having to filter out tons of irrelevant results to get to a very specific result that is 4 or 5 pages down. So I then have to learn to think like a computer and add other keyword context that differentiates the result I want. Like "france capital -letter -capitalization +city" and inevitably I end up filtering out results that fit my context, but happen to have terms that I filtered on.
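
      A toy sketch of that kind of keyword filtering, assuming a simplified reading of the `+`/`-` operators (required and excluded terms) and three made-up documents. It is a plain bag-of-words match, not how Google ranks anything:

```python
def parse_query(query):
    """Split a query into plain, required (+term) and excluded (-term) sets."""
    plain, required, excluded = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            plain.add(token)
    return plain, required, excluded


def matches(document, query):
    """Bag-of-words match: only word presence counts, not meaning."""
    words = {w.strip(".,!?\"'") for w in document.lower().split()}
    plain, required, excluded = parse_query(query)
    return (plain | required) <= words and not (excluded & words)


# Made-up documents for illustration.
DOCS = [
    "France should always be spelled with a capital letter",
    "Paris is the capital city of France",
    "A letter describing Paris, the capital city of France",
]

if __name__ == "__main__":
    for q in ("france capital", "france capital -letter -capitalization +city"):
        print(q, "->", [d for d in DOCS if matches(d, q)])
```

      The plain query matches all three documents, including the spelling advice. The refined query drops the spelling advice, but it also drops the third document, which is on topic and merely happens to contain "letter".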

      So searching on meaning is still a holy grail. And in fact, I'm surprised this guy from Google said this when another Google engineer at Ideafest stated very matter of factly that Google's future was in natural language 'star trek' like computer systems. This is completely contrary to what is being said here.

    3. Re:this is also why by Sciros · · Score: 1

      This is only true for basic queries where interpreting the queries as bags of words would suffice.

      Besides, for communication via speech it's completely unnatural to say "france capital" to a machine as opposed to "what is the capital of France," even. So for speech recognition systems NLP really helps out.

      --
      I like basketball!!1!
    4. Re:this is also why by xant · · Score: 1

      I don't think t2s or s2t are useless at all; just useless for controlling a computer. Not all the time, but in some situations I would very much like to be able to dictate an email into my phone, or call something that reads my email to me. And then, being able to log my conversation with another human being to text, which then gets emailed to me, is a righteously good app.

      These aren't the holy grail technologies they were once hailed to be, for sure. But they have some very important niches.

      --
      It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
    5. Re:this is also why by Prof.Phreak · · Score: 1

      Well, the long term idea is that computers will be able to ``understand'' what you mean, kind of like humans can understand what you mean most of the time. We currently don't "see it" because current generation of speech-to-text (and vice versa) just sucks. It's phrase based, not semantics based.

      Do you see people still using keyboards 200 years from now? How about 50? If not... then -something- has got to replace'em... Also notice the lack of keyboards on startrek :-)

      --

      "If anything can go wrong, it will." - Murphy

    6. Re:this is also why by tehcyder · · Score: 1

      text-to-speech or speech-to-text is also useless (unless you're blind/ deaf/ driving a car)
      I'm not sure you should be driving a car at all if you're blind.
      --
      To have a right to do a thing is not at all the same as to be right in doing it
  13. Real questions ... by foobsr · · Score: 4, Interesting

    Typing "What is the capital of France?" won't get you better results than typing "capital of France." ... Most of what we do is at the word and phrase level; we're not concentrating on the sentence. We think it's important to get the right results rather than change the interface.

    This misses situations like searching for "That sf-short-story were the crew of the visiting spaceship is given a dog as a present" in which googling failed, at least for me, or, more technically, when you have absolutely no idea about what the relevant terms within the outcome might be. In short, if you have a real question.

    CC.

    --
    TaijiQuan (Huang, 5 loosenings)
    1. Re:Real questions ... by Blakey+Rat · · Score: 1

      Yeah, recently I had that problem with "that one Bradbury story where the spacemen landed on Venus and it rained all the time and half of them went insane, also there were natives hunting them I think."

    2. Re:Real questions ... by zombie_striptease · · Score: 1

      Some quick Googling led me to The Long Rain, though I only got to that by remembering how the native venusians liked to take their time torturing and drowning visitors and having those keywords to add.

    3. Re:Real questions ... by zombie_striptease · · Score: 1

      My point being that a natural language search may not be enough to answer such queries, you'd pretty much need a massively advanced AI that could synthesize knowledge to understand your summaries and pick through which content matches.

    4. Re:Real questions ... by Agripa · · Score: 1

      That sf-short-story were the crew of the visiting spaceship is given a dog as a present

      Just make sure not to report that the offog came apart under gravitational stress. That would really upset headquarters.

      Wow. There is ONE Google reference to that now and perhaps there will soon be two.

  14. Users have changed, too by harmonica · · Score: 2, Insightful

    These days, hardly any user enters queries in the form of natural language questions, judging from log files. That was different a couple of years ago.

    Just like "Click here to do X" isn't used as much on Web pages anymore. People now tend to know that they can click on underlined text to find out more.

    1. Re:Users have changed, too by cowscows · · Score: 1

      Exactly. If there was ever a real need to be able to do natural language searches, that time is basically over. People have been learning how to search the internet effectively, it's not really that hard to do it successfully. As the general populace gets more and more computer and internet savvy, a lot of this sort of thing becomes almost intuitive.

      --

      One time I threw a brick at a duck.

  15. Natural Language from a linguistics... by lstellar · · Score: 1

    Natural Language from a linguistics perspective, incorporated into a search engine, will be truly innovative technology. After reading the article and his wording, it seems clear that it isn't so much that pursuing search via natural language is fruitless, but that it is borderline unattainable at the moment. Using keywords allows the person performing the query to filter their own natural thought.

    "Hm, I wonder how many moons Saturn has? I will Google 'Saturn+Moons.'"

    This method is by far the most effective and least time consuming today, but the day we are able to think what we want and then search for what we want with no filtration necessary will coincide with the advent of true artificial intelligence. Linguistics (and thus, 'Natural Language') is one of the most complex studies in the world. The creation, evolution and implementation of different dialects within any given lexicon are very difficult to understand, let alone across different languages. 'Natural Language' search will be impossible to truly implement until we fully understand the way we communicate to one another. Simply extracting words or operators, clearly as we know, simply doesn't work. It is the complex relationship that matters. But once we figure that out- and we will- we will be at the next great step forward.

    --
    art is science made clear. -cocteau
    1. Re:Natural Language from a linguistics... by foobsr · · Score: 1

      But once we figure that out- and we will- we will be at the next great step forward.

      Might take some time though. It seems that the arguments brought forward by TAUBE (1961 !, Computers and Common Sense, the Myth of Thinking Machines) against the feasibility of machine translation still hold and apply to the problem in focus.

      CC.

      --
      TaijiQuan (Huang, 5 loosenings)
  16. Phrase level? by RecoveredMarketroid · · Score: 1

    Most of what we do is at the word and phrase level; we're not concentrating on the sentence.
    No wonder AI:A Modern Approach is such a tough read!

  17. What's really the story by Anonymous Coward · · Score: 0

    Is that natural language stuff is hard. And even more so, AI, which was so promising to so many of us in the 80s turned out to be so hard that it is basically impossible. I think it caused a real shift in the natural language research, sending us to use statistics and probability, since basically AI never got going.

    1. Re:What's really the story by theStorminMormon · · Score: 3, Insightful

      I think that actually misses the point. If you've worked as an engineer or a consultant - or even if you've just helped people search for stuff on Google - you probably have realized that THEY DON'T KNOW WHAT TO ASK FOR. A really good consultant/engineer is someone who has the ability to figure out what a person wants based on what they say.

      Even if you mastered natural language (and I'm not saying that's a surmountable task) I think people would be shocked to see that Google searches would still be frustrating.

      I'm not just saying "blame the user", I'm saying that language itself is not even the last obstacle to overcome. You're going to need to figure out a program that not only understands natural language, but also context, culture, etc.

      Getting an AI of near-human intelligence is not enough, because to be really good at getting people the answers to questions they can't ask you have to be of above-average capability.

      --
      The Southern Baptist Convention has creationism. On Slashdot, we have porn.
    2. Re:What's really the story by Alt_Cognito · · Score: 3, Interesting

      Bah, the engine just has to ask refinement questions. Of course, this could be interesting:

      User: Who is the winningest coach in football?
      Search Engine: Did you mean, What coach has the most wins in football?
      User: Yes
      Search Engine: Did you mean American football?
      User: Yes
      Search Engine: NFL NCAA CFL...?
      User: Umn, all of the above
      Search Engine: Are you sure?
      User: What?
      Search Engine: Are you sure you want to compare all years, after all, NFL rules significantly changed in 2001, and leagues are not comparable...
      User: Yes.. Yes, please compare them all....
      Search Engine: You know winningest isn't a word right? .... And so on and so forth...

    3. Re:What's really the story by Anonymous Coward · · Score: 3, Funny

      User: NAKED WOMEN!
      Search Engine: Would you prefer wo
      User: NOW!!!
      Search Engine: *sigh* As you wish...

    4. Re:What's really the story by vegiVamp · · Score: 1

      > you have to be of above-average capability.

      Which, per definition, is approximately half of the population ? I'm sorry, but being 'above average' is nothing exceptional.

      --
      What a depressingly stupid machine.
  18. what is google, freakin' jeeves? by crazybilly · · Score: 2, Insightful

    do people really type questions into search boxes? that always stumped me about the ask jeeves thing....who the crap really ASKED anything. I thought you just googled what you wanted to know about (or nowadays, hit the wikipedia page for it for starters).

    Maybe I'm just not up on my search engine technology (or, rather, I don't know anything about it). I just don't know anybody who'd think to put a regular question into google.

    1. Re:what is google, freakin' jeeves? by dcroxton · · Score: 1

      I always use keywords, since I know that's all it's using anyway. That's why it drives me nuts on Microsoft Office when I type in some keywords and it asks for a complete sentence. Why?? I want to ask. Does it have some super-advanced way to figure out what I mean, or is it just going to pick out the keywords anyway?

      --
      Sincerely, Derek

      A curious little blog
  19. The capital of France by CruddyBuddy · · Score: 5, Funny
    Paris Hilton says:

    "That's easy! The capital of France is 'F'."

    --
    ----------
    Any problem can be made unsolvable if there are enough meetings made to discuss it.
    1. Re:The capital of France by Anonymous Coward · · Score: 0

      That's funny. I actually shouted it at someone and he answered Hilton

  20. Esperanto by Besna · · Score: 1

    Esperanto, for one, makes a perfect study for researchers. Your brush is too broad. Your cynicism and jadedness are disappointing.

    1. Re:Esperanto by orcrist · · Score: 1

      I'll refer you to my previous rant on this subject.

      Fucking amateur Linguists everywhere

      --
      San Francisco values: compassion, tolerance, respect, intelligence
  21. How to Kill Google: by Ralph+Spoilsport · · Score: 1
    Develop a natural language search engine that provides results that are as effective as Google's.

    I wonder if MS or Yahoo are listening...

    RS

    --
    Shoes for Industry. Shoes for the Dead.
  22. How search is really used by Animats · · Score: 2, Informative

    If you have the opportunity to look at query logs, you see how dumb most search engine queries are.

    First, a big fraction of queries are simply navigational. Many are just URLs. The major search providers recognize these in the front end machines and send back canned answers, without even passing them to the real search engine. If you type "myspace" into Google, very little work is expended returning the canned reply.
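
    A rough sketch of what such a front-end shortcut could look like. Everything here is an assumption made for illustration: the `CANNED` table, the `URL_LIKE` pattern and the `route` function are hypothetical, and nothing in this thread describes how any search provider actually implements it.

```python
import re

# Hypothetical table of canned answers for popular navigational queries.
CANNED = {"myspace": "http://www.myspace.com/"}

# Crude test for "this query is really a URL": optional scheme, dotted host,
# optional path, no whitespace anywhere.
URL_LIKE = re.compile(r"^(https?://)?[\w-]+(\.[\w-]+)+(/\S*)?$")


def route(query):
    """Return ("canned", url) for navigational queries, else ("search", query)."""
    raw = query.strip()
    key = raw.lower()
    if key in CANNED:
        return ("canned", CANNED[key])
    if URL_LIKE.match(raw):
        has_scheme = key.startswith(("http://", "https://"))
        return ("canned", raw if has_scheme else "http://" + raw)
    # Only queries that reach this point cost a real index lookup.
    return ("search", raw)


if __name__ == "__main__":
    for q in ("myspace", "www.example.com", "france capital"):
        print(q, "->", route(q))
```

    The point of the sketch is only the ordering: the cheap checks run first, so the expensive search path is reached only by queries that are not simply asking for a site by name.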

    After that, most queries are one word. Phrase queries are less common.

    Few people seem to have noticed, but Google started returning results based on synonyms and homonyms a few weeks ago. There have been some significant algorithm changes recently.

    Less than 1% of queries use any operators, like '""' or '-'.

    The real problem with natural language queries, though, is that "Ask Jeeves" was a flop. Remember Ask Jeeves? That was a system designed to process queries written as sentences. But it wasn't used that way, and didn't succeed commercially.

    1. Re:How search is really used by Alomex · · Score: 1

      Few people seem to have noticed, but Google started returning results based on synonyms and homonyms a few weeks ago. There have been some significant algorithm changes recently.

      I've noticed because the quality of the results went down noticeably.

  23. He's lying by helicologic · · Score: 4, Insightful

    I think Norvig's lying. Google may not be pursuing linguistic structure above the phrase level in searches, but I'd bet a donut they're working their asses off trying to analyze crawled docs linguistically. To get relevance, they need to extract what a document is about. That implies sentence-level syntax analysis, which is input to sentence-level semantics, which is input to paragraph-level semantics, which is input to "pragmatic" analysis. I think what he's not saying is that the place the linguistic research dollars are going is elsewhere than parsing "Where is Paris?"

    1. Re:He's lying by clodney · · Score: 1

      Even with Google's computing resources, I think attempting to do natural language analysis of the entire Internet would be a daunting proposition.

      Even though the number of queries processed every day is immense, the amount of text to analyze pales in comparison to the amount of text on the pages they crawl every day.

      Of course, they could prune their search set considerably if they just assumed that there is no semantic content in most MySpace pages and blog entries.

    2. Re:He's lying by Anonymous Coward · · Score: 0

      "Lying" is a bit harsh. He isn't lying, but you are not wrong.

      Watch some Google Tech Talk videos. I no longer remember which one it was (some sort of "here's how we do things" discussion of how Google works...) but a Google guy commented that people quickly adapt to using a search engine: a first-time user might type "Where can I get a rebuilt engine for a 1976 Pacer?" but after less than a month of using Google that same user would just type "rebuilt engine 1976 Pacer". So Google concluded that attempts to parse natural language are not interesting. However, that same speaker said that of course they are always working to give better results; he had an example where a UC Berkeley cooking class was called "Thai Food 101" (and the page did not contain the words "Berkeley" or "cooking class"), and he showed how that page can be returned as a result for search keywords "cooking class Berkeley". They are building data structures to try to group pages by concepts.

  24. Where Are We? by ImYY4U · · Score: 1

    Answer: This. OK, programming joke aside, seriously...natural language should not be incorporated into search engines. What about generic questions, such as my subject line? What would Google return? What SHOULD Google return to that? Do a tracert on the user's IP, and answer with a map? Seriously, to implement natural language searching capability would be quite a feat. Especially in the age of, "ROFLMAO wtf iz 4 computa?!!1"

    --
    "Know but never fear the consequences of your actions."
  25. A hint of direction and technology by ngreenfeld · · Score: 1

    For those who are speculating about where they are going, a possibility is in a recent (within 5 years) article by William A. Woods, one of the top natural language researchers. His work at Sun was about using noun phrases (turned into concepts) as search guides. No idea if this is relevant to Google, but the work seems very promising.

    And sorry, I don't have the reference handy.

    1. Re:A hint of direction and technology by ooutland · · Score: 1
      --
      I'm the queer the atheists sent here to take away your gun!
  26. Me Tarzan; You Jane by peter303 · · Score: 2, Insightful

    How much natural language do you really need for a search? Not much.

  27. Not worth processing sentences by 192939495969798999 · · Score: 2, Insightful

    All you have to do is look at Yahoo answers' average question clarity to get a sense of why whole-sentence AI may not be the best strategy for a search engine.

    --
    stuff |
  28. Question Answering research by msbmsb · · Score: 1

    For Natural Language Processing and Question Answering research activities, search for "AQUAINT (DTO OR ARDA OR IARPA)" and also the NIST TREC (Text Retrieval Conference) workshops and research competitions.

    There is a lot of interesting work out there and some answers as to why more precise information finding through natural language input is useful.

  29. What search will do to language by ooutland · · Score: 1

    As a commenter indicated, it's easier for us to adapt to computers than to adapt them to us. Long term question: as we adapt to our computers, using handfuls of keywords instead of sentences, how will it affect the language itself? Change in language comes from technology now, c.f. "w00t" as word of the year or the most popular txtmsg acronyms.

    Will we be reduced to the news people in that beer commercial who sum it all up in 10 seconds so they can go drink? It could have a positive effect in stripping language of fuzziness; if you were to Google 'initiating mobilizing synergistic dynamics to maximize total quality excellence,' you wouldn't get much, because it's b.s., whereas 'build better mousetrap' would give you hard data. Meetings would certainly get shorter if we were forced to communicate in searchable terms.

    On the other hand, storytelling would suffer. "Boy girl meets gets loses" is ideal search terminology, but doesn't exactly pull the heartstrings.

    --
    I'm the queer the atheists sent here to take away your gun!
  30. Language vs. keywords by Punk+CPA · · Score: 1

    I agree with the other comments that it is much easier to get the user up to speed than to make search criteria easy for naive users. Remember Ask Jeeves? That implementation of natural language queries gave results that were not much better than random. Serious users quickly catch on to the tricks of word order, quotes, +/-, etc. Really, it's not much harder than typing a sentence and gives more predictable results.
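
    For what it's worth, the quotes and +/- tricks are simple enough to sketch. This is a toy parser under a made-up grammar (`parse_query` is a hypothetical name, and the rules are assumptions, not any search engine's real syntax):

```python
import shlex

def parse_query(query):
    """Split a keyword query into required, excluded, and optional terms.

    Assumed grammar: a "quoted phrase" stays together, +term is required,
    -term is excluded, and anything else is optional.
    """
    required, excluded, optional = [], [], []
    # shlex.split keeps quoted phrases together and drops the quote marks.
    for token in shlex.split(query):
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return {"required": required, "excluded": excluded, "optional": optional}
```

    One known limitation of this sketch: `shlex.split` raises `ValueError` on an unbalanced quote, so a real front end would need to handle stray apostrophes first.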

  31. star trek as a guide? by circletimessquare · · Score: 1

    then i won't be impressed until i can type "earl grey, hot" into google and find a nice cup of tea on my cd tray

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
    1. Re:star trek as a guide? by amh131 · · Score: 1

      It seems rather more likely that you would get something that is almost, but not quite, entirely unlike tea.

  32. Dude, you are so ignorant by Anonymous Coward · · Score: 0

    What you are saying has nothing to do with the parent's point. Do you even know the first thing about natural language research and/or AI research? He's saying IT HASN'T DONE SHIT since Eliza was written. All of the old fogey AI academics worshipped at the altar of Godel-Escher-Bach, only to find out that we were scratching granite with our fingernails.

    1. Re:Dude, you are so ignorant by Intron · · Score: 1

      Actually, SHRDLU in 1970 was the peak. Eliza had no internal world model. SHRDLU could answer questions that started "Why did you..". It's pretty much gone downhill since then.

      --
      Intron: the portion of DNA which expresses nothing useful.
    2. Re:Dude, you are so ignorant by Anonymous Coward · · Score: 0

      Oh yes so true. I just typed eliza because it was the first thing to come to mind. SHRDLU is described in Godel Escher Bach and then it looked like it was only a matter of more time and work before the Turing test got passed. Ah, we were so naive.

  33. NLP is very useful by Sciros · · Score: 2, Informative

    Natural language processing is useful when it is well-done. Getting it well-done is the tough part. Don't let Google reps trick you into thinking otherwise just because their R&D in the field isn't where they'd probably like it to be.

    Here are some situations where it's useful:
    1) Interpreting a question rather than just treating it as a "bag of words." For instance, one can type "how tall is Mt. Everest" in the search bar and Google, rather than searching for documents that contain those 5 (or so) tokens, will interpret that as a query asking for height and also search for documents that contain "Mt.", "Everest", and "height". Take that a step further and it might look for strings that represent height such as a number followed by "ft" or "meters" or "m".

    2) Condensing query chains. Suppose you want to know what sport our 4th president enjoyed playing most. You can ask "what sport did the fourth president of the US like playing?" and the system will give you an answer by first interpreting "fourth president of the US" as Madison, and then searching for what sports Madison enjoyed playing. If not for such interpretation you would either have to run 2 queries (first to find out who the 4th president was, then what sports he liked), or hope that there is a document out there that Google's indexed that contains the words in that initial query.

    3) Speech recognition! If you want to run a Q/A session with a computer system that has a speech recognition front end, it is more natural (easier and faster) to ask it "how tall is mt. everest?" than to say "mount everest height" or whatever you would end up typing into Google today. People like to speak using *natural language,* after all. They would gladly do it with computers if the SR systems in them were good enough (some are).

    4) More precise query results. What's better, getting back a document that is likely to contain the answer to your query, or getting back the sentence that contains it? Or better yet, getting back the answer and nothing else? The more robust an NLP system the more complicated queries it can interpret and the more elegant its result can be.

    On that note, Google actually *does perform* NLP on queries despite what from the summary (I didn't RTFA) looks like claims to the contrary. If you ask Google "how tall is Mt. Everest?" it actually DOES interpret that particular sentence and gives you the answer -- 29000ft or thereabouts. And you only get such an elegant result if you type "how tall is Mt. Everest" (without quotes) or "Mt. Everest how tall". Other queries of this nature will not give you quite as precise a response.
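
    To make point 1 concrete, here is a rough sketch of that kind of rewriting. The pattern table, the `rewrite_question` name, and the unit list are all assumptions for illustration; this is not how Google does it, just the general idea:

```python
import re

# Hypothetical table: question opener -> attribute keyword being asked about.
QUESTION_PATTERNS = [
    (re.compile(r"^how tall is (.+?)\??$", re.IGNORECASE), "height"),
    (re.compile(r"^how old is (.+?)\??$", re.IGNORECASE), "age"),
    (re.compile(r"^how far is (.+?)\??$", re.IGNORECASE), "distance"),
]

# A height-like answer string: a number followed by a unit of length.
HEIGHT_ANSWER = re.compile(r"\b\d[\d,.]*\s?(?:ft|feet|meters|m)\b", re.IGNORECASE)

def rewrite_question(query):
    """Turn a recognized question into keywords; pass anything else through."""
    q = query.strip()
    for pattern, attribute in QUESTION_PATTERNS:
        match = pattern.match(q)
        if match:
            # Keep the subject of the question and append the attribute keyword.
            return f"{match.group(1)} {attribute}"
    return q
```

    With this, "how tall is Mt. Everest?" becomes the keyword query "Mt. Everest height", and `HEIGHT_ANSWER` can then be used to pick candidate answer strings out of the returned documents. Anything the table does not recognize falls back to a plain keyword search.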

    --
    I like basketball!!1!
  34. I already do this anyway. by singingjim1 · · Score: 0

    I phrase a majority of my searches as questions already and get back reasonable results. Like Norvig said, it's about the words in general and their meaning together in a phrase. In my experience I ask and I receive. What's the problem?

  35. Classic Debate by LarryIsMe · · Score: 1

    From my view, this is the classic debate in technology: emulating nature vs. reinventing nature.

    When people first tried to fly, they copied birds but the better solution was to understand the principles of aerodynamics and leverage the technology available.

    The wheel was a better idea than trying to recreate feet.

    In the key words vs natural language debate, Google has shown that key words is the better solution for now.

    The real question is: how do you make searches more intuitive to the person making the search?

    After all, usability is the only criterion that matters.

    PowerSet.com claims to have a natural language search that's superior to the keywords searches. Let's see if PowerSet has the service to back up its boasts. PowerSet.com currently hides its service -- which is not a good sign.

  36. What could possibly be wrong with that? by Dan+East · · Score: 3, Insightful

    > wii
    Your query does not include a verb.

    > find wii
    Whose "wii" do you want me to find?

    > find wii review
    Unable to find any reviews authored by "wii".

    > find review about wii
    No reviews found concerning the common noun "wii".

    > find review about Wii
    Here is the most recent review about the proper noun "Wii": [url to a page full of keywords related to Wii]

    > find review about Wii order by relevence
    "relevence" is not an English word. Did you mean "relevance"?

    > find review about Wii order by relevance
    Here is the most relevant review about Wii: [url to a 2 year old pre-review of the Wii before it was launched]

    > find review about Wii order by relevance then date
    Here is the most recent and most relevant review about Wii: [url to a fanboy site]

    > find all reviews about Wii order by relevance then date
    Working...

    > abort
    Abort what?

    > abort search
    I am currently performing 1,231,415 searches. Which search do you want me to abort?

    > abort last search
    You do not have permission to abort others' searches.

    > abort my last search
    Last search aborted.

    > find several reviews about Wii order by relevance then date
    "Several" is not a quantifiable adjective. Do you mean "seven"?

    > find seven reviews about Wii order by relevance then date
    Here are your results. For better search results please capitalize the first word of sentences, and end sentences with proper punctuation.

    Dan East

    --
    Better known as 318230.
    1. Re:What could possibly be wrong with that? by Sciros · · Score: 1

      Wow Dan East your NLP system is a real piece of trash. You should look at how most systems of this sort are actually put together before making a pointless straw man. :-P

      --
      I like basketball!!1!
    2. Re:What could possibly be wrong with that? by Dan+East · · Score: 1

      Not that anyone cares one way or another, but my post was meant to be a joke.

      Dan East

      --
      Better known as 318230.
    3. Re:What could possibly be wrong with that? by Sciros · · Score: 1

      Oh, well then nvm ^_^

      I used to have a friend that said things along the lines of what you said, but in seriousness to try and argue against something. I approached your post with the wrong mindset and I'm glad I was mistaken.

      --
      I like basketball!!1!
  37. What about Karen? by timtimtim2000 · · Score: 0, Offtopic

    Google should look to Karen, the computer wife of Plankton on SpongeBob SquarePants. Karen is so advanced her natural language responses even include sarcasm.

  38. Powerset by Anonymous Coward · · Score: 0

    At least one startup is betting that natural language search will be the way to go. A number of ex-yahoo people there.

  39. Natural Language is Stupid and Limiting by eno2001 · · Score: 1

    While natural language might seem like a good idea to people who are less technical, it's actually a really bad idea. It would slow a lot of things down in terms of search and would bring with it deep inefficiencies. Frankly, I think search engines would be improved if they offered advanced features with brief commands (kind of like how Unix abbreviates 'copy' as 'cp' or 'move' as 'mv'). For example, which do you think is better when you want to move quickly, a vehicle with wheels, or a bipedal vehicle with legs? The answer is obvious, wheels trump legs for speed. The same with language interfaces to computers. A middle language between machine and human language is the best approach. With a focus on efficiency and no ambiguity whatsoever. Loglan. There you go. move along...

    --
    -"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
  40. wordnet, subjects, and more by Aradorn · · Score: 1

    The first step to building an NLP-like search engine would be to map words to their respective subjects (or classifications); this has already been done with WordNet. Then, as you crawl the net, you map the words found to your hierarchy, and you keep a running total of the frequency of words in the document as well as their frequency on the net. Eventually, you can sift out the words that have little to no meaning (words that appear frequently typically have no meaning: the, a, and, but, etc.).

    Now combine this with pagerank and social ranking and you can refine search results down pretty quickly. During my undergrad I was able to get really good results with this method but I needed more sites in my index to really see if it would work.

    Essentially what happens is your queries start off broad and you refine the results down by providing more terms to search by that are associated with the line of queries. (This is how search engines like ask.com work; teoma.com was the company that focused on this.)
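
    The frequency bookkeeping can be sketched in a few lines. This is only the "sift out the common words" part, with no WordNet mapping or ranking; the function names and the 0.8 threshold are assumptions picked for illustration:

```python
from collections import Counter

def document_frequency(documents):
    """Count, for each word, how many documents contain it at least once."""
    df = Counter()
    for text in documents:
        # set() so a word repeated within one document is counted once.
        df.update(set(text.lower().split()))
    return df

def frequent_words(documents, threshold=0.8):
    """Return words that appear in more than `threshold` of the documents."""
    df = document_frequency(documents)
    cutoff = threshold * len(documents)
    return {word for word, count in df.items() if count > cutoff}
```

    On a tiny corpus the threshold has to be set by hand; with a larger crawl the words that clear it tend to be the ones like "the" and "and" that carry little meaning.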

    1. Re:wordnet, subjects, and more by Helios1182 · · Score: 1

      I can assure you that WordNet has been used in many more advanced ways than that, but it still generally doesn't outperform less language based algorithms. WordNet doesn't really provide that much semantic information. There are other resources like VerbNet, FrameNet, Corelex, and PropBank that work on capturing the semantics. If you are using WordNet and only keywords you run into big problems with ambiguity, and there isn't enough information in 3 keywords to allow regular word sense disambiguation methods to work. Using full natural language sentences would help a bit, but it is still limited. You really need to identify the correct semantics for the documents in order to make them easier to search.

  41. What did he say? by Anonymous Coward · · Score: 0

    There are several problems with using natural language as a query language. For example, my northern neighbor, from Jamaica, is understandable and my southern neighbor, from Colombia, is intelligible, but I tend to have to translate idioms between them, illustrating that there are only about 6 billion natural languages to deal with. And you must use lots of short sentences with children, but longer more complex phraseology with adults.

    The other problem is that because most words get repurposed over time and fields of study, a lot of natural language is used to set the context. The word "affluent" means quite different things when talking about watersheds and neighborhoods. And the rules of grammar pretty much guarantee that the words "watershed" and "affluent" would be in separate phrases with all the intervening words the phrases need. Hence the natural language query would be much more voluminous and need much more processing.

    Still once computers can "read" a paper and "understand" what it says; a natural language query might be more efficient for constraining the search. (What rivers are affluent to the Blue Nile south of the 34th parallel?) But while the search engine scanners scan a document and create a key word distance measure on improbable words the improbable key word set will still be the most efficient query language. (Returning the containing documents instead of the answer.)

  42. Understatement of the year by Chapter80 · · Score: 1

    Go back to Excite and the search engines before: you have a box, and you get a list of 10 results, with a little bit of information accompanying each result. We've just stuck with that.
    TR: What has changed?
    PN: The scale. There's probably a thousand times more information.
    1000x? That's got to be the understatement of the year! If not the understatement of the second.
  43. I guess they'll just let Powerset become by melted · · Score: 1

    I guess they'll just let Powerset become the next Google. Face it, "keywordese" language is often not adequate. Questions constitute a significant fraction of search engine traffic, and all search engines fail miserably on anything but "how to" queries. Just yesterday I was looking for a comparison between two products on the web. I've found it, eventually, but there's no real reason why it shouldn't be the first hit after I enter "comparison between X and Y". It's not a question in itself, yet it's a distinctly natural phrase. I bet people would use things like this quite a bit if they actually worked well. Looking further into the future, quite often I'm looking for an answer, not for a set of hits I have to read and summarize myself. In 20-30 years from now I won't have to waste my time. I want the computer to become my "secretary". I give it a task (find relevant information about topics X and Y, summarize, present) and off it goes. In a minute or so I have a page of concentrated information to digest.

    The reason why Google won't focus on NL queries is because there are a lot of unsolved problems and those may take decades to solve. Disambiguation/polysemy, summarization, knowledge representation, reasoning - you need all of them to be anywhere close to a human in language understanding, and none of this is really "solved" yet. This even ignores purely technical issues (i.e. issues that can be solved today with a bit of elbow grease) such as extracting salient bits from the pages, storing linguistic data in index efficiently and retrieving it from there in a meaningful way in real time.

    Is it hopeless, then? I don't think so, for two reasons. Reason one, it won't get done unless someone does it. Reason two, there are working implementations of language-aware search that for certain types of queries yield substantially better results. If Google doesn't do it, someone else will. And you can bet a billion bucks they'll patent the heck out of it.

    That said, I don't see keyword search going away anytime soon. It works well for a lot of things and it'll live side by side with NL queries. But next time you click a link after link after link in Google's results page, think whether it'd be easier to just type a natural language phrase and have Google "understand what you mean".

  44. Natural language narrowing down of the search term by Anonymous Coward · · Score: 0

    Let's say I do a search for "java" and I get 501,000,000 hits. I would like to narrow this down.

    I'd like the search engine to give me a list of topics to refine my search.

    Programming language
    Coffee
    Island
    Companies
    Other

    And let's get rid of any links that are just lists of words. Read the web pages using natural language processing so that the computers understand what the page is about. Lists of words and random sentences should fail this natural language processing, and so not be in the context of anything.
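
    A crude version of the topic list could look something like this. The keyword sets and the `group_by_topic` name are invented for the example; a real system would presumably learn the topics rather than hard-code them:

```python
# Hypothetical hand-written cue words for each sense of "java".
SENSE_KEYWORDS = {
    "Programming language": {"code", "compiler", "jvm", "class"},
    "Coffee": {"coffee", "roast", "espresso", "brew"},
    "Island": {"indonesia", "island", "jakarta", "volcano"},
}

def group_by_topic(snippets):
    """Bucket result snippets by the topic whose cue words they share most."""
    groups = {topic: [] for topic in SENSE_KEYWORDS}
    groups["Other"] = []
    for snippet in snippets:
        words = set(snippet.lower().split())
        # Pick the topic with the largest overlap of cue words.
        best = max(SENSE_KEYWORDS, key=lambda t: len(words & SENSE_KEYWORDS[t]))
        if words & SENSE_KEYWORDS[best]:
            groups[best].append(snippet)
        else:
            # No cue word matched at all, so the snippet stays unclassified.
            groups["Other"].append(snippet)
    return groups
```

    The non-empty groups are what would be shown to the user as refinement choices.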

  45. the natural order of natural language by epine · · Score: 1

    Actually, one of the main challenges with natural language is that we humans perform so badly to begin with. Half the time we neither say what we mean, nor mean what we say. But it hardly matters: far more than half the time, the person (or people) listening hear either what they expected to hear, or what they wanted to hear, or they already knew they would disagree with whatever you were about to say before you even opened your mouth.

    Sometimes it does matter. However, by the time you design a linguistic study to isolate the human gift for parsing grammar, the experimental task is about as "natural" as writing a law exam.

    I think the contribution of grammar to early human language is way overstated. You don't need much grammar to handle everyday events, such as determining how to dress for dinner when the report from the field comes back "mammoth tusk hunter" or "hunter spear mammoth": in the former case (x3!) you'll be polishing your nose bone.

    Where word order begins to matter is parsing the daily scuttlebutt. Did Adam tell Carol about Bob and Eve, or was it Eve telling Adam what she overheard between Bob and Carol? It's not easy keeping the cheaters distinct from the cheated upon. Plus Adam has to remember when to look surprised when Eve tells him something he learned from Carol just the other day. Not keeping your past/present/future and your cheatee/cheaters straight was a certain recipe for not sleeping on the warm side of the fire pit.

    Later on, the grammar we acquired to parse who's zooming who became useful for digesting the BBQ assembly manual, but of course, that remains an evolutionary work in progress.

    Maybe when children of the current MySpace generation reach the age to pop the big question ("What's an iPod?") and we've given up the fight to prevent our every indiscretion and peccadillo from being publicly archived for all posterity, we'll actually need a natural language interface to really drill down into the zettaflood of who said what to whom and who first posted it online and whether revenge was sweet.

  46. Natural language by Punk+CPA · · Score: 1

    Instead of trying to re-create or interpret the conventions of human speech, how about just a better way of representing the search results? I would like to see a visual representation of the search results so that I could spot the most promising semantic branches. There must be a way of grouping results that are closest in meaning, or refer to similar sources, or fall into broad categories of knowledge. Right now, Google just ranks them all in what it believes to be the order of significance, which is no help if the search results have gone in a direction not intended. Maybe the program should let humans resolve the ambiguity as much as possible; we're actually quite good at it. That's what makes Turing tests work.

  47. Google, please give us regular expression searches by knorthern+knight · · Score: 1

    or at least the option. That includes escaping "+" and "-". That would do sooooo much to improve searches.
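
    The escaping part is the easy bit, at least on the client side. A small sketch using Python's standard `re` module (the `literal_pattern` helper name is made up here, and nothing below says anything about what Google would accept):

```python
import re

def literal_pattern(term):
    """Build a regex that matches `term` literally, metacharacters included."""
    # re.escape backslash-escapes characters such as "+" that regexes treat specially.
    return re.compile(re.escape(term))
```

    So `literal_pattern("C++")` matches the text "C++" itself instead of treating the plus signs as repetition operators.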

    --

    I'm not repeating myself
    I'm an X window user; I'm an ex-Windows user
  48. "insightful"? by Anonymous Coward · · Score: 0

    How is the parent post insightful in any way? It's a fabricated example from nothing, a strawman post.

  49. He's not lying; you're not reading carefully. by jrtom · · Score: 1
    Quoting from the summary:

    'We think what's important about natural language is the mapping of words onto the concepts that users are looking for. But we don't think it's a big advance to be able to type something as a question as opposed to keywords ... understanding how words go together is important ... That's a natural-language aspect that we're focusing on. Most of what we do is at the word and phrase level; we're not concentrating on the sentence.'
    That is, he explicitly says that _most_ (that is, not all) of their work is at the word/phrase level. This implies that some is at levels of abstraction above that. They may not be "concentrating on the sentence" but that doesn't mean that they're ignoring it entirely. Furthermore, there are well-known ways of creating good approximations of the meaning of a document that don't consider word order at all. The classic is the TF-IDF model, but there are others (Latent Semantic Analysis, other types of topic models) that are richer and more descriptive. No, they don't capture everything about the semantics or pragmatics of a document, but they do well enough to (for instance) provide good predictors of the grade of an essay as assigned by a panel of human graders.
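    For anyone who hasn't seen TF-IDF, here is a minimal sketch of one common variant (term frequency normalized by document length, times the log of inverse document frequency). The `tf_idf` function name and the whitespace tokenizer are assumptions for the example, and it says nothing about what Google actually runs:

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document, ignoring word order."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # A word in every document gets log(1) = 0, so it carries no weight.
        weights.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights
```

    Words that appear in every document score zero, while words concentrated in a few documents score high, which is why the resulting vectors approximate what a document is about without looking at word order.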
  50. AI is more alive now than ever by Anonymous Coward · · Score: 0

    And even more so, AI, which was so promising to so many of us in the 80s turned out to be so hard that it is basically impossible.

    You must have been asleep for the past 2 decades. AI is, to every generation, the stuff that we don't know how to do yet. In 1985, a chess computer being world champion seemed like AI. In 1995, a computer answering the telephone when you ask if your flight is on time seemed like AI. There are still things that seem like AI, but I doubt my children will believe it.

    You might be thinking of Strong AI, but even that isn't completely lost yet.

    basically AI never got going,

    No, we just don't tend to call it "AI" much any more because it was hard to get funding for things labeled "AI" after the AI Winter. It's no coincidence that the guy who wrote the book on AI is now Director of Research at one of the top software companies.

    1. Re:AI is more alive now than ever by Anonymous Coward · · Score: 0

      AI research blows me.

  51. Fun with Google Suggest by MillionthMonkey · · Score: 1

    Sometimes I like to idly type things into Google Suggest and see what comes up:

    why is everything
    can you eat
    can you die from
    where can I go to get
    is it possible to
    how would you

    From playing with it for a few minutes, it seems that Google is mostly used by women in various stages of pregnancy, people worried that they might be arrested for using Limewire, and people looking for Wiis.

  52. AI is about context by master_p · · Score: 1

    Unless a computer knows things the way a human does, it's not possible for natural language queries to ever work.

  53. Is it too late for an open source search engine? by islisis · · Score: 1

    I think it is pretty clear that the manifestation of Google's ideology has alienated many who might have backed its progress originally. Is it too late for an open source, peer managed search network to form? Namely, in place of advertisements funding the service, how feasible would it be for the future's mainstream search to be managed by an academic network of global universities, catering to traffic via proximity, bolstering search features through open peer review and funded by mutually beneficial public sourcing?

    Many services have proved they can be managed without a nanny looking after everything, is search the same?

  54. Ask Jeeves was a flop because it started returning stupid results like, "Would you like to buy a Subatomic Physics?".

    --
    A house divided against itself cannot stand.