Slashdot Mirror


Google Raises Word Limit

Philipp Lenssen writes "Google quietly raised their web search limit to 32 words. Previously, only up to 10 words were allowed per query, with succeeding words being ignored. This is not only important to specific approaches of advanced searching (for example, when you need to exclude many different keywords using the minus operator), but it's also of great help to certain tools using the Google API. While there doesn't seem to be any official statement from Google yet, some more details can be found at my Google blog."

71 comments

  1. Finnally. by Phantombantam · · Score: 3, Insightful

    About time. I always thought of the 10 word limit as gogle's biggest setback.

    --
    42
    1. Re:Finnally. by smittyoneeach · · Score: 1

      This just makes sure that Redmondware stays comfortably buried in the dust.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    2. Re:Finnally. by Anonymous Coward · · Score: 0

      HERE HERE!

    3. Re:Finnally. by Anonymous Coward · · Score: 0

      I always thought of spelling as your biggest set back

  2. Great by lastninja · · Score: 4, Interesting

    Now you can search for quotes without having to strip half of the words away. Just cut and paste them into the browser. I guess this will also make it easier to search for source code; as it is now, you will likely end up at a documentation site when what you want is some source file from some SourceForge project.

    --
    John Carmack fan, browsing at +5 since 1999.
    1. Re:Great by Fryboy · · Score: 4, Informative

      The "*" character, used as a wildcard in Google, doesn't count as a word. So even previously, you could search for quotes and replace certain words with * to fit the entire thing.

    2. Re:Great by lastninja · · Score: 1

      I did not know that. Thanks, will try it out.

      --
      John Carmack fan, browsing at +5 since 1999.
  3. very complex by St.+Arbirix · · Score: 2, Interesting

    32 word searching increases the complexity of the search many times over. For a ten word search you're usually talking about finding all documents with all ten words, ordering them by how many of the searched terms were found, and then by their linked-to values. With 32 you're finding ~3.2x as many documents, comparing for 3.2x as many words in each document, and then finding how popular they were.

    So, um, wow.

    --
    Direct away from face when opening.
    1. Re:very complex by damiangerous · · Score: 4, Insightful

      How are you finding 3.2x as many documents? You should be finding fewer documents, not more.

    2. Re:very complex by Anonymous Coward · · Score: 2, Insightful

      False.

      With 32 words you will be able to find, theoretically, almost any page. The difference is much more than 3.2x.

      With 10 words - you can search for about <NumberOfWords> ^ 10 ( number of words in power 10 ), but with 32 words - this will be <NumberOfWords> ^ 32.

      Now think about number of words in all languages Google can support.
      Fewer than a thousand of the world's 6800 languages have writing systems ( http://www.ethnologue.com/language_index.asp ).
      Let's assume that all languages have the same number of words as English.
      There are fewer than 1000000 words in English ( http://hypertextbook.com/facts/2001/JohnnyLing.shtml ).
      So we assume there are fewer than 1000000*1000 (= 10^9) words in all languages.

      As a result, for NumberOfWords ^ 10 there will be about 10^90 possible simple searches (without using + - and/or logic).
      Taking our assumption into account, this is an upper bound on the number of possible searches. Also, not all of the 1000 writing systems are supported by Google.
      This is very close to 10^100 (a googol - http://en.wikipedia.org/wiki/Googol ).

      But with 32 words, the upper bound on the number of simple searches explodes to a fantastic 10^288.
      This is clearly more than a googol can handle ;-)

      P.S. This math does not pretend to be scientific or correct. Feel free to do your own research on this subject.
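The arithmetic in the post above can be checked with a few lines of Python. This is only a sketch of the poster's own back-of-envelope estimate: the vocabulary size (1,000 written languages times 1,000,000 words each) is the post's assumption, and the function name `ordered_queries` is made up for illustration.

```python
# Back-of-envelope check of the estimate above.
# Assumptions taken from the post: at most 1,000 written languages and
# at most 1,000,000 words per language. Word order is treated as significant.
languages = 1_000
words_per_language = 1_000_000
vocabulary = languages * words_per_language  # 10**9

def ordered_queries(vocab_size: int, query_length: int) -> int:
    """Number of ordered word sequences of exactly `query_length` words."""
    return vocab_size ** query_length

googol = 10 ** 100

ten_word = ordered_queries(vocabulary, 10)
thirty_two_word = ordered_queries(vocabulary, 32)

print(ten_word == 10 ** 90)                  # True
print(thirty_two_word == 10 ** 288)          # True
print(ten_word < googol < thirty_two_word)   # True
```

Python integers are arbitrary precision, so the comparison is exact. As a reply further down points out, this counts possible queries, not pages found.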

    3. Re:very complex by St.+Arbirix · · Score: 1

      Does google limit search results to documents that contain every single word you've queried for? ...

      Oh. I see. In that case the complexity only increases with the number of times the document is passed over for each word, or 3.2x which is probably over twice as high as the average number of times a document needs to be scanned for a word before finding that it doesn't contain a word...

      That's not too hard. Why is there a limit?

      --
      Direct away from face when opening.
    4. Re:very complex by Anonymous Coward · · Score: 1, Insightful

      The above post is utter nonsense. (insightful?)

      First of all, all Google search words are required to be present on a webpage, so adding more words lowers the number of hits.

      Besides that, the reasoning above is absurd. Why should the number of possible searches correspond to the number of hits?

      And the number of languages in the world is appearing in the equations above? Even when probably 90% of all webpages are written in English?

      There's nothing interesting about the sheer number of possible searches. After all, Google has only indexed 8*10^9 webpages.

      PS: By the way - in order to correctly count the possible searches, you should take into account that the order doesn't matter, and divide by the appropriate factor!

    5. Re:very complex by Anonymous Coward · · Score: 0
      What?

      Google doesn't scan documents for each word in your query. Do you know anything about indexing?

    6. Re:very complex by Anonymous Coward · · Score: 0

      but if you know anything about google's architecture, word order DOES matter.

    7. Re:very complex by HugeFatty · · Score: 1
      Yes, you do find fewer documents, but you have many more potential documents. That is, when a search is run, you must find the documents that contain all of the keywords (in most search engines, anyway).

      So say you have search terms a, b, and c. The documents that contain these are found using the index that they construct, and the documents that contain a are set A, the documents that contain b are set B, and the documents that contain c are set C. You must then find the intersection of A, B, and C. Although this is an easy concept, it is not as easy to do quickly, as it requires that you iterate through each set to find the documents that have all of the terms in common.

      So if a is contained in 500,000 documents, b is contained in 100,000 documents, and c is in 50,000 documents, you have to iterate through 650,000 items to find the intersection, even though that may only be 100 documents.

      And no, it's not feasible to have the index take word pairs or triplets (or more), either, due to space limitations. That would potentially square or cube (or more...it depends on what n is...) the amount of space needed to store the index.

      I hope I am explaining this well. If not, then just try to think about how you would find the intersection of n lists on paper or programmatically, and you will realize how slow that is. I'm guessing that Google can do this because they have the money to throw more computing power at it.

      And yes, I am a search engine developer (though not for Google...)

      --


      I am clearly fatter than you.
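The set-intersection step described in the comment above can be sketched roughly as follows. This is a toy illustration, not how Google or any particular engine does it: the index is a plain dict from term to a sorted list of document IDs, the names `intersect_two`, `search` and `toy_index` are hypothetical, and starting with the shortest list is a common shortcut that the post does not mention.

```python
from typing import Dict, List

def intersect_two(a: List[int], b: List[int]) -> List[int]:
    """Merge-style intersection of two sorted lists of document IDs."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

def search(index: Dict[str, List[int]], terms: List[str]) -> List[int]:
    """Return the IDs of documents that contain every term."""
    postings = [index.get(term, []) for term in terms]
    if not postings:
        return []
    postings.sort(key=len)  # assumption: begin with the rarest term
    result = postings[0]
    for plist in postings[1:]:
        result = intersect_two(result, plist)
        if not result:
            break  # nothing can match any more, stop early
    return result

# Toy index: term -> sorted document IDs (made-up data).
toy_index = {
    "a": [1, 2, 3, 5, 8, 13, 21],
    "b": [2, 3, 5, 7, 13],
    "c": [3, 4, 5, 13, 20],
}
print(search(toy_index, ["a", "b", "c"]))  # [3, 5, 13]
```

Each pairwise merge walks both lists once, which is why the work grows with the combined length of the lists even when the final intersection is tiny.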
    8. Re:very complex by digitalpeer · · Score: 1

      32 word searching increases the complexity of the search many times over.

      Are you sure about that?

      house - 294,000,000
      house car - 24,700,000
      house car boat - 6,250,000
      house car boat dog - 1,570,000
      house car boat dog smoke - 412,000
      house car boat dog smoke funny - 163,000
      house car boat dog smoke funny slashdot - 2,200

  4. Nice link by Anonymous Coward · · Score: 0

    I'm glad the first word in your post is a link to google otherwise I would never have known where to find it. Of course, without the link to google and the Google API page your post would have looked like a blatant attempt to drive traffic to your blog.

  5. searching for non a-z characters by fluor2 · · Score: 4, Insightful

    characters like !,.'$ are pretty much not supported by google. i would like those to be included in the future.

    1. Re:searching for non a-z characters by Lally+Singh · · Score: 1

      "Lally's Wang" is pretty much not supported by google. i would like it to be included in the future.

      --
      Care about electronic freedom? Consider donating to the EFF!
    2. Re:searching for non a-z characters by Anonymous Coward · · Score: 0

      Lally's Wang is supported. Just take a picture, name it lally's_wang.jpg or somesuch, and upload it to enough servers..........

  6. Matching MSN Search? by Utopia · · Score: 4, Interesting

    Looks like the limit was raised to match MSN's new search, which has sported a bigger word limit for quite some time.

  7. Great! by Anonymous Coward · · Score: 1, Funny

    Now when I do really specific searches I can get truly relevant google ads!

  8. Good for searching multiple sites by prostoalex · · Score: 2, Interesting

    I discovered how to make a Firefox plugin for limiting Google searches to a select few sites, but the problem before was that each site:domainname.com directive was treated as a term. So if you wanted to search 7 sites at once, Google would let you enter a maximum of 3 keywords to span that search across multiple sites. With this keyword increase, you can do stuff like 5-word searches across 10 domain names, for example.
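The term budget described above works out roughly like this. It is a sketch under assumptions: each keyword and each site: directive costs one term (as the post says), the OR connectors are assumed to be free, and the helper names and example domains are hypothetical.

```python
OLD_LIMIT = 10
NEW_LIMIT = 32

def terms_used(keywords, sites):
    """Count one term per keyword and one per site: directive."""
    return len(keywords) + len(sites)

def fits(keywords, sites, limit):
    """True if the query stays within the given word limit."""
    return terms_used(keywords, sites) <= limit

def build_query(keywords, sites):
    """Join the keywords with OR-ed site: directives (assumed syntax)."""
    site_part = " OR ".join(f"site:{s}" for s in sites)
    return " ".join(keywords) + " " + site_part

seven_sites = [f"site{i}.example.com" for i in range(7)]
ten_sites = [f"site{i}.example.com" for i in range(10)]

print(fits(["a", "b", "c"], seven_sites, OLD_LIMIT))           # True: 3 + 7 = 10
print(fits(["a", "b", "c", "d"], seven_sites, OLD_LIMIT))      # False: 4 + 7 = 11
print(fits(["a", "b", "c", "d", "e"], ten_sites, NEW_LIMIT))   # True: 5 + 10 = 15
print(build_query(["firefox", "plugin"], ["example.com", "example.org"]))
```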

    1. Re:Good for searching multiple sites by gl4ss · · Score: 4, Insightful

      though.. it's still not good enough.

      what I would hope for them to introduce would be a word blacklist that would be personal, and that you could include at least a thousand terms in it.

      why? TO AVOID THOSE FUCKING LINKFARMS, they usually have the same advert links in them so just adding the referral id of the owner of a certain farm will get a lot of meaningless sites out of the search. it's doable now if you make your own program that does the filtering(using googleapi. there's two ways, either go to the sites yourself or request the cache from google.. massive traffic in any case for you and the search will take ages to complete).

      --
      world was created 5 seconds before this post as it is.
    2. Re:Good for searching multiple sites by enosys · · Score: 1
      A personal blacklist is a pretty good idea. Google is already working on personalized search based on a profile which contains a list of interests. They should try out more personalization like that.

      However, I don't think that's a good solution for getting rid of link farms. Google should deal with those itself because they mess things up for everybody. They should keep tweaking their algorithms to detect link farms better and encourage people to report them.

  9. Mod Parent Up by zarthrag · · Score: 1

    It's true, I haven't been able to check it ALL DAY- - That's almost as bad as slashdot being down. Some of my other friends have access still, but my account is teh suxxor at the moment.

    --
    Why can't all fpga/microcontroller manufacturers just release free optimizing compilers???
  10. How To Use 32 Words To Improve Your Searches... by smug_lisp_weenie · · Score: 4, Informative

    The problem with getting good search results is synonyms (different words that mean the same thing) and homonyms (the same word that means different things). With the 32 word limit, you can avoid both of these problems by following a few simple steps- Let's say, for instance, that you live in New York City and are looking for a moving company that specializes in fragile antiques... typically, the vagueness of such a query makes it hard to find good results, but not if you follow these steps:

    1. Break your search into 2-4 principal, independent concepts- In my example, the concepts are NYC (the location) moving company (the company type) and antiques (the specialty)

    2. For each concept, come up with as many terms as you can that are descriptions or examples of the concept that are very specific and won't trigger homonyms- For instance, you wouldn't want to use the word "New York" because it is too vague and could refer to the state (a company in Albany, NY won't help you). However, "NYC" "Long Island" "Brooklyn" "Queens" "New York City" are great, even if they seem overly specific- You just need one of them to cause a hit on a relevant page.

    3. Put parentheses around the terms for each concept (be sure to put quotes around each compound term) and OR together the items inside the parentheses.

    This is what the entire search might look like:

    ("NYC" OR "Long Island" OR "Brooklyn" OR "Queens" OR "Manhattan" OR "Bronx" OR "New York City" OR "Big Apple") ("moving company" OR "moving companies" OR "specialy movers" OR "professional movers" OR "u-haul" OR "apartment movers") ("fragile" OR "antiques" OR "china" OR "difficult to move")

    It takes a bit of time to put together (and google will run slooooow because this kind of logic is very difficult for the search engine), but a search like this will give you the best possible results on hard queries.
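The three steps above are mechanical enough to script. Here is a small sketch; the function names are made up, the term lists follow the example in the post, and the word count assumes that only the words inside the quotes count against the 32 word limit (OR and parentheses assumed free), which the post does not confirm.

```python
def or_group(terms):
    """Quote each term and OR the terms together inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

def build_query(concepts):
    """One parenthesized OR-group per concept, separated by spaces."""
    return " ".join(or_group(terms) for terms in concepts)

def count_words(concepts):
    """Count individual words across all terms (assumed counting rule)."""
    return sum(len(term.split()) for terms in concepts for term in terms)

concepts = [
    ["NYC", "Long Island", "Brooklyn", "Queens", "Manhattan", "Bronx",
     "New York City", "Big Apple"],
    ["moving company", "moving companies", "specialty movers",
     "professional movers", "u-haul", "apartment movers"],
    ["fragile", "antiques", "china", "difficult to move"],
]

print(build_query(concepts))
print(count_words(concepts))  # 29
```

Under that counting rule the example comes to 29 words, which would not have fit in the old 10 word limit.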

    1. Re:How To Use 32 Words To Improve Your Searches... by Anonymous Coward · · Score: 0
      Not slow at all:
      Web Results 1 - 10 of about 11,400 for (your really long query). (0.20 seconds)
    2. Re:How To Use 32 Words To Improve Your Searches... by IO+ERROR · · Score: 1
      You're right, it DID take a while to perform that search. 0.20 seconds.

      Unfortunately, what this didn't tell me is which moving company isn't going to rip you off.

      --
      How am I supposed to fit a pithy, relevant quote into 120 characters?
    3. Re:How To Use 32 Words To Improve Your Searches... by KivlE · · Score: 1

      The result has probably been cached after dozens of others did the search before you.

    4. Re:How To Use 32 Words To Improve Your Searches... by stienman · · Score: 1

      Did you mean: ("NYC" OR "Long Island" OR "Brooklyn" OR "Queens" OR "Manhattan" OR "Bronx" OR "New York City" OR "Big Apple") ("moving company" OR "moving companies" OR "specialty movers" OR "professional movers" OR "uhaul" OR "apartment movers") ("fragile" OR "antiques" OR "china" OR "difficult to move")

      And just as pedantic as ever...

      -Adam

    5. Re:How To Use 32 Words To Improve Your Searches... by bedessen · · Score: 1

      Before you advise people on how to use google you might want to learn how it works. Google has featured stemming for quite some time, and so you don't need to waste time with this '"moving company" OR "moving companies"' stuff. It's even mentioned in Google's basic help page.

    6. Re:How To Use 32 Words To Improve Your Searches... by IronicCheese · · Score: 1

      "I do not think that word means what you think it means..."

      A Homonym is a word that *sounds* like another word, but has a different meaning. Bear and bare are homonyms. Their and They're. Too, to and two. You get the idea.

      "the same word that means different things" is called a homograph. Commonly we say that the word has several meanings or senses.

    7. Re:How To Use 32 Words To Improve Your Searches... by smug_lisp_weenie · · Score: 1

      Last time I read their help pages they didn't have this feature yet. It's still mostly limited to handling plurals and a few other syntactic variants, however.

  11. I thought so... by Anti_Climax · · Score: 1

    I was searching last night for Warez^H^H^H^H^HOpen Source Software downloads and it wasn't giving me any grief about what seemed to be a fairly long search string.

    [/curiosity]

    --
    Even people that believe in pre-destiny look both ways before crossing the street.
  12. Regexp by John+Hasler · · Score: 3, Insightful

    Now, if they will just accept regular expressions.

    --
    Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    1. Re:Regexp by Scaba · · Score: 1

      They probably heard this statement.

    2. Re:Regexp by Alomex · · Score: 1

      Now, if they will just accept regular expressions.

      Classic newbie mistake. The biggest problem with search engines is that they return too many answers not too few. Adding regular expressions or stemming makes your answer set even bigger.

      What we need is ways to make the answer set smaller, not larger. Hence the benefit of clustering (see for example http://vivisimo.com/search?query=search+trees&v%3Asources=Web).

    3. Re:Regexp by vladd_rom · · Score: 2, Insightful

      >> The biggest problem with search engines is that they return too many answers not too few. [...] What we need is ways to make the answer set smaller, not larger.

      The problem that annoys you is not the size of the answer set, but the lack of a proper sorting function (by relevance) to satisfy you. The fact that you find your desired answer at the 10th or the 30th position is a sign that sorting doesn't work like you'd expect it to. It has nothing to do with the size of the answer set.

      I don't want a smaller answer set, I want a bigger one. As long as the sorting function works like expected, I always want to see the results sorted by relevance, and I want to have a bigger pool of those so that the first one is truly the most relevant.

    4. Re:Regexp by Alomex · · Score: 1

      I don't want a smaller answer set, I want a bigger one.

      Maybe you do, but most users don't. Less than 30% click next page.

      Reading on, I think what you mean to say is that you would like the answer to be selected from a larger set expanded perhaps to include stemming. In principle that sounds fine, in practice a decent answer is almost always contained in the 31 million+ pages that google returned.

      The problem was that google didn't understand that in the search for "tree" the user meant binary search tree, and hence the first ranked answer set was about a phylogenic tree project.

      Remember the smaller the answer set we get, the more time we have to rank each page.

    5. Re:Regexp by Anonymous Coward · · Score: 0

      You mean that Google's developers are afraid of what they don't understand, just like most people? That is tantamount to saying that Google is NOT God, after all. Blasphemy!

    6. Re:Regexp by Minna+Kirai · · Score: 1

      Maybe you do, but most users don't. Less than 30% click next page.

      And what percentage of users can write regular expressions? Probably less than 0.3%, so what's the problem?

      Anyway, your thesis that regexps will lead to longer result lists is incorrect. If I really want to search for "Windows (95|98)", today my only recourse is to enter "Microsoft Windows" and then manually skip the (majority) of irrelevant hits, or to search for both "Windows 95" and "Windows 98", then manually unify the two returned lists.

      Regexps in this case would give me search results that are both shorter and more pertinent.

    7. Re:Regexp by Alomex · · Score: 1

      I actually have first hand data on this, this isn't just speculation. They are rarely used, they generally increase the size of the result set, and they increase the workload substantially.

    8. Re:Regexp by Minna+Kirai · · Score: 1

      They are rarely used, they generally increase the size of the result set, and they increase the workload substantially.

      Wow, self-contradiction within the scope of a single sentence. If they're "rarely used", then they can't possibly increase workload very much,

    9. Re:Regexp by Alomex · · Score: 1

      If they're "rarely used", then they can't possibly increase workload very much

      Easy: if an operation is sufficiently expensive, the actual cost is noticeable, even when rare. This is known as heavy tail meaning that "a relatively small number of very high cost events skews a mean calculation".

      No contradiction there.
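A quick way to see the heavy-tail point is to put numbers on it. The figures below are entirely made up for illustration (they are not measurements of any search engine): 999 cheap queries at 1 ms each and a single expensive one at 10,000 ms.

```python
from statistics import mean, median

# Hypothetical per-query costs in milliseconds (invented numbers):
# 999 cheap keyword queries and one very expensive query.
costs = [1.0] * 999 + [10_000.0]

rare_share = 10_000.0 / sum(costs)  # fraction of total cost due to the one rare query

print(median(costs))           # 1.0
print(round(mean(costs), 3))   # 10.999
print(rare_share > 0.9)        # True
```

With these invented numbers, one query in a thousand accounts for over 90% of the total work, so "rarely used" and "increases the workload substantially" can both hold.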

    10. Re:Regexp by jasonwea · · Score: 1

      In this example I believe "Windows 95" OR "Windows 98" would do the trick.

      Of course regular expressions would be nice, but I just don't see them happening any time soon due to inherent resource requirements.
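For simple cases like the one above, the rewrite from an alternation to OR-ed phrases can be done on the client side. A minimal sketch, assuming a pattern with at most one non-nested parenthesized group of |-separated alternatives; the function name `expand_alternation` is hypothetical and this is nowhere near a general regular expression engine.

```python
import re

def expand_alternation(pattern: str) -> str:
    """Expand one 'prefix (a|b|c) suffix' group into OR-ed quoted phrases."""
    match = re.fullmatch(r"(.*?)\(([^()]*)\)(.*)", pattern)
    if match is None:
        # No group found: treat the whole pattern as a single phrase.
        return f'"{pattern.strip()}"'
    prefix, alternatives, suffix = match.groups()
    phrases = []
    for alt in alternatives.split("|"):
        # Re-join on whitespace so stray spaces do not end up in the phrase.
        phrase = " ".join(f"{prefix}{alt}{suffix}".split())
        phrases.append(f'"{phrase}"')
    return " OR ".join(phrases)

print(expand_alternation("Windows (95|98)"))  # "Windows 95" OR "Windows 98"
```

Each alternative becomes its own quoted phrase, so the expanded query spends words from the 32 word budget for every branch.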

  13. Google API? Useless. by Guspaz · · Score: 2, Insightful

    "it's also of great help to certain tools using the Google API"

    Hardly. The Google API is limited to 1000 searches per day, making it useless for any sort of web application. About the only thing I can think of that it would be useful for is a desktop program in which the user would only perform a limited number of searches.

    1. Re:Google API? Useless. by Anonymous Coward · · Score: 0

      About the only thing I can think of that it would be useful for is a desktop program in which the user would only perform a limited number of searches.

      So you'd say that it's of great help to certain tools using the Google API?

    2. Re:Google API? Useless. by Drantin · · Score: 1

      A java/flash, etc. applet would work, wouldn't it? Or do they limit the daily use with some sort of developer account embedded in the code?

      --
      Actio personalis moritur cum persona. (Dead men don't sue)
    3. Re:Google API? Useless. by Guspaz · · Score: 1

      It's a developer account. So each search each user did would contribute towards your total.

    4. Re:Google API? Useless. by Guspaz · · Score: 1

      Upon further reflection, no. Even a desktop app would not work, because every copy of that app would use the same developer account at Google, so each user doing his searches on his desktop would count towards your 1000 per day limit.

    5. Re:Google API? Useless. by no+soup+for+you · · Score: 1
      Hardly. The Google API is limited to 1000 searches per day, making it useless for any sort of web application.

      Well, it appears to be useless for your web application. In my opinion, 1,000 queries a day seems like a lot for a non-commercial product. Google may add a commercial program that allows more than 1000 queries per day (Google's answer: http://www.google.com/apis/api_faq.html#gen15).

      Lastly, I always like to mention the API is a new, free, and beta service. My gut says that if you need more than 1,000 queries per day, then it's a commercial application whose primary feature is the Google search engine, and you won't be able to utilize Google's "IP" for such an app.

      --
      If you blog it...
    6. Re:Google API? Useless. by Guspaz · · Score: 1

      That doesn't make any sense. Are you saying that a non-profit web app couldn't attract more than 300 to 500 users per day? That's nothing.

      They've been saying for years now that they may open a commercial program for it; it's not going to change anytime soon.

    7. Re:Google API? Useless. by no+soup+for+you · · Score: 1
      Are you saying that a non-profit web app couldn't attract more than 300 to 500 users per day? That's nothing.

      Sure it could, and if all the web app did was search, then the 300 to 500 users (or more) would exceed the 1,000 queries per day. On the other hand, if all the web app did was search, why would Google want you to freely take people away from their search engine?

      IMO, this API could be put to a good, supplemental use in an application (one where searching could happen, but is not the primary focus).

      --
      If you blog it...
    8. Re:Google API? Useless. by Guspaz · · Score: 1

      Even with an application, you'd be artificially limiting the possible growth. It wouldn't take very many users before you'd have to remove any such feature from the application because you'd have hit your 1000 search cap.

      And who said anything about taking people away from google freely? The problem is they don't allow you to purchase more searches.

    9. Re:Google API? Useless. by bbtom · · Score: 1

      The solution is obvious. You set up a web app and get your visitors to sign up for a Google API account and copy the details into a profile. They log in, do their searches, it deducts from their own totals and the web app just plods merrily onward.

      --
      catch (HumourFailureException e) { e.user.send("You, sir, are a humourless idiot."); }
  14. GoogleWank by Anonymous Coward · · Score: 0

    return -EIDONTCARE;

  15. Google Grid by digitalgimpus · · Score: 1

    Seems like another step in the evolution towards Google Grid / EPIC

  16. Whats next... Adding contacts lists to gmail? by SufficatedDeveloper · · Score: 1

    The 32 word thing is cool... But adding the ability to add distribution lists to my contacts in GMail would be WAY more useful

  17. Now I'll be able to search for... by binaryspiral · · Score: 1

    Now I'll be able to search for the exact error message my windows boxes toss at me. Woo hoo!

    If google had raised its limits earlier, I could have skipped that school diploma and just gone right into I.T. support.

  18. Has Anybody Called Google? by bill_mcgonigle · · Score: 1

    Hardly. The Google API is limited to 1000 searches per day, making it useless for any sort of web application.

    Perhaps for a pure non-profit web app, but if you're collecting advertising revenue you might be able to slide some of this Google's way for a higher limit.

    Has anybody actually talked to someone at Google about licensing? (i.e. not just what's on the FAQ)

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  19. What I'd like to see by alexo · · Score: 1


    In no particular order:

    * A better query language, with wildcards ("Word*") or stemming, proximity operators, parentheses, complex boolean expressions (something like what Dejanews and the pre-Yahoo AltaVista used to offer).

    * Filtering out linkfarms and search-pages.

    1. Re:What I'd like to see by fbjon · · Score: 1
      "* ... stemming, parentheses"

      Already in there, it seems.

      "* Filtering out linkfarms and search-pages."

      They're working on that, help them out.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    2. Re:What I'd like to see by alexo · · Score: 1


      >> "* ... stemming, parentheses"
      >Already in there, it seems.


      I seem to have missed it. Have a pointer?

    3. Re:What I'd like to see by fbjon · · Score: 1

      Well, Google help talks about stemming. Can't find parentheses, but this page and a previous post talk about it. I found that | doesn't seem to work for OR though.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    4. Re:What I'd like to see by alexo · · Score: 1

      Right about stemming.
      Parentheses (as in "(A and B) or (C and D)") don't work

  20. Google doesn't have wildcards... by djlewis · · Score: 1

    * or anything. The *s are ignored altogether. Try it (both ways). This is ~another~ of Google's maddening limitations. A sales rep told me wild card's not there because they have found that it doesn't increase search effectiveness by double -- whatever that means -- and that's their standard! Sheesh! Tawkabout monopolies!