Google Suggest Dissected, Part II
Bert690 writes "To complement the recent dissection of Google Suggest's innovative front end, I investigated the back end of the system in an effort to determine just how it generates suggestions. Along with some preliminary findings, you'll find a pointer to a program for enumerating all possible suggestions from a given starting point. I found the number of possible suggestions to be surprisingly small considering the immense scope of the web."
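For anyone wondering what "enumerating all possible suggestions from a given starting point" could look like, here is a minimal sketch. It is not the submitter's program. It assumes a hypothetical `suggest(prefix)` callable that returns at most `limit` completions, and it uses a made-up in-memory backend (`make_fake_backend`) instead of the real service. The idea: when a prefix returns a full page, more results may be hidden, so extend the prefix by one character and ask again.

```python
import string


def enumerate_suggestions(suggest, start, limit=10,
                          alphabet=string.ascii_lowercase + " "):
    """Collect every completion reachable from `start`.

    `suggest(prefix)` is a hypothetical callable returning at most `limit`
    completions. A full page may hide further results, so that prefix is
    extended by each character of `alphabet` and queried again.
    """
    seen = set()
    stack = [start]
    while stack:
        prefix = stack.pop()
        results = suggest(prefix)
        seen.update(results)
        if len(results) >= limit:
            # Page was full: drill down one character at a time.
            stack.extend(prefix + ch for ch in alphabet)
    return sorted(seen)


def make_fake_backend(phrases, limit=10):
    """Stand-in for the real service: prefix match over a fixed phrase list."""
    ordered = sorted(phrases)

    def suggest(prefix):
        return [p for p in ordered if p.startswith(prefix)][:limit]

    return suggest
```

With a real backend, each call would be a network request, so the number of requests grows with the alphabet size every time a page comes back full.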
It's not the amount of data that a program references to create a result, it's the precision of its result that matters... if it can do it with relatively little data, then it was designed/implemented by someone who knows what they're doing...
---
Programming is like sex... Make one mistake and support it the rest of your life.
Why does that annoy you? For those of us who can't spell, it's quite helpful.
Basically this is a hacky method of accessing fields. The code to do it is burdensome to say the least.
Is there any work on a toolkit or API that allows relatively easy access to this technique?
Press "p" and the first thing "google suggests" is "Paris Hilton", hmm. Although on a cooler note, when you press "f" the first suggestion is firefox!
If you're interested in Search Engine Optimization, the tool can be used like the Overture Keyword Selector Tool. Similar results are obtained with both, which is interesting all in itself. A guy built an interface similar to Overture to use with Google Suggest.
:)
Other than that I can't think of a real use... I usually know what I want to search for on Google. It could help optimize queries I guess (see the "number" of results before hitting submit, but not the quality...)
Happy Holidays to all Slashdotters, by the way
Eureka Science News - automatically updated
For those of us who can't spell, it's quite helpful.
/. is going to fade into obscurity within the next few years.
Get Firefox if you haven't done so, then download the SpellBound extension. Then you can spell check to your heart's content.
Apparently, either the mods are on crack, or they believe that Troll=someone that goes against what they believe and to keep everything "Status Quo" by modding the ACs or the non "1337" geeks down constantly so they will be shut out and they continually get the +4 and +5 Insightful/Interesting/Informative Mods so they can really troll without really feeling the consequences. I think
Why does that annoy you? For those of us who can't spell, it's quite helpful.
You could have put some effort into that statement. How about
Wy doz that annoi yu? For thoze ov us hu kant spel, its kwite helpfle.
http://eric.blognews.com/blog/archives/2004/12/10/202467.html
Google needs to remember the last x queries that we submitted and the time we submitted them to better guess what we're looking for. If I hit 'p' I get Paris Hilton even though previous searches were for perl, parrot and pascal.
When will they work out that there are different classes of users out there that look for different things at different times?
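A rough sketch of the "remember the last x queries and when they were submitted" idea, purely as an illustration. Nothing here reflects how Google works. The `RecentQueries` class, the one-hour `half_life`, and the exponential decay weighting are all assumptions made up for this example.

```python
import time
from collections import deque


class RecentQueries:
    """Keep the last `max_queries` searches and use them to re-rank suggestions."""

    def __init__(self, max_queries=20):
        # Each entry is (query, timestamp); old entries fall off automatically.
        self.history = deque(maxlen=max_queries)

    def record(self, query, when=None):
        stamp = time.time() if when is None else when
        self.history.append((query.lower(), stamp))

    def rerank(self, prefix, suggestions, now=None, half_life=3600.0):
        """Put the user's own recent matching queries ahead of the global list."""
        now = time.time() if now is None else now
        prefix = prefix.lower()
        boost = {}
        for query, when in self.history:
            if query.startswith(prefix):
                age = max(0.0, now - when)
                # Assumed weighting: a query loses half its weight per half_life.
                boost[query] = boost.get(query, 0.0) + 0.5 ** (age / half_life)
        personal = sorted(boost, key=boost.get, reverse=True)
        rest = [s for s in suggestions if s not in boost]
        return personal + rest
```

So after searching for perl, parrot and pascal, typing "p" would list those three before "paris hilton", with the most recent one first.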
As big as the web is, it's just the same boring drivel over and over... it shouldn't be too hard to make Google Suggest! :)
Berto
I like trying to use Google Suggest in unexpected ways: Try typing in 1ZE and see all the UPS tracking numbers that come up. Pick one and track it. Or try typing an area code with a large population (201, 212, 213, 818, etc) and maybe add a digit or two and see what telephone numbers people have been searching for lately.
Quite amusingly, a number of words seem to be censored... If you type, say, sex, then you get no more suggestions... Even if you type it within a word...
After some period Google will not only suggest but will also take decisions for you!
wait....
Doesn't the "I'm Feeling Lucky" option already make a decision for you?
... it doesn't include dirty words. I know, I may be a little immature, but it's almost always the first thing I try on anything like this. There's not even a way of turning 'safe suggest' on or off or anything. Even such innocuous (and popular!) words like 'nude' aren't suggested. What if you're searching for nude models for your art class, or the great nudes? It's just interesting... Google is becoming very corporate in terms of filtering out content these days.
Random rants about technology: http://technorants.blogspot.com
a: amazon
b: best buy
c: cnn
WHO THE FUCK SEARCHES FOR THOSE THINGS?? It amazes me how stupid people are - rather than type in amazon.com, bestbuy.com, or cnn.com, they actually search for them on Google.
It will show "penthouse", but not "playboy".
(\_/)
(O.o) This is Bunny. Add Bunny to your signature
(> <) to help him achieve world domination.
Contrary to what the author suggests, I suspect that the suggested searches are derived from query logs, not from the documents themselves.
As others have noted, the top suggestion for p is paris hilton with 6.7M results, but the next 5 suggestions have far more results -- more than 20M, in fact.
I doubt there is much of an attempt at precision. For example, the first suggestion for "new york" is "new york times"; the second is "new york."
Going by that, entering 'B' would bring up Britney Spears, while in reality, it brings up Best Buy...
Apple has never claimed not to be evil, they're just very stylish about it.
It's easy to find whatever you want with Suggest. Overly broad terms don't make it into the list. Why should they? Each term shows how many results would be retrieved. Searching for "sex" or "porn" will return more digits than can fit.
Laws are for people with no friends.
It sounds like an extended version of the "I'm Feeling Lucky" feature.
It would be sweet to have this as an extension to the search bar in firefox. Other than that, I don't think I'd ever use it - too likely to forget it exists in the future...
Is that the internet is no longer *just* the geeks/nerds/calculator-watch crowd. There are increasing numbers of grandmothers and soccer-moms gaining access every day. What was once a haven for the slide-rule crowd will soon become just like everything else, an asylum commercialized for the lowest common denominator - the general public. Once that milestone is reached, sites like /. will become fewer and fewer as we see more recipedot and howtogetmudoutofchildrensclothesdot popping up. It's no longer a possibility, it's an ever approaching event.
In other news, Merry Christmas!
in the early days of the internet, people were posting all sorts of websites on all sorts of topics. as the web became more commercialized, most geeks were (rightfully) worried that major commercial hubs would be created that would attract the majority of attention and dilute the importance of the more peripheral areas of the web. this trend is already underway, and tools such as google suggest will hasten the decline. users will be directed to the areas that most people are already going, thereby increasing the traffic to portals and decreasing traffic to niche or enthusiast sites. in my opinion, google suggest is ANTI-internet.
"True for very popular searches, but it you're searching for something more obscure, size most certainly does matter."
Google: hyperinflation deviant sexual body perception
"Results 1 - 10 of about 19 for hyperinflation deviant sexual body perception. (0.48 seconds) "
To answer your questions about how the suggestions are generated - from looking through your enumeration lists, they are obviously compiled from words/phrases that people have actually searched for.
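If the suggestions really are compiled from what people have searched for, the back end could be as simple as a frequency table keyed by prefix. The following is only a sketch under that assumption, with a toy `query_log` list standing in for real logs. The function name and the `k` and `max_prefix_len` parameters are invented for the example.

```python
from collections import Counter, defaultdict


def build_suggest_index(query_log, k=10, max_prefix_len=20):
    """Map every prefix to its `k` most frequently searched queries."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    index = defaultdict(list)
    # Visit queries from most to least frequent (ties broken alphabetically),
    # so each prefix list fills up in ranked order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for query, _count in ranked:
        for end in range(1, min(len(query), max_prefix_len) + 1):
            bucket = index[query[:end]]
            if len(bucket) < k:
                bucket.append(query)
    return index
```

Under this model the ranking depends only on how often a phrase was searched, not on how many pages match it, which would be consistent with "paris hilton" topping "p" despite having fewer results than the entries below it.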
If I remember correctly, one of Jakob Nielsen's usability books described how a surprisingly large majority of users thought (this was back in the day before Google) that the Yahoo search field "was the internet". They typed everything into it, and paid no attention to the address bar.
http://www.google.com/webhp?complete=1&hl=e
that long address won't help anyone.
Even if it is in beta.
Why does yahoo do this
When will people understand the difference between it's and its?
The suggested words by themselves may not be all that useful, but when combined with the number of results shown for each keyword, I think they can be useful.
Google Suggest may not be immediately useful to everyone the way Google.com is, but it will be when a particular situation arises for the user.
I think it's a solution for a specific need.
That would actually be useful. The current implementation of Google Suggest is primitive and its usefulness is questionable. I don't understand what all the hype is about. A first year CS major could write Google Suggest. This is primitive technology that has been around for a long time. Because Google paired it with a search engine it suddenly becomes revolutionary? I don't think so.
First suggestion for each letter/number:
amazon
best buy
cnn
dictionary
ebay
firefox
games
hotmail
ikea
jokes
kazaa
lyrics
mapquest
news
online dictionary
paris hilton
quotes
recipes
spybot
tara reid
ups
verizon
weather
xbox
yahoo
zip codes
1
2004 election
3m
411
50 cent
60 minutes
7th heaven
89.com
911
02
What I'd like to know is... how do these search engines perform searches against such massive databases, with all the extra bells and whistles such as "google suggest", with SO MANY people using the service concurrently, yet still remain SO FAST?
Have they improved "quick sort" or something? Or do they have a beowulf cluster of gingerbread pcs?
How is this different from what Firefox does already with form completion? Even if it is different, why would you want this implemented by google rather than by your browser?
;-)
Fox newsv aSoft
;-)
Google can charge money for including keywords that will lead to your website as a result of a search.
Just like the current "paris hilton" shown on 'P'.
Here is a short list of paid keywords that already bought those letters:
Amazon
Best Buy
Bbc
Bank of America
Cnn
Dell
Ebay
Expedia
Espn
Firefox
Google
Gmail
Hotmail
Halo 2
Halflife
Ikea
Ipod
Java
JetBlue
Kazaa
La
Mapquest
MSN
Microsoft
Nokia
New York Times
Orbitz
Paypal
Panasonic
Qwest
Do you feel money is involved here??
They will try to sell you something right after you type a single character on your computer.
here we go then: /ducks
in Soviet Russia, you suggest to Google what it should search!
The power of accurate observation is commonly called cynicism by those who have not got it. -- G.B. Shaw
About six months ago I wrote a little spellchecker plugin for an internal app and used a very similar principle. The completion lists worked like OpenOffice's, though: rather than presenting a dropdown menu, the remainder of the word would be highlighted and you could up-arrow or down-arrow to cycle through the list. It's fun to see how quickly you can narrow down a given result set.
see http://johnbokma.com/perl/google-suggest.html
Perl Programmer for hire
Even if they aren't in the list, it would definitely help in 'searching' for them. Adding to their 5424000000000000..5424999999999999 spanning problem which will be pumped up combined with this.
Anyway, didn't take me long to find a site listing some people's credit card number/exp date/security #/address using suggest.
Hi guys, I just finished implementing Google Suggest for a dictionary database: http://www.objectgraph.com/dictionary The code is clean and you can see it by using "View Source". The dictionary database is on an SQL server (total of 18000+ words) with an index on the word column.
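For readers who want to try the indexed word column approach locally, here is a small sketch. It is not the code from the site above. It swaps in Python's built-in `sqlite3` with an in-memory database, and the table, index, and function names are made up. The prefix lookup is written as a range query so the index on the word column can be used.

```python
import sqlite3


def open_dictionary(words):
    """Build an in-memory word table with an index on the word column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dictionary (word TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_dictionary_word ON dictionary (word)")
    conn.executemany("INSERT INTO dictionary (word) VALUES (?)",
                     [(w,) for w in words])
    return conn


def suggest(conn, prefix, limit=10):
    """Return up to `limit` words starting with `prefix`, in sorted order."""
    if not prefix:
        return []
    # Smallest string that sorts after every string beginning with `prefix`.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    rows = conn.execute(
        "SELECT word FROM dictionary "
        "WHERE word >= ? AND word < ? ORDER BY word LIMIT ?",
        (prefix, upper, limit),
    )
    return [row[0] for row in rows]
```

With 18000+ words the range scan touches only the matching slice of the index, so each keystroke stays cheap.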
The poster I replied to was talking about the former, not the latter. Read the thread again.
I would bet they are using some sort of NN classifier or Nearest Neighbor classifier. If not, then I would bet it is some sort of factoring (Unsupervised Learning) method. They usually represent each document with a bag-of-words model, which means that the frequency of each word is used to represent the document. Then the Nearest Neighbor or factoring algorithm is used to build the list. Then I would bet that the lists are cached or the clusters are cached in memory. I saw a lecture on this once from a guy who works at Google.
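To make the bag-of-words plus nearest neighbor guess concrete, here is a toy sketch. Whether Google does anything like this is pure speculation in the comment above. The cosine similarity measure and all function names are assumptions chosen for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text by the frequency of each word in it."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-frequency vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, documents, k=3):
    """Return up to `k` documents most similar to `query`, best first."""
    q = bag_of_words(query)
    scored = [(cosine_similarity(q, bag_of_words(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query.
    return [doc for score, doc in scored[:k] if score > 0]
```

This brute-force version compares the query against every document, which is why the caching mentioned above would matter at any real scale.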