Google Suggest Dissected
sammykrupa writes "Google Suggest JavaScript code dissected and rewritten for all of you web developers out there. Cool piece of web reverse-engineering!" Joel Spolsky astutely notes that this will raise the bar in terms of how people expect the "internets" to work.
I think it is such a great idea. With Google Suggest people can find things with less strife. The way it works is that you start typing and it suggests things for you to search for. These entries pop up directly under the search bar. It can help when your brain just isn't working to its full potential!
Dyslexic users of slashdot, rejoice!
This really raised the bar? I don't see how.
/sarcasm
Duh, its google-related. It must be better than everything else, thus raising the bar.
Clearly.
Let's think about the way people search for stuff.
1. Try something specific
2. Try something less specific
Number 1 brings up no results on Google Suggest; number 2 brings up 523,334 results. Impressive, but how has this helped us search for number 1?
Let's try an example: let's look for "C# structs".
1. Enter "C# structs" - no suggestions.
2. Enter "structs" - 425,000 results.
Grrreat.
"It's not your information. It's information about you" - John Ford, Vice President, Equifax
Here's what he was talking about:
Google with Auto Complete on: just start typing in the search field.
It's a beta feature.
I don't know how happy Google is about this, but there is already a Firefox extension that puts Suggest in the toolbar. Great plugin, and it's amazing how fast somebody implemented it!
Raising the bar as in people will expect computers to start intelligently assisting them when they are trying to figure something out. Not Clippy style; I mean assisting as in being useful. In addition, this won't be expected only of native applications but also of web services. The nice thing is that this should ease people into the mentality that it's okay for computers to help you. Some people are still freaked out by that.
Regards,
Steve
Unfortunately Google Suggest really has no use. If you know what you want to search for, you search for it. Suggesting search terms isn't really going to do anything apart from distract you. Hopefully this technology will be used for other things where it actually IS useful.
Even though it's an M$-spawned horror, it has brought a new revolution to JavaScript: it can now load data from the server without having to refresh the page. Flash has an XMLSocket, but I haven't seen anyone use it so far (pointers, please).
:)
Even though Google Suggest looks great, I'd vote for CGI::IRC as the biggest killer HTML/JavaScript browser app.
Client-side JavaScript is powerful; we never realized how much.
Quidquid latine dictum sit, altum videtur (Whatever is said in Latin sounds profound)
Google will stop my cell phone from working? Noooooo!
While I'm very impressed with the JavaScript behind this, and indeed with the speed of the responses from Google's network, I really don't see why it is being treated as revolutionary.
It could potentially save a user some time, but it could equally slow down their search by confusing them with a multitude of options.
Try typing "porn" or "sex" or "cock" into Google Suggest. It doesn't come up with anything. I started to get suspicious when I typed the letter x to see what would come up, and got 4 or 5 variations of "xbox" but not a single "xxx" or "xxx porn" or anything.
Interestingly enough, they DIDN'T censor the racial slurs. "gay nigger" happily suggests "gay niggers from outer space" among other things. Also, type "tub" and one of the suggestions is "tubgirl".
Google suggest is a neat idea, but a potentially destructive one.
Small sites should *not* try to do this kind of thing on a live site. The load this can put on a badly designed database (or even a well-designed one) is considerable. Think about how many database hits a single user can generate in a very short time: the user types a letter (database hit), presses backspace (database hit), types another letter (database hit), and so on. Then multiply that by a hundred or more people if your site gets a moderate amount of traffic.
Google can get away with this because they have considerable bandwidth, and large server farms. We've been seeing people trying to copy google suggest for the last couple of weeks in #javascript/freenode and in #php/freenode. The people trying to copy it generally do not understand how potentially bad this can be for a single server.
Anyhow, my advice is: don't do it unless you have the resources to scale your site. The cost of such an insignificant feature (let's face it, all it does is save the user one or two clicks) seems to outweigh the gain. If you do decide to do it, and your site gets popular, and you're on some kind of shared host, your sysadmin is going to hate you, and the other site admins will probably meet you at your house, torches in hand.
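A common way to soften the per-keystroke load described above (not something this comment proposes) is to cache results by prefix on the server, so that a backspace followed by retyping the same letter never reaches the database twice. A minimal TypeScript sketch, assuming a hypothetical `queryDb` callback that stands in for the real database query:

```typescript
// Hypothetical stand-in for the real database query: prefix in, rows out.
type QueryFn = (prefix: string) => string[];

// Small LRU cache of suggestion results keyed by normalized prefix.
// A Map iterates in insertion order, so its first key is the least
// recently used entry.
class PrefixCache {
  private entries = new Map<string, string[]>();
  private queryDb: QueryFn;
  private capacity: number;
  dbHits = 0; // how many lookups actually reached the database

  constructor(queryDb: QueryFn, capacity = 1000) {
    this.queryDb = queryDb;
    this.capacity = capacity;
  }

  lookup(rawPrefix: string): string[] {
    const prefix = rawPrefix.trim().toLowerCase();
    const cached = this.entries.get(prefix);
    if (cached !== undefined) {
      // Move the entry to the end so it counts as recently used.
      this.entries.delete(prefix);
      this.entries.set(prefix, cached);
      return cached;
    }
    this.dbHits++;
    const rows = this.queryDb(prefix);
    this.entries.set(prefix, rows);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return rows;
  }
}
```

Typing "s", "so", "sou", backspacing to "so" and retyping "sou" costs three database queries instead of five. This only removes repeated work; the first hit on every new prefix still reaches the database, so the scaling warning still applies.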
BeauHD. Worst editor since kdawson.
LiveSearch does something very similar, is open source, and has existed since April ;)
If you're looking for more XMLHttpRequest examples that tightly integrate JS and PHP (other server-side languages would be possible), see JPSpan.
I don't quite understand all the hype about Google Suggest. The technique for doing it has existed for at least 2 years on Mozilla (and even longer on IE). So something like this has been possible for a long time; maybe everyone was just scared of using JS for "serious" stuff.
People know when they're sitting behind copious bandwidth. And you could well grow accustomed to an all-text page weighing the better part of a megabyte, due to a heinous amount of information parked in hidden JavaScript data structures, giving you that near-whiplash inducing responsiveness.
In fairness, Google Suggest, like Gmail, works very nicely for me on a 56k dialup. Gmail takes a few seconds for its initial load, true, but then it's like lightning. Suggest doesn't even have the slow initial load, since webhp.htm comes in at only 3.6kB. I'm very impressed.
Now I've no doubt that the bandwagon will bring us massive slow bloat as everyone gets his dog to code up vaguely similar functionality, but Google haven't done that.
Not to dismiss the neat reverse engineering he did, but is the actual discovery that big a deal? It's just a keypress handler, and some server communication. No big deal on any graphical user interface other than a web page.
Google have good UIs because they hire smart people. Other people don't because they don't hire smart people, or hire the wrong type of smarts (graphic designer instead of sw engineer for the coding part of a website, and vice versa).
I've looked at using the XMLHTTP object a couple of times in the past, and noted that this is partly how Google Suggest works.
XMLHTTP is a COM object included with recent versions of Internet Explorer. You can call it from client-side JavaScript in a web page. The object makes a request to the URL you specify and returns the result either as a string or as an MSXML DOM object. You can then have the JavaScript write the results into an element on the page (e.g., a div tag) without doing a full page reload.
I wrote a small tech demo that implemented a virtual tree, so when you expand a branch in the tree the client only retrieves the data it needs. This was borrowed from the approach the MSDN web site uses. The advantage is that it doesn't download the same data over and over, as happens when you expand a branch in a server-side tree. You also don't have to do any work at all to remember the state of the tree, since there are no full page refreshes involved.
Google Suggest is similar in that it is a virtual list rather than a virtual tree. A virtual list lets you list lots of items and jump around in the list without downloading the entire data set when the page is loaded.
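The flow described here (open a request, wait for the response, write it into a div) can be sketched as follows. This is a hedged TypeScript sketch: `RequestLike` and `TargetLike` are hypothetical structural stand-ins for the XMLHTTP object and a DOM element, so the logic can be shown without browser typings, and the `/suggest?q=` URL in the test is made up. Passing `true` as the third argument to `open` makes the request asynchronous, so the page is not blocked while waiting.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface RequestLike {
  readyState: number;
  status: number;
  responseText: string;
  onreadystatechange: (() => void) | null;
  open(method: string, url: string, isAsync: boolean): void;
  send(body: string | null): void;
}

interface TargetLike {
  innerHTML: string;
}

// Fetch `url` asynchronously and write the response into `target`
// (e.g. a div) without reloading the page. Assumes the server returns
// trusted HTML; untrusted text should be escaped first.
function loadInto(req: RequestLike, url: string, target: TargetLike): void {
  req.open("GET", url, true); // true = asynchronous request
  req.onreadystatechange = () => {
    // readyState 4 means the response has been fully received.
    if (req.readyState === 4 && req.status === 200) {
      target.innerHTML = req.responseText;
    }
  };
  req.send(null);
}
```

In a page you would pass the real request object and something like `document.getElementById("results")` as the target.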
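A virtual tree of this kind comes down to fetching a node's children on first expansion and reusing them afterwards. A minimal sketch, assuming a hypothetical `ChildLoader` callback; it is synchronous for brevity, whereas a browser version would issue an asynchronous request and render the children in its callback:

```typescript
// Hypothetical loader: asks the server for the child ids of one node.
type ChildLoader = (nodeId: string) => string[];

class LazyTreeNode {
  readonly id: string;
  private loader: ChildLoader;
  private children: LazyTreeNode[] | null = null; // null = not fetched yet

  constructor(id: string, loader: ChildLoader) {
    this.id = id;
    this.loader = loader;
  }

  // The first expand fetches this node's children; later expands reuse
  // them, so the same data is never downloaded twice.
  expand(): LazyTreeNode[] {
    if (this.children === null) {
      this.children = this.loader(this.id).map(
        (childId) => new LazyTreeNode(childId, this.loader),
      );
    }
    return this.children;
  }
}
```

Because the tree lives in the page, its expanded/collapsed state needs no server-side bookkeeping.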
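For a virtual list, the client only needs to work out which fixed-size pages cover the visible window and fetch the ones it hasn't seen yet. A sketch under the same assumptions (a hypothetical synchronous `PageFetcher`, page size chosen by the caller):

```typescript
// Hypothetical fetcher: returns the items of page `page`, i.e. items
// page * pageSize .. page * pageSize + pageSize - 1.
type PageFetcher = (page: number) => string[];

class VirtualList {
  private pages = new Map<number, string[]>();
  private fetchPage: PageFetcher;
  private pageSize: number;

  constructor(fetchPage: PageFetcher, pageSize: number) {
    this.fetchPage = fetchPage;
    this.pageSize = pageSize;
  }

  // Return items [start, start + count), fetching only pages not seen before.
  slice(start: number, count: number): string[] {
    const firstPage = Math.floor(start / this.pageSize);
    const lastPage = Math.floor((start + count - 1) / this.pageSize);
    const window: string[] = [];
    for (let p = firstPage; p <= lastPage; p++) {
      let items = this.pages.get(p);
      if (items === undefined) {
        items = this.fetchPage(p);
        this.pages.set(p, items);
      }
      window.push(...items);
    }
    // Trim the leading part of the first page and anything past `count`.
    const offset = start - firstPage * this.pageSize;
    return window.slice(offset, offset + count);
  }
}
```

Jumping to item 120 with a page size of 50 fetches only page 2; scrolling back to item 95 then fetches page 1 and reuses page 2.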
Another use for this would be dynamic forms - forms that alter the state of controls based on selections the user made in previous controls.
The biggest surprise to me was that Google have implemented this on a site live to the public. In using XMLHTTP I found it a little bit prone to locking up the browser while waiting for responses to requests. Additionally, it's Windows-only, so it could never have been used on an external web site.
I'll be looking with interest at the Mozilla side of Google's implementation, since I didn't think an equivalent existed until now. Two different implementations of the same functionality are still going to put a damper on the technology, though; different code for different browsers is usually more trouble than it's worth.
What you say might be true for us geeks, but have you ever seen how standard users do web searches? They begin with one-word searches, and if and only if the results don't satisfy them do they refine their search.
Engage!
1. Google performs several possible searches for each key you press
2. Google already knows the estimated number of results for millions of queries
Both of these suggest a heck of a lot of computing power. This type of thing might not scale up for general use in the near future - but still...
we're talking massive computational power and one of the largest databases ever created.
I'm a bit worried the Googleplex is going to wake up one day and declare to all us 'organics':
"yo bitches - you work for me now"
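Neither of the two points says how Google actually does it, but one plausible way to answer every keystroke without running a full search is a precomputed table of popular queries with stored result estimates, looked up by prefix. A minimal sketch, assuming a hypothetical `popularity` ranking field: a lexicographically sorted array puts all queries sharing a prefix next to each other, so a binary search finds the start of that range.

```typescript
interface Entry {
  query: string; // lowercase query text
  popularity: number; // assumed ranking signal (how often it is searched)
  results: number; // precomputed result estimate shown next to the suggestion
}

class SuggestIndex {
  private sorted: Entry[];

  constructor(entries: Entry[]) {
    // Sort by query text so every query with a given prefix is contiguous.
    this.sorted = [...entries].sort((a, b) =>
      a.query < b.query ? -1 : a.query > b.query ? 1 : 0,
    );
  }

  // First position whose query is >= prefix (lower-bound binary search).
  private lowerBound(prefix: string): number {
    let lo = 0;
    let hi = this.sorted.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.sorted[mid].query < prefix) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Up to k entries starting with `prefix`, most popular first.
  suggest(prefix: string, k: number): Entry[] {
    const p = prefix.toLowerCase();
    const matches: Entry[] = [];
    for (let i = this.lowerBound(p); i < this.sorted.length; i++) {
      if (!this.sorted[i].query.startsWith(p)) break; // left the prefix range
      matches.push(this.sorted[i]);
    }
    return matches.sort((a, b) => b.popularity - a.popularity).slice(0, k);
  }
}
```

At Google's scale you would more likely precompute the top few suggestions per prefix instead of sorting matches on every request; this version just keeps the idea visible.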
I fear that might be the case. I learned to code HTML and to put a decent webpage, designed the way I wanted it, online with relative ease, at the age of 14. It took time to learn it, but it was fairly straightforward - I wanted a large header in Verdana, I put in "FONT FACE" and "H1" tags, I wanted a table with a specific background color, I put in a "BGCOLOR" etc.
Today, we have two languages (XHTML and CSS) instead of one (HTML), and while it certainly does a lot to improve interoperability and platform independence, it is two languages to learn, not one. Throw in stuff like JavaScript, and you have even more.
Of course one can choose not to use XHTML and CSS, but that's not the way we want it, right? We want people to use the standards, to write code which won't crash Firefox, and not to use proprietary solutions. Doing this is taking more and more effort. We have the skills and time to do and learn this, but not everyone has.
If we want a wide adoption of standards, and an Internet for everyone, where everyone has equal opportunities, the only way is to make the standards easy to use, so people will use them of their own free will.
Otherwise, in 10 years we'll be designing our fancy webpages, while the Joe Users who don't have the time or skills to learn the 13 languages required have no choice but to hire a professional, or use a crappy proprietary solution which won't allow them to take their ideas to their full potential, and this is a great loss for everyone.
Saying "You must do *complicated thing* because it's the specified standard!" will only work with people like us.
We have something called the disability discrimination act here in the UK, which pretty much rules out many interesting uses of Javascript if things like screen readers can't process them, and if there's no other way of providing that enhanced functionality to disabled users.
As others have commented here, I'm not convinced that the Google feature is in fact much more than eye-candy; and thus, since it doesn't really add any functionality, isn't really covered by the DDA. However, as soon as it actually becomes useful for something, then it will be covered; and I don't fancy the job of getting JAWS or something like that to interpret the JS in a meaningful way!
That differs from the well-known "nothing happens till you hit the send button" paradigm. So beware of typing in your passwords by accident. They read everything (and turn it into statistics).
Dyslexic users of slashdot, untie! :)
Wouldn't Amazon or eBay make more use of this technology? Google will give you results for almost anything, and as such I don't think this technology is as useful as it would be for a more limited (but still massive) database like Amazon or eBay.
Mozilla has had this (IE compatible) object since Mozilla 1.0 (Netscape 7), and Safari has it too. In the native implementations, you use new XMLHttpRequest() instead, and you can test for window.XMLHttpRequest to see if it is there. It is just a few lines of code extra.
;)
Furthermore, you can use asynchronous requests to avoid lockups. Having the Google server farm and bandwidth wouldn't hurt either, of course.
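The feature test described in this comment can be sketched as below. `WindowLike` is a hypothetical structural type so the sketch runs outside a browser; in a page you would pass `window`. "Microsoft.XMLHTTP" is the ProgID commonly used with IE's `ActiveXObject` (newer MSXML versions also register others, such as "Msxml2.XMLHTTP").

```typescript
// What the feature test needs from a "window": the native constructor
// (Mozilla, Safari) and/or IE's ActiveXObject. Either may be missing.
interface WindowLike {
  XMLHttpRequest?: new () => unknown;
  ActiveXObject?: new (progId: string) => unknown;
}

// Prefer the native object; fall back to the IE COM object; return null
// when neither exists so the caller can fall back to full page loads.
function createRequest(win: WindowLike): unknown {
  if (win.XMLHttpRequest) {
    return new win.XMLHttpRequest();
  }
  if (win.ActiveXObject) {
    return new win.ActiveXObject("Microsoft.XMLHTTP");
  }
  return null;
}
```

Whichever object comes back, opening it with the async flag set to `true` is what avoids the lockups mentioned here.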
Spine World
After seeing Google Suggest, I built the same thing last weekend for CPAN modules. It's at http://teknikill.net/cpan/
The next thing I need to do is include the value of the dropdown box and limit the results on that.
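Passing the dropdown value along with the typed text is mostly a matter of building the request URL correctly. A small sketch; the endpoint path and the `q`/`in` parameter names are hypothetical, not taken from the site above. `encodeURIComponent` matters because typed text may contain spaces, `&`, or the `::` in module names.

```typescript
// Build the lookup URL for the typed text plus the dropdown selection.
// Path and parameter names are hypothetical placeholders.
function suggestUrl(base: string, typed: string, category: string): string {
  return (
    base +
    "?q=" + encodeURIComponent(typed) +
    "&in=" + encodeURIComponent(category)
  );
}
```

For example, `suggestUrl("/cpan/suggest", "Net::HT", "module")` produces `/cpan/suggest?q=Net%3A%3AHT&in=module`, and the server can then restrict its prefix lookup to that category.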
This makes for an interesting way to sum up the internet into 26 words/phrases.
Check it out:
A - Amazon
B - Best Buy
C - CNN
D - Dictionary
E - eBay
F - FireFox
G - Games
H - Hotmail
I - Ikea
J - Jokes
K - Kazaa
L - Lyrics
M - Mapquest
N - News
O - Online Dictionary
P - Paris Hilton
Q - Quotes
R - Recipes
S - Spybot
T - Tara Reid
U - UPS
V - Verizon
W - Weather
X - XBox
Y - Yahoo
Z - Zip Codes
If I had to sum up the internet in 26 words/phrases, I don't think I could have done it better than Google. Of course, that is keeping in mind that Google Suggest has some pretty serious filters in place, so instead of P being "Porn" it is "Paris Hilton." Not too far off, if you think about it.
I view Google Suggest as a great interface enhancement for the tired browser paradigm.
Everybody is writing apps to work in the ubiquitous browser. Unfortunately, developers have to jump through many hoops to get browsers to sport friendly interface elements that are already available in the X / Windows / Mac interfaces. The browser was never meant to be an application front-end, but it's being forced upon us developers, costing more time to get a workable product banged out. A compiler and class libs can do so much more.
Thanks, Google, for adding another element to make people want the pathetic browser as an interface.
Mod me a troll, but browsers suck as interfaces. And I haven't even touched on printing...
XmlHttpRequest to fetch data on demand has been around for a long time. For example, MSDN has been using this technique for years now. I have been using it for 9+ months on an application that recently went into production.
The reason you have not seen it in use much is
Google's best engineering continues to be in the back end; that is what makes this thing possible, and why no one else would likely be able to replicate it. The ability to search billions of records that fast is simply staggering.
Wolf5K is a JavaScript clone of Wolf3D in 5 KB. I deobfuscated it and posted a series of tutorials on how it works here. There is also a C++ translation and enhancement series of tutorials here. Full, ready-to-compile source is included for all tutorials.
The task of deobfuscating code is quite tedious but not too daunting. The main thing is getting the whitespace back in so you can see where all the functions begin and end. You then have to understand the language well enough to read the code and figure out what's going on without hints from comments or descriptive variable names.
For Wolf5K I just started by working on the simple functions first and then by process of elimination worked my way through the code and finished with the raycasting function.
Translating it all to C++ was then quite easy, because by then you have a very good grasp of how the code is supposed to work.
Work Safe Porn
*cringe*
This works just fine until something doesn't work *perfectly*, and then all hell breaks loose. I will give you a real-world example. I'm currently working at a law firm (I'm starting law school in either fall 05 or 06) that uses a common "industry standard" database tool. It is a flat-file DB that uses B+ trees. [B trees are like binary trees, but each node has many children instead of two, and B+ trees store records only in the leaves.] The idea behind B+ trees is that, because of the high degree of branching, you should never need more than 2-3 "slow memory" accesses to find your page (i.e. the entire first node lives in memory; assuming a branching factor of 256, 16,777,216 records can be reached within 3 accesses). Building these trees is also time-intensive, since there are a lot of writes to parent nodes, and it is very likely that pages get evicted from physical memory along the way.
The problem is that no one there has a CS background, so no one understands the memory hierarchy, virtual memory, caching, write-on-update, LRU/MRU page replacement, et cetera. So when Concordance is *slow as all hell*, no one knows why. [The answer: when indexing a large database, the programmers seem to have been sloppy, and the main node spills over onto a second memory page. Once other nodes begin to spill over too, you get a case of "thrashing": every time your computer pulls a node back into the "working set" of what lives in physical memory, it kicks out something else you still need. Google for "thrashing" and the "row-major" and "column-major" order problem.]
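The "3 accesses for 16,777,216 records" figure is just 256^3 = 16,777,216. The sketch below computes the smallest number of levels h with fanout^h >= records, using integer arithmetic to avoid rounding errors from `Math.log`; the function name is made up for illustration.

```typescript
// Smallest number of tree levels h such that fanout^h >= records,
// i.e. how many node accesses reach any of `records` leaves.
function levelsNeeded(records: number, fanout: number): number {
  if (fanout < 2) throw new Error("fanout must be at least 2");
  let levels = 1;
  let capacity = fanout; // leaves reachable with `levels` levels
  while (capacity < records) {
    capacity *= fanout;
    levels++;
  }
  return levels;
}
```

`levelsNeeded(16777216, 256)` gives 3, while a binary tree (fanout 2) needs 24 levels for the same records, which is why a high branching factor keeps slow accesses to a minimum, as long as the nodes actually stay in memory.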
*My* firm took a huge risk and hired someone with a CS degree (masters) rather than a paralegal, and they did some experimenting. I've gotten under-the-hood of many of their apps, and the things I've discovered have been shocking. (And these are industry "standard" solutions.) They've reaped the benefits of having someone that actually understands the underlying technology. Here is the archetypical example:
BUT: I am giving you a secret peek at the innards of foo!
Very long story short: at some level, there must be someone technical, so that when things "go wrong" (like why many people accessing a shared hard drive over Ethernet for disk-intensive operations is a bad idea, due to the nature of a bus architecture...) all hell doesn't break loose.
When in doubt, parenthesize. At the very least it will let some poor schmuck bounce on the % key in vi. (Larry Wall)
Therefore, at present, this works only for English; with other languages it can happen that it suggests porn-prone search terms for the refinement of terms that have, as such, nothing to do with pornography. Some examples:
- the first suggestion for 'fille' (French for 'girl') is 'nue' (naked)
- the 5th suggestion for 'dzieci' (Polish for 'children') is 'nago' (naked)
- suggestions for 'mund' (German for 'mouth') contain 'mund auf sperma rein' (open mouth, sperm in), 'mund ficken' (fuck in the mouth), 'mund arsch' (mouth ass)
- devochki (with Cyrillic letters: Russian for "little girls") gives the suggestions "devochki porno"
- the first suggestion for 'smot...' with Cyrillic letters (smotret': Russian for 'watch'/'look at') is "smotret' porno"
I think this is probably quite problematic: someone enters a search term that has nothing to do with pornography, and Google suggests something pornographic as a 'refinement'. Of course, this is not due to Google's intent, but to the distribution of the things people search for and of content on the Internet. I suppose this is one of the problems Google will want to address before offering Suggest as an option on the main page.
Just about all of the highly modded comments seem to be complaining about how Google Suggest is not very useful.
But that is not what the story is about. The story is really about all the little things that are going on that make a very usable and responsive web interface.
Others have noted the XMLHttpRequest object at work. But there are a number of other cool things in there:
Replacing XMLHttpRequest with a cookie/frame reloading technique.
Using JavaScript's setTimeout() to initiate server communication, so that fast typers don't trigger a lot of excess network requests.
Interesting JavaScript text manipulation (like highlighting).
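The timeout trick is usually called debouncing. The sketch below is one common way to do it and not necessarily what Google's code does (it may instead poll the input field on a timer). `Schedule` is a hypothetical injectable timer so the logic can be tested without real time passing; in a browser it would wrap `setTimeout`/`clearTimeout`.

```typescript
// A cancellable one-shot timer. In a browser this could be:
//   (fn, ms) => { const id = setTimeout(fn, ms); return () => clearTimeout(id); }
type Schedule = (fn: () => void, delayMs: number) => () => void;

// Delay `send` until the user has paused typing for `delayMs`; each new
// keystroke cancels the pending call, so only the latest text is sent.
function debounce(
  send: (text: string) => void,
  delayMs: number,
  schedule: Schedule,
): (text: string) => void {
  let cancelPending: (() => void) | null = null;
  return (text: string) => {
    if (cancelPending) cancelPending();
    cancelPending = schedule(() => {
      cancelPending = null;
      send(text);
    }, delayMs);
  };
}
```

With this in place, typing "s", "so", "sou" in quick succession sends a single request for "sou".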
Basically, just a lot of little things that show how to make some interesting techniques useful for the widest audience possible. Google Suggest may on the face of it not look like the most useful thing ever, but you have to respect the sheer number of browsers it is designed to work on and the responsiveness of the interface.
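The highlighting can be as simple as splitting each suggestion into the part the user typed and the completion. A hedged sketch: it bolds the completion (the opposite choice is just as reasonable), compares case-insensitively, and escapes both parts before building HTML so a suggestion containing `<` cannot inject markup. The function names are illustrative.

```typescript
// Escape the characters that matter when inserting text into HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Leave the typed prefix plain and wrap the completed remainder in <b>.
// Suggestions that don't start with the typed text are returned unhighlighted.
function highlightCompletion(typed: string, suggestion: string): string {
  if (suggestion.toLowerCase().startsWith(typed.toLowerCase())) {
    const head = suggestion.slice(0, typed.length);
    const tail = suggestion.slice(typed.length);
    return escapeHtml(head) + "<b>" + escapeHtml(tail) + "</b>";
  }
  return escapeHtml(suggestion);
}
```

For "sou" and "southwest airlines" this yields `sou<b>thwest airlines</b>`.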
"There is more worth loving than we have strength to love." - Brian Jay Stanley
unfortunately, "google suggest" is not as good as it could be.
why? valuable implicit information gained through the human-computer interaction is not fully exploited by "google suggest". for illustration, see the following example:
let's say i'm searching for "southwest". and for the sake of logic, let's assume that i either don't know the correct spelling or that i'm a lazy dog ;). so i start by typing "sou". after a short delay, google suggests "southwest airlines". ok, this seems to be what most people are searching for when entering "sou". luckily, "southwest" is the second most common suggestion listed in the drop-down list, so i just hit 'cursor-down' and 'enter' to autocomplete and search for "southwest". everything ok so far.
now comes the problem:
the top result displayed by google is.. southwest airlines! this of course doesn't make sense, because if i wanted to search for southwest airlines, i would have happily accepted google's first suggestion already. actually, "google suggest" knows about my preference for "southwest" over "southwest airlines" and yet doesn't use this extra information gained thanks to human-computer interaction! so my brain feels slightly offended ;)
to put it simply: if an average user selects a search term from a list of suggested search terms, he probably wants to search for that exact search term, not for any of the other suggested search terms also displayed. if not, an average user would probably have selected another search term from the displayed list of suggestions. so to me, it looks as if the bright google guys forgot that the act of selecting from a list also implicitly carries information about what does not get selected.
suggestion for a better "google suggest":
as a probably not perfect but working solution, "google suggest" could simply exploit this implicit user interaction information by excluding all explicitly deselected (and possibly all not explicitly selected) suggested search terms from the search query. in the example:
excluding all explicitly deselected search terms yields:
southwest -"southwest airlines" (voilà! southwest airlines is not the top result anymore ;)
excluding all explicitly deselected and all not explicitly selected search terms:
southwest -"southwest airlines" -"soulseek" -"south park" (etc.. you get the point)
that's pretty easy to implement, with an obvious benefit for average users.
disclaimer: i'm talking about the expectations of average users here. iow: about users who are probably just interested in the few topmost results, i.e. the intersection and not the set union of results (but that's probably the point of web searching anyway ;). sure, there are people who are interested in the set union and not the intersection.. all they need is hitting backspace accordingly.
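The proposed exclusion query can be sketched directly: given the suggestion list and the index the user picked, everything above the pick was explicitly passed over, and everything below it was shown but not selected. A minimal TypeScript sketch; the function name and the flag are hypothetical, and it does not handle suggestions that themselves contain double quotes.

```typescript
// Build the refined query: the chosen suggestion followed by each
// excluded suggestion as a negated, quoted phrase.
function refineQuery(
  suggestions: string[],
  chosenIndex: number,
  excludeUnselected: boolean,
): string {
  if (chosenIndex < 0 || chosenIndex >= suggestions.length) {
    throw new RangeError("chosenIndex is outside the suggestion list");
  }
  const chosen = suggestions[chosenIndex];
  // Entries above the chosen one were skipped with cursor-down.
  const deselected = suggestions.slice(0, chosenIndex);
  // Entries below it were displayed but never selected.
  const unselected = excludeUnselected ? suggestions.slice(chosenIndex + 1) : [];
  const exclusions = [...deselected, ...unselected].map((term) => `-"${term}"`);
  return [chosen, ...exclusions].join(" ");
}
```

Assuming "soulseek" and "south park" are listed below "southwest", picking index 1 from `["southwest airlines", "southwest", "soulseek", "south park"]` yields `southwest -"southwest airlines"` with the flag off and `southwest -"southwest airlines" -"soulseek" -"south park"` with it on.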
So beware of typing in your passwords by accident. They read everything (and turn it into statistics).
Somehow, I doubt someone at Google sees the search term "vZ820aa3q" and thinks "oh, that's mrmorgana's Slashdot password"...