Digging Holes in Google
Kurt LoVerde writes "Though google has become synonymous with searching, the folks over at MSN have written up an interesting article on our favorite search engine's pitfalls. Included among these are a tendency to skew results toward shopping, a lack of diversity for searches containing synonyms and its impact on research."
Real links on the www.msn.com home page today:
And they show their socialist bent away from shopping with their hardhitting piece:
I'm much funnier now that I'm a subscriber.
Wow, sounds like those Google guys AND the MSN guys are biased. For shame! We at Slashdot poke fun at you!
Not that MSN doesn't have a vested interest in some other search engine or anything.
Isn't Microsoft on the verge of releasing its Google-slaying search engine (or perhaps just search marketing) on the world?
How nice of an impartial journalistic source to pick holes in Google, holes which almost certainly sit in the specific areas Microsoft has chosen to optimise.
We tend to forget that:
1. Just because it's not found on the Internet, does not mean that it doesn't exist.
2. Just because it's found on the Internet, does not necessarily make it true.
.sig
In case it gets slashdotted, here's the google cache of the page (heh)
http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=cache:http%3A%2F%2Fslate.msn.com%2Fid%2F2085668%2F
No Norm, those are your safety glasses; I'll wear my own thanks...
I won't say it. It's too obvious...
I'm the guy with the unpopular opinion
Well, since Microsoft has already announced plans to try to topple Google as a search engine, I'm pretty much going to take anything that they say with a grain of salt, if I don't just ignore them completely.
Google does an excellent job with their primary searches, their news siphoning, and their froogle.google.com service. I've found more useful results through Google than I have through all of the other search engines that I've used over the years combined. Sometime I'll have to try out their newsgroup tool.
Do not look into laser with remaining eye.
Gee... Microsoft complaining about the competition beating them? When does THAT ever happen?
This is my sig. There are many like it but this one is mine.
"look for apple and you have to troll through three pages of ... before you find apple computer ..."
Um, how about using more than one keyword?
"apple computer" brings www.apple.com as the FIRST link.
I imagine if I look in msn.com for "battery" I won't find detailed schematics of NiMH batteries either. Holy shit, are they paying these people to write this shit?
Heck, even in grade school when we had to use CD encyclopedias we were taught to use more than one keyword.
Tom
Someday, I'll have a real sig.
Those are some pretty weak allegations.
The gist of the article is that if you give Google a one-word (common) search term, the results may not be as precise as you want. For instance, if you want the nutritional content of an apple, and you put "apple" into Google, you're going to get a bunch of hits for things that don't have what you're looking for.
I'm sure a lot of you are saying "duh" right now.
I read the internet for the articles.
Maybe try searching with "flower gardening" next time.
word.
Can you spell F-U-D? I knew that you could.
I think we will see this go the way of MS Bob, "Trusted Computing", MSN as a popular ISP, etc.
You ever notice that with the exception of hardware, most people only use Microsoft products as they are forced to?
The only reason people in the outside world use it is because even they feel like they HAVE to use Word, Windows, Excel, etc. When it comes to options in other parts of life, most people recognize that MS sucks donkey balls.
(Interesting side note: I used to work for Blue Cross of California. This was back in the NT 4.0 days. BCC has a support contract with MS.
A co-worker of mine took an NT server, set it up professionally to interface with the network as a real server, then installed MS Bob on it.
He promptly called Microsoft for tech support, and used the "Bob" terminology: "Yes, I'm having trouble getting the IP stack up. I keep telling the dog inside the living room to connect to the outside world, but he just keeps barking at me and asking if I want to make a document." My friend played dumb, only referring to MS Bob jargon as if it were the operating system.
Needless to say, the call lasted many hours while the MS tech tried to trouble-shoot the problem using as much of the MS Bob terminology as possible.)
JoeLinux
Mod me down if you must but this article is crap. Sure Google is going to give you several pages of links to Apple Computer when you search for 'apple' - that's the way the system is supposed to work. However, if you do a multi-word search for something specific - like 'kixtart audit software', you're going to start seeing success. A search for 'apple trees' finds the top four links pointing to great sites that each link to more sites on apples. Same thing for 'tulips' - 'planting tulips' brings up several relevant links within the top 10. Moral of the story is the same as it is everywhere - GIGO.
Remember when the only search engines were archie and Altavista (the old altavista.digital.com, not the "new" one.) Well I certainly do. Google was a quantum-leap improvement over any of them; spidering had been tried with other search engines, but Google made it work. While it certainly has gotten LOTS more commercialized since I first used it, it's still better than anything else out there. I just hope they can stay off of the slippery slope to being clogged with ads.
With the size and complexity of the Internet as we know it, single word search terms like "apple" are completely stupid. I think the reporter was just screwing around with Google and noticed that the publishing deadline was approaching. Sure, there are some unique words that make sense to use as a single term search, but anyone who has used a search engine for more than 3 seconds knows to qualify the search somehow.
As far as shopping results, that's the character of the web today. Lots of commercial interests. It takes money to maintain a web presence, no matter what Geocities tells you. Google is just presenting you with what it's got, really.
Finally, a lot more papers are published than books. It's not surprising that you don't get a lot more hits on book-printed resources.
This is more interesting as a statement on what the Internet has become, rather than what Google might be showing you while filtering other things out.
Unfortunately, computers can't read the minds of dumb people yet... so the rest of the world will need to settle for "flowers -shop" so that most pages they find are not shops. Searching for something as generic as 'flowers' is the same as searching for 'car'. We typically don't walk into a library anymore, where we'd know there is no place to buy flowers. We know that we're in a world where the Internet is a portal to a) buying and b) information. (Might I add that I think most people buy flowers more often than they grow them?)
--<Mike>--
Dewey, what part of this looks like authorities should be involved?
Too many commercial sites - True, and I wouldn't be surprised if Google offered an option soon to limit sales sites. It's feasible, and they often rise to this sort of challenge (like they did with the blog horde).
Synonym problems - This is certainly not something MSN will help with. This is also easy to get around by a little massaging of the search engine - you just think of a word that would come up in the stuff you want to see and not the other. For the retarded, perhaps Google could dynamically suggest categories after searching (kind of how they suggest misspellings).
No books for scholarly research - this is such a small use (though I am admittedly among them). Furthermore, it's not that great a problem if journals come up preferentially - if your research cites mostly books, that's a problem anyway, as it probably means your research is not current. But again, this is a problem for such a slight proportion of the population.
Bottom line is that google will fix any big problems - just think of how many things might have been on that list 3 years ago that they've already fixed. Put it this way - I have more faith in google to deliver a great search engine than I do MS any day.
-Looking for a job as a materials chemist or multivariat
i think all of these "google-holes" are actually just the result of poor searching techniques on the part of the author.
also, when i need to find something on--damn i hate to say it--MSDN for work, i usually use google with the site:msdn.microsoft.com as the MS search engine is crap.
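For anyone who hasn't used the site: trick described above, here is a minimal sketch of how such a query could be put together. The helper names `build_query` and `search_url` and the search term "CreateFile" are made up for illustration; the only things assumed about Google are the `site:` operator, the leading `-` for excluded words, and the `q` URL parameter.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, exclude=()):
    """Join keywords, excluded words (prefixed with '-'), and an
    optional site: restriction into one search string."""
    parts = list(terms)
    parts += [f"-{word}" for word in exclude]
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Return a search URL with the query percent-encoded in 'q'."""
    return f"{base}?{urlencode({'q': query})}"


# Hypothetical example: restrict a keyword search to MSDN.
query = build_query(["CreateFile", "example"], site="msdn.microsoft.com")
print(query)
print(search_url(query))
```

The same helper covers the "add a keyword, subtract a keyword" refinements, e.g. `build_query(["tulips", "gardening"], exclude=["florist"])`.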
!(^((ri)|(mp))aa$)
Yet their claim is weakened by the fact that if I enter "flower research," suddenly I see very, very little related to shopping, but instead to the research I'm seeking.
It all depends on the search scheme. If the claim that Google is so heavily weighted towards marketing and shopping were true, then "flower research" would have led me to buy flowers.
I would also note that "flowers" on MSN.com returns:
The next comment I write will be ready soon, but subscribers can beat the rush and see it early!
Search for "apple" on Google, and you have to troll through a couple pages of results before you get anything not directly related to Apple Computer
In my mind this is not a flaw, but a feature. In fact I rely on this every day. Type in "Axis" and I go to the Apache site, not a page about WWII or math. Type in "Python" and I do not go to a page about snakes or comedy troupes.
Granted, the article does state that technophiles have skewed Google's results in my favor, but I am fully aware of this. If I did want to know about apples, for instance, I would use a search term of, say, "apple growing" (5th link down). If I want to know about the Axis powers in WWII, I would first enter "axis powers" (third link down).
It's not broken. Users must be aware of the Web's zeitgeist.
Seriously, I have a lot of respect for Google (it's my IE home), but it's pretty obvious that it only can access certain types of information. I think the MSN folks were just looking to poke holes in their rival with that comment about it skewing research. If you are doing a serious research project, you go where researchers from time immemorial have gone--the library.
Under capitalism man exploits man. Under communism it's the other way around.
Of course Google is not the Oracle. Anybody who's seen the Matrix Reloaded knows that the Oracle was actually an evil program in cahoots with the machines.
Google is not like that. Google is more like...um...Trinity: smart, beautiful, intelligent and always there for you when you need it.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
That isn't an accident. In the early days of the USA, the dollar was pegged to the value of a Spanish gold coin. That Spanish coin could be broken into eight pieces to make smaller amounts - hence the term "pieces of eight". Each of the pieces was referred to as a "bit".
Eight bits made up a full coin, or a dollar. This wasn't lost on the people that coined the terminology of bit/byte.
I don't think that's a flaw; it just makes good sense for their example. Most of the people searching for flowers are looking for emergency flowers to send to their GF or mother. If someone wants to research flowers they should probably search for Botany?
Googlehole No. 2: Skewed Synonyms. Search for "apple" on Google, and you have to troll through a couple pages of results before you get anything not directly related to Apple Computer--and it's a page promoting a public TV show called Newton's Apple. After that it's all Mac-related links until Fiona Apple's home page.
Again, I think this is more a result of what people tend to be looking for when searching for Apple. I would imagine that most people querying Google with the single keyword "Apple" are looking for the company. The average user wouldn't have a reason to search Google for fruit. Using a one-keyword query is not good enough if you want to criticize a search engine; a search for Apple and Fruit will get you everything you need to know about the non-computer apples. If you want to buy fresh apples, perhaps you should search for Fruit Store?
So, when you're doing research online, Google is implicitly pushing you toward information stored in articles and away from information stored in books.
Hasn't the web been doing that for years? Is this somehow google's fault? If publishers want to have the full text of their books available on the web for free, I'm sure the folks at Google would be happy to spider them.
I got this far in the article and couldn't take it anymore. The guy that wrote this article obviously doesn't know what he's talking about.
Obvious
/Obvious
Type in what you're looking for! Want info on growing apples? Search for - *gasp* 'growing apples'!!! Want apple computers? Search for 'apple computers'. If this doesn't get you what you want, refine your search.
Plain and simple FUD.
Given that, as many people here have already pointed out, Microsoft is readying/improving its own search offering, I think it's pretty plain that this is just an attempt by Slate/MSN/Microsoft to smear Google, using journalism or op/ed to do so.
Google isn't biased, as the article tries to make the case; the _web_ is biased, toward the technical (and unfortunately, towards blogs). So those will, of course, show up first. People don't publish complete books online, but they publish papers and articles in droves. So, of course you're going to be pointed to that stuff first.
And frankly, anyone who types in "apple" into a search engine should know that they're going to get MANY very BROAD results. You need to be specific in your search. The more specific you are, the better results you're going to get.
Ed R.Zahurak
You know, oblivion keeps looking better every day.
So... yes, articles published in PDF format will be indexed, but if one is doing real research, one is probably conducting a comprehensive literature search (e.g., if one is a PhD). If one is a PhD, a growing volume of new data will be published online, but there are still important corpora of offline literature, both old and new.
If one is doing "research" on how to buy a new car, or "research" for one's fifth grade homework project, I suspect that PDF files are probably just fine as a source and that comprehensive literature searches are not necessary (but might still be useful).
The article states "Google is implicitly pushing you toward information stored in articles and away from information stored in books." More relevantly and accurately (and obviously), Google is pushing you towards information that is stored online. If one uses Google for research, one should understand that it is not the only tool available. If one uses Google as the only tool, well...
I think this is a vaguely interesting point that might have a lasting impression on the way online content is indexed/stored/made searchable. However, the more relevant issue here is that individuals need to learn how to search (as many have already pointed out in comments), search tools must be understood in the context of available tools and a sense of the data to be found must be developed (it does not need to be known in advance).
I also assume that the Amazon text searching of books story might put another spin on this.
Ya, you'd think they'd actually know how to use a search engine. "apple computer" will get you Apple's site. "tulips gardening -florist" (w/o quotes) will get you gardening tips for tulips. "dvdr880 review" will get you some reviews of the Philips dvdr880.
Out of curiosity I tried the same searches they ran on Google at http://search.msn.com. The "apple" search brought up only a few actual hits for Apple, with #1 being Office Depot and #11 being apple.com. For the "tulips" search, the first 5 links are to proflowers.com, 1800flowers, etc., with the bottom half of the page being encyclopedias, yadda yadda. The "dvdr880" search brought up links to every tech store in the known universe, with only 2 that I could tell had reviews of the model; one of them, ironically, linked to this page: http://www.google.dealtime.co.uk/xPC-Philips_DVDR
print "$insertThisWholeArticleIsATroll\n";
/* oops I accidentally made a comment, sorry */
It's very obvious that this guy is reaching here when he "Digs for Holes". It looks like instead of holes, he dug for BullS#!t, and found a lot of it.
Let's examine his points:
Googlehole No. 1: All Shopping, All the Time
Hrmm, let's see. I'm looking for a Home Theater Receiver today, so I type in "Pioneer Receiver", and I get mostly links to places selling them. But let's say I want a manual for my receiver, so I type in "Pioneer Receiver manual"; the results are much different. That's what keywords are for...moron.
Googlehole No. 2: Skewed Synonyms.
If I type in apple, OF COURSE I'm going to get results about Apple computers. Just add the word fruit to your search, if that's what you're looking for. Ask and ye shall receive. I dunno about you...but most of those links seem to relate to fruit.
Googlehole No. 3: Book Learning
Ok, I won't blame this one on google, I think this is due to the internet as a whole. Why would I go to the library, when I can poke around on the internet for 20 mins...and get just as much research done. Give me a break.
These all may be obvious points, but I was bored...so I decided to point them out.
home of the original cupholder
Plus everyone seemed to miss this bit of the article:
You can't really hold Google responsible for these blind spots. Each of them is just a reflection of the way the Web has been organized by the millions who have contributed to its structure. But the existence of Googleholes suggests an important caveat to the Google-as-oracle rhetoric: Google may be the closest thing going to a vision of the "group mind," but that mind is shaped by the interests and habits of the people who create hypertext links. A group mind decides that Apple Computer is more relevant than the apples that you eat, but that group doesn't speak for everybody.
Which is a fair enough point. Sometimes what I'm looking for is not what Google thinks I'm looking for and I have to tailor my searches somewhat.
But if MS included an option to ignore certain sites (such as shopping, blogs etc.etc) then I'd take a look.
Avantslash - View Slashdot cleanly on your mobile phone.
Good grief people:
Font Arial -"Font=Arial" -"Face=Arial"
How hard was that? Refining a search too much trouble? Gah, "apple" results in Apple Computer! Somebody sue! I can't possibly be bothered to enter a second or third keyword, READ MY FREAKING MIND!
Bah, I'm in a grumpy karma burning "ask a librarian for 'apple' and see what they ask you back" mood I guess.
Sig under construction since 1998.
...about these Googleholes:
"I am getting flamed to high heaven in Slate's Fray for a piece of mine they just posted talking about some of the built-in limitations of the Google PageRank system. The general critique seems to be that I don't understand how to refine a search, which I guess I should have made clear in the piece itself. (I do, for the record. I also think Google is absolutely brilliant.) But as you can see if you follow the link, it's not a piece about how to use Google more effectively; it's a piece about ways that Google's system implicitly pushes us in certain directions, which makes it less like an authoritative reference source, and more like an op-ed page. (Nothing wrong with that, just something we should keep in mind.) Normally I quote from the articles themselves in this blog, but today I think I'll quote from a followup comment that I posted in the Fray..."
http://stevenberlinjohnson.com
You too can participate in the roast by finding his e-mail address on Google.
sarchasm: The gulf between the author of sarcastic wit and the person who doesn't get it.
ever tried searching for Linux on MSN? -- oddly enough the first link you get is to amazon.com and mentions "buying linux" -- the second seems to be alright, and the third is funny altogether: 3. Alternatives to Linux-Apache-MySQL-PHP Learn about the Microsoft alternatives and how to move to them from open source products. www.microsoft.com/serviceproviders/migration
...
After having read both the article and the majority of high-ranking comments here, I must say the article is more objective than the majority of comments. Google is not perfect and the article points out some shortcomings. Of course, they are a logical result given how Google works, but it can still be argued that some results are less than optimal. Of course, by changing the query you can get better results, but it is also possible that a different page rank algorithm can give better results.
Why not instead discuss algorithms that would give apple, the fruit, the same relevance in search results as it has in most people's lives? If a search engine appeared that added that knowledge to its result ranking, Google would not be on top any longer.
Let's look at a more subtle aspect though:
Is this verification that Google is vulnerable to astroturfing? If you assume that half of all web pages with the term "apple" are talking about the computer company and the other half are referring to the fruit, then it seems like a search for the term "apple" should bring up about equal numbers of computer & fruit hits. The fact that most top hits are about the company instead of the fruit probably suggests that at least some of the "ballot stuffing" tricks that companies try to bring up their ranking are effective, even against Google's famed efforts to avoid being astroturfed.
This example is probably bogus -- the computer company seems to be more popular than the fruit, or at least there's more for internet users to say about it, so PageRank is probably doing its job well here. But in other cases, where the commercial alternative isn't as famous as Apple Computer but it still ranks higher in Google searches than non-commercial alternatives, that probably says something about astroturfing.
That or it just reiterates that the web went commercial a long time ago. Take your pick...
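To make the point above concrete: a link-based ranking doesn't count how many pages mention a word, it counts who links to whom. Below is a toy sketch of the textbook PageRank power iteration, which is an assumption-laden simplification and not Google's production algorithm. The page names and link graph are entirely hypothetical: two pages for each sense of "apple", plus two unrelated blogs that happen to link to the company.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outbound links): spread its rank evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web, not real data.
links = {
    "apple_company": [],
    "mac_news": ["apple_company"],
    "apple_fruit": [],
    "orchard": ["apple_fruit"],
    "blog1": ["apple_company"],
    "blog2": ["apple_company"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy graph the company page outranks the fruit page purely because more pages link to it, whether those links are honest enthusiasm or ballot stuffing. The algorithm as sketched cannot tell the difference.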
DO NOT LEAVE IT IS NOT REAL
I agree with people here in that the points raised by the article are somewhat FUDful. However, I do have a MAJOR problem with Google.
I develop in Perl. If you've ever seen Perl code, (as I'm sure many here have,) you know {it=>"isn\'t"} @the=("most", "friendly"); of languages, syntactically. However, with Google, searching for information is a moot point. Try searching for "$|++" (Search Link). For those who don't want to click on the link, I'll tell you what happens: Google does nothing. That's because it doesn't accept punctuation.
This was particularly annoying when I wanted to do research on URThere's (awful) PDA: the @migo. When I searched for "@migo", I got lots of Spanish sites, but nothing relevant. Google had internally stripped my "@" symbol.
Granted, I will continue to use Google, as it is the most incredible search engine available right now, but because of these flaws, searching is severely limited.
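A rough illustration of why those searches come back empty, assuming a tokenizer that keeps only letters and digits. This is a guess at the general behaviour described above, not Google's actual code, and `naive_tokens` is a made-up name:

```python
import re


def naive_tokens(query):
    """Split a query into lowercase alphanumeric tokens.

    Assumed, simplified behaviour: every punctuation character is dropped.
    """
    return re.findall(r"[a-z0-9]+", query.lower())


print(naive_tokens("$|++"))    # nothing but punctuation, so no tokens survive
print(naive_tokens("@migo"))   # the '@' is dropped, leaving a common Spanish word
```

Under that assumption, "$|++" turns into an empty query and "@migo" becomes indistinguishable from "migo", which matches what I saw.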
char sig[120] = "\0"
'apple fruit'
I skimmed the first 5 pages and there was no mention of Apple computer.
"Apples" is probably more what the user wanted.
-WolfWithoutAClause
"Gravity is only a theory, not a fact!"
What's wrong with his analysis? Where do I start? First let me say that most of his statements are true. They just have no real merit for me.
1) Reference basis. In any scientific analysis you need a baseline. For example, if you wanted to compare the fuel economy of two vehicles, it would be good if you established that the baseline should be something like gasoline powered passenger cars. If you compared the gas consumption of a horse drawn carriage to a Ferrari F40, that's not valid. In this case, no reference baseline was established. He was comparing Google to nothing. What if all his gripes about Google were inherent to all search engines?
To make his points, he should at least have some sort of meaningful comparison between search engines: Well, Altavista doesn't do this but Google does . . . I think he omitted this part because the MSN search engine shows many of the characteristics he complains about.
2) Testing methodology. When you test anything, each test has to be narrowly designed to test as few factors as possible, and the desired result has to be achievable. In the fuel economy example, it would be silly to complain how poor fuel economy is in a Ford Explorer if you used uranium as the fuel source.
It's ironic that the subtitle is "Google may be our new god, but it's not omnipotent." He was testing the terms 'apple' and 'flowers', yet he was actually looking for 'growing apples' and 'gardening flowers'. By searching on vague terms, he assured the test would fail. Additionally, without a reference (see #1), we don't know if this behavior is normal for search engines or specific to Google.
3) Objectivity. I don't need to elaborate on this.
Well, there's spam egg sausage and spam, that's not got much spam in it.