Digging Holes in Google
Kurt LoVerde writes "Though google has become synonymous with searching, the folks over at MSN have written up an interesting article on our favorite search engine's pitfalls. Included among these are a tendency to skew results toward shopping, a lack of diversity for searches containing synonyms and its impact on research."
Real links on the www.msn.com home page today:
And they show their socialist bent away from shopping with their hardhitting piece:
I'm much funnier now that I'm a subscriber.
Wow, sounds like those Google guys AND the MSN guys are biased. For shame! We at Slashdot poke fun at you!
Not that MSN doesn't have a vested interest in some other search engine or anything.
"If a quarter is two bits, then a dollar's a byte." -R Deric Miller
Aren't microsoft on the verge of releasing their googleslaying search engine (or perhaps just search marketing) on the world?
How nice of an impartial journalistic source to pick holes in google which are almost certainly the specific areas microsoft has chosen to optimise.
Hey! You got a permit for that?
We tend to forget that:
1. Just because it's not found on the Internet, does not mean that it doesn't exist.
2. Just because it's found on the Internet, does not necessarily make it true.
.sig
In case it gets slashdotted, here's the google cache of the page (heh)
http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=cache:http%3A%2F%2Fslate.msn.com%2Fid%2F2085668%2F
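For anyone curious how a cache link like this one is put together, here is a minimal Python sketch. It assumes the `cache:` operator is simply passed in the `q` parameter with the target URL percent-encoded, as the link in this comment suggests; the helper name `google_cache_url` is made up for illustration, and the parameter names are copied from the link, not from any official documentation.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def google_cache_url(page_url):
    """Build a Google search URL that asks for the cached copy of page_url.

    Hypothetical helper: the parameter names mirror the link posted in
    the comment and are an assumption, not a documented API.
    """
    params = {
        "sourceid": "navclient",
        "ie": "UTF-8",
        "oe": "UTF-8",
        "q": "cache:" + page_url,
    }
    # urlencode percent-encodes the ':' and '/' characters in the q value
    return "http://www.google.com/search?" + urlencode(params)

url = google_cache_url("http://slate.msn.com/id/2085668/")
# Decoding the query string recovers the original cache: operator
decoded = parse_qs(urlsplit(url).query)["q"][0]
```

The round trip through `parse_qs` is only there to show that the percent-encoding is reversible.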
No Norm, those are your safety glasses; I'll wear my own thanks...
I won't say it. It's too obvious...
I'm the guy with the unpopular opinion
Well, since Microsoft has already announced plans to try to topple Google as a search engine, I'm pretty much going to take anything that they say with a grain of salt, if I don't just ignore them completely.
Google does an excellent job with their primary searches, their news siphoning, and their froogle.google.com service. I've found more useful results through Google than I have through all of the other search engines that I've used over the years combined. Sometime I'll have to try out their newsgroup tool.
Do not look into laser with remaining eye.
One of the flaws is a seeming preference towards articles rather than books. Hopefully Amazon's upcoming book text search will fill the gap left by google's seeming inability to find results from books.
FoundNews.com - get paid to blog.,
Gee... Microsoft complaining about the competition beating them? When does THAT ever happen?
This is my sig. There are many like it but this one is mine.
And we're supposed to believe that MSN Search's results won't be skewed toward MSN Shopping and MSN Content? Google may not be perfect, but at least it's independent of the industry giants. For now anyway.
It does seem true that when you are looking for information on a product, that 9 out of every 10 results will be a page trying to sell you the item, not a page with useful information about the product (beyond the normal marketing speak you get when someone is trying to sell you something).
"look for apple and you have to troll through three pages of ... before you find apple computer ..."
Um, how about using more than one keyword?
"apple computer" brings www.apple.com as the FIRST link.
I imagine if I look in msn.com for "battery" I won't find detailed schematics of NiMH batteries either. Holy shit, are they paying these people to write this shit?
Heck, even in grade school when we had to use CD encyclopedias we were taught to use more than one keyword.
Tom
Someday, I'll have a real sig.
Those are some pretty weak allegations.
The gist of the article is that if you give google a one-word (common) search term, the results may not be as precise as you want. For instance, if you want the nutritional content of an apple, and you put "apple" into Google, you're going to get a bunch of hits for things that don't have what you're looking for.
I'm sure a lot of you are saying "duh" right now.
I read the internet for the articles.
Hmmm.... seems to me the author of the article is not very good at searching for things. Typically, you want something more specific than "apple" in your search. If I looked for apple in the library, I am sure I would find a lot of things that I am not necessarily interested in.
You will never "find" time for anything. You must "make" it.
Maybe try searching with "flower gardening" next time.
word.
Can you spell F-U-D? I knew that you could.
I think we will see this go the way of MS Bob, "Trusted Computing", MSN as a popular ISP, etc.
You ever notice that with the exception of hardware, most people only use Microsoft products as they are forced to?
The only reason people in the outside world use it is because even they feel like they HAVE to use Word, Windows, Excel, etc. When it comes to options in other parts of life, most people recognize that MS sucks donkey balls.
(Interesting side note: I used to work for Blue Cross of California. This was back in the NT 4.0 days. BCC has a support contract with MS.
A co-worker of mine took an NT server, set it up professionally to interface with the network as a real server, then installed MS Bob on it.
He promptly called Microsoft for tech support, and used the "Bob" terminology: "Yes, I'm having trouble getting the IP stack up. I keep telling the dog inside the living room to connect to the outside world, but he just keeps barking at me and asking if I want to make a document." My friend played dumb, only referring to MS Bob jargon as if it were the operating system
Needless to say, the call lasted many hours while the MS tech tried to trouble-shoot the problem using as much of the MS Bob terminology as possible.)
JoeLinux
Mod me down if you must but this article is crap. Sure Google is going to give you several pages of links to Apple Computer when you search for 'apple' - that's the way the system is supposed to work. However, if you do a multi-word search for something specific - like 'kixtart audit software', you're going to start seeing success. A search for 'apple trees' finds the top four links pointing to great sites that each link to more sites on apples. Same thing for 'tulips' - 'planting tulips' brings up several relevant links within the top 10. Moral of the story is the same as it is everywhere - GIGO.
Well, of course it doesn't! A search for apples, however, is much more useful.
This is just a case of user error, nothing more.
Remember when the only search engines were archie and Altavista (the old altavista.digital.com, not the "new" one.) Well I certainly do. Google was a quantum-leap improvement over any of them; spidering had been tried with other search engines, but Google made it work. While it certainly has gotten LOTS more commercialized since I first used it, it's still better than anything else out there. I just hope they can stay off of the slippery slope to being clogged with ads.
With the size and complexity of the Internet as we know it, single word search terms like "apple" are completely stupid. I think the reporter was just screwing around with Google and noticed that the publishing deadline was approaching. Sure, there are some unique words that make sense to use as a single term search, but anyone who has used a search engine for more than 3 seconds knows to qualify the search somehow.
As far as shopping results, that's the character of the web today. Lots of commercial interests. It takes money to maintain a web presence, no matter what Geocities tells you. Google is just presenting you with what it's got, really.
Finally, a lot more papers are published than books. It's not surprising that you don't get a lot more hits on book-printed resources.
This is more interesting as a statement on what the Internet has become, rather than what Google might be showing you while filtering other things out.
Unfortunately, computers can't read the minds of dumb people yet....so the rest of the world will need to settle with flowers -shop so that most pages they find are not shops... Searching for something as generic as 'flowers' is the same as searching for 'car'. We typically don't walk into a library anymore and know there is no place to buy flowers there. We know that we're in a world where the Internet is a portal to a) buying and b) information. (Might I add that I think most people buy flowers more often than they grow them?)
--<Mike>--
Apple
... Visit the Apple Store online or at retail locations. 1-800-MY-APPLE Find Job Opportunities
at Apple. Visit other Apple sites around the world: Choose... ...
Description: Apple's main homepage.
Category: Computers>Systems>Apple>Macintosh
www.apple.com/ - 18k - Jul 20, 2003 - Cached - Similar pages - Stock quotes: AAPL
Dewey, what part of this looks like authorities should be involved?
Just last night I was looking for information on whether the Arial font is trademarked by Microsoft (it's not). Just try putting a font name into google. I hit on every page that had a font name="Arial" tag in it!
www.HearMySoulSpeak.com
Well, if you search for 'apple', don't be surprised when the first 50 links refer to the Apple Macintosh Computer in one way or another. The smart person then excludes the terms 'computer' and 'macintosh' from the search, and voila, a usable result. The not-so-smart (msn) person calls google biased.
The same goes for the dvd player. Add 'review' to your search query.
And the same goes for the pdf thing. Just include 'book' in your search term.
I think the msn people didn't quite figure out how a search engine works.
...they'd know that to get better searches you narrow down, not generalize. The general approach of search engines of telling you to "generalize" your search terms is a poor approach to addressing their limited indexing--it's created searchers who don't realize that words have multiple definitions and that only their context gives us a clue as to which definition to use.
If this Slate author were a student of mine (not that I'm a teacher) I'd have slapped an F on his research methods paper.
What foolishness.
***Foucault is watching you..***
... set up a special Google site
... so that the author can find all the tulips and apples he wants.
www.google.com/botanist
Too many commercial sites - True, and I wouldn't be surprised if google didn't allow an option soon to limit sales sites. It's feasible, and they often rise to this sort of challenge (like they did with the blog horde).
Synonym problems - This is certainly not something MSN will help with. This is also easy to get around by a little massaging of the search engine - you just think of a word that would come up in the stuff you want to see and not the other. For the retarded, perhaps Google could dynamically suggest categories after searching (kind of how they suggest misspellings).
No books for scholarly research - this is such a small use (though I am admittedly among them). Furthermore, it's not that great a problem if journals come up preferentially - if your research cites mostly books, that's a problem anyway, as it probably means your research is not current. But again, this is a problem for such a slight proportion of the population.
Bottom line is that google will fix any big problems - just think of how many things might have been on that list 3 years ago that they've already fixed. Put it this way - I have more faith in google to deliver a great search engine than I do MS any day.
-Looking for a job as a materials chemist or multivariat
And all the shopping, paid links, popups, and crap that goes with it. Let them have their limited, corporate sponsored internet. Let them be spoonfed information, rather than blasting them with it and having them make an informed opinion or judgement. Let them have their MSN.
Just makes it easier for the aliens to take over, and until then, keeps them the hell away from me!
i think all of these "google-holes" are actually just the result of poor searching techniques on the part of the author.
also, when i need to find something on--damn i hate to say it--MSDN for work, i usually use google with the site:msdn.microsoft.com as the MS search engine is crap.
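a rough sketch of that site-restricted trick in Python, for anyone who wants to script it. The `site:` operator and the msdn.microsoft.com host come from the comment above; the function name `site_search_url` and the example search terms are made up for illustration.

```python
from urllib.parse import quote_plus

def site_search_url(terms, site):
    """Return a Google search URL restricted to one site via the site: operator."""
    query = f"{terms} site:{site}"
    # quote_plus turns spaces into '+' and percent-encodes the ':' in site:
    return "http://www.google.com/search?q=" + quote_plus(query)

# Hypothetical search terms, chosen only as an example
url = site_search_url("CreateProcess example", "msdn.microsoft.com")
```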
!(^((ri)|(mp))aa$)
paraphrased from one of the authors replies
"obvious if you look for steven you won't find specific people [spielberg in this case], that's the googlebias I am trying to demonstrate".
That isn't a bias. It's unspecific, you nimrod.
A bias would be if you searched for "steven spielberg" and it returned links from only one website [e.g. a paid advertiser or something]. If you are just not specific enough you will get anything that matches.
The author should have pointed out searching for "steven" on search.msn.com also returns many random links such as Dell steven, msn steven, etc...
Tom
Someday, I'll have a real sig.
The fact that the pages are in the index, just further down, is not an example of the program having blind spots. Rather, if you search on generic items it will bring the most likely search results based on your meagre help. If I take the author's example of 'apple' you see how weak an argument this is. If I simply use this single word then the results will be the most likely - apple computers. If I had a brain in my head I could do the following and get TOTALLY different results:
;)
"apple computers"
"apple records"
"apple trees"
If you want to do research on apples, then you better be doing more than typing 'apple' into google.
just = (My)Opinion.toCents();
Yet their claim is weakened by the fact that if I enter "flower research," suddenly I see very, very little related to shopping, but instead to the research I'm seeking.
It all depends on the search scheme. If the claim that Google is so heavily weighted towards marketing and shopping were true, then "flower research" would have led me to buy flowers.
I would also note that "flowers" on MSN.com returns:
The next comment I write will be ready soon, but subscribers can beat the rush and see it early!
Search for "apple" on Google, and you have to troll through a couple pages of results before you get anything not directly related to Apple Computer
In my mind this is not a flaw, but a feature. In fact I rely on this every day. Type in "Axis" and I go to the Apache site, not a page about WWII or math. Type in "Python" and I do not go to a page about snakes or comedy troupes.
Granted, the article does state that technophiles have skewed Google's results in my favor, but I am fully aware of this. If I did want to know about apples, for instance, I would use a search term of, say, "apple growing" (5th link down). If I want to know about the Axis powers in WWII, I would first enter "axis powers" (third link down).
It's not broken. Users must be aware of the Web's zeitgeist.
Seriously, I have a lot of respect for Google (it's my IE home), but it's pretty obvious that it only can access certain types of information. I think the MSN folks were just looking to poke holes in their rival with that comment about it skewing research. If you are doing a serious research project, you go where researchers from time immemorial have gone--the library.
Under capitalism man exploits man. Under communism it's the other way around.
Think about it folks. If you walked into something like a WalMart supercenter, went to the service desk, and said "tell me where I can find some nuts", the answer would be different if you are looking for peanuts, automotive hardware, home hardware, or ??.
For example, instead of using Apple, or even Apple Newton, searching with "Apple Newton PBS" still comes up with a couple paid links back to Apple Computer but most links point to the right place, referring to the PBS series.
And in terms of only indexing PDF articles well, I have news for the folks at Slate... there aren't that many complete books out there on PDF that would be as useful to researchers as the PDFS of the articles that make up the scholarly journals themselves. So again, this perceived weakness in Google is a problem in the broad-brush arena, not in reality.
The bigger problem for most small websites and Google is building up credibility among a wider network of links in the first place, which is where the quality of the information and its presentation are key. Repeat after me -- there is no shortcut to success on the WWW. Build something worthless, remain in obscurity. Build something good that has value, and we -- via Google, Yahoo, or whatever search engine you like -- will eventually come.
...Open Source isn't the only answer -- but it's almost always a better value than the alternatives...
Tried again 5 minutes later. Works now.
Of course Google is not the Oracle. Anybody who's seen the Matrix Reloaded knows that the Oracle was actually an evil program in cahoots with the machines.
Google is not like that. Google is more like...um...Trinity: smart, beautiful, intelligent and always there for you when you need it.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
That isn't an accident. In the early days of the USA, the dollar was pegged to the value of a Spanish silver coin. That Spanish coin could be broken into eight pieces to make smaller amounts - hence the term "pieces of eight". Each of the pieces was referred to as a "bit".
Eight bits made up a full coin, or a dollar. This wasn't lost on the people that coined the terminology of bit/byte.
Part of the problem is that the author appears unable to massage google to find the results he wants.
For example, when I search for a specific model of DVD player, I may search for "dvd A7049-34 review" - I was looking for a review so I put it in my search. Even better, add -shop to the end of my search. Now pages with the word shop in them will get filtered out.
Want to find info on apple farms? search for 'apple -"apple computer" -macintosh' and you'll eliminate a lot of mac webpages from the search.
Sometimes, typing a question into google will get you where you want... "how does thing X work?" More often than not you'll find the answer on the first page, because people post to newsgroups, web forums, and the like with questions and you are (usually) not the first person to ask that question.
The key here is to remember that you can tell Google what you want to find AND what you DON'T want to find (just put a minus in front of the word.)
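The include/exclude idea above can be sketched in a few lines of Python. This is only an illustration: `build_query` is a made-up helper, and it assumes nothing beyond the two conventions mentioned in this comment (a minus sign in front of unwanted words, double quotes around phrases).

```python
from urllib.parse import quote_plus

def build_query(include, exclude=()):
    """Combine wanted terms with minus-prefixed unwanted terms.

    Multi-word terms are wrapped in double quotes so they are treated
    as phrases rather than separate words.
    """
    def fmt(term):
        return f'"{term}"' if " " in term else term

    parts = [fmt(t) for t in include]
    parts += ["-" + fmt(t) for t in exclude]
    return " ".join(parts)

# The apple-farm example from the comment above
query = build_query(["apple"], exclude=["apple computer", "macintosh"])
# URL-encode the finished query for use in a search link
search_url = "http://www.google.com/search?q=" + quote_plus(query)

# The DVD-review example from the comment above
dvd_query = build_query(["dvd", "A7049-34", "review"], exclude=["shop"])
```

Keeping the query as a plain string until the last step makes it easy to eyeball before it gets encoded.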
Natural != (nontoxic || beneficial)
A Google search for Arial seems to work pretty well today. You get links to places that sell Arial fonts, articles about choosing fonts, rants from people who don't like it, etc. It doesn't return a link to every page that uses it.
I don't think that's a flaw; it just makes good sense. For their example, most of the people searching for flowers are looking for emergency flowers to send to their GF or mother. If someone wants to research flowers they should probably search for Botany?
Googlehole No. 2: Skewed Synonyms. Search for "apple" on Google, and you have to troll through a couple pages of results before you get anything not directly related to Apple Computer--and it's a page promoting a public TV show called Newton's Apple. After that it's all Mac-related links until Fiona Apple's home page.
Again, I think this is more a result of what people tend to be looking for when searching for Apple. I would imagine that most people querying google using the single keyword "Apple" would be looking for the company. The average user wouldn't have a reason to search google for fruit. Using a one keyword query is not good enough if you want to criticize a search engine; a search for Apple and Fruit will get you everything you need to know about the non-computer apples. If you want to buy fresh Apples perhaps you should search for Fruit Store?
So, when you're doing research online, Google is implicitly pushing you toward information stored in articles and away from information stored in books.
Hasn't the web been doing that for years? Is this somehow google's fault? If publishers want to have the full text of their books available on the web for free, I'm sure the folks at Google would be happy to spider them.
It seems like most of the author's complaints revolve around Google's search not reading his mind for exactly what he is searching for. He searches for 'apple' and it returns a bunch of articles about apples, apple growers, etc... but it's a long time before it returns something about 'Apple computers', or 'Fiona Apple'. Well maybe this jackass should have typed in 'Apple computers', or 'Fiona Apple' if he was interested in finding websites about them. What an idiot.
I think MSN is wrong in listing those things as "shortfalls" of google. Many of those shortfalls are what many people think are good features of google. I like the fact that more pdfs show up in google, and I can view them directly without having to go to those websites.
When I search for something on google, what I expect to come up, comes up. If I expect shopping sites, they come up. If I expect game review sites, they come up. If I expect wacky news sites, they come up.
I'll never use MSN, mostly because of popups, they're microsoft, and also they try to sell their internet service almost as forcefully as AOL does. And they have the wrong idea of what people want from their search engines.
Google isn't perfect. It has drawbacks and it has built-in problems. But, it works. What more can you ask for from a search engine?
Also, I can never appreciate a company that uses multicolored butterflies as their logo, especially when said butterflies appear as men dressed in tights rollerskating around. I've had enough trauma in my life without being exposed to men in tights trying to sell me MSN and other Microsoft products. Ugh.
I still use Google quite a bit, but when Google gives me a mess that's hard to parse with subsearches, I go to turbo10.com. It's a metasearch engine with clustering of topics much like Northern Lights had. It often gives me relevant links faster than Google does.
If you can't beat them, embrace and extend them.
I got this far in the article and couldn't take it anymore. The guy that wrote this article obviously doesn't know what he's talking about.
Obvious
/Obvious
Type in what you're looking for! Want info on growing apples? Search for - *gasp* 'growing apples'!!! Want apple computers? Search for 'apple computers'. If this doesn't get you what you want, refine your search.
Google is a very good search engine. But it still doesn't always get exactly what I want. (I have a url from a coworker that is a GREAT description of UDP multicast that doesn't seem to be in Google's top 50, and the url is significantly better than any of Google's top hits.)
He found it with a different search engine. (teoma.com?/ about.com?). He uses more than one engine depending on what he's doing (He does use google too)
What I'm getting at is competition is good. It forces companies to make better products because they know if they don't others are going to try too.
Other companies are working really hard at getting a better search engine. Don't expect google to be on top forever, because although slashdot readers love google, they'll leave it quickly if something better comes along (remember altavista/hotbot/webcrawler etc.. )
In the end everyone wins.
Of course you're going to have this. These companies spend millions of dollars making sure their pages are coming to the top of EVERY search engine, including Google. If somebody is searching for information about apples, they will search for "apples" not "apple". And if you search for "apples" you get a page entitled "Apples & More" with the description "Learn all about apples, growing and using them...".
The same can be said for the stupid comparison to flowers. Of the idiots who search for "flowers" for information about a research paper (those who know nothing about flowers), they will soon see that "information about flowers", "flower biology" and "flower development" turn up more relevant and less commercial results.
Of course, if I searched for "design" I'd get a trillion pages about web design. But what if I was searching for "interior design". That's completely different.
It'd be sad if people were this stupid. But the reality is that people know what they're looking for and how to find it. I give this article a -1 STUPID.
Kind regards, Devon H. O'Dell
Plain and simple FUD.
Given that, as many people here have already pointed out, Microsoft is readying/improving its own search offering, I think it's pretty plain that this is just an attempt by Slate/MSN/Microsoft to smear Google, using journalism or op/ed to do so.
Google isn't biased, as the article tries to make the case; the _web_ is biased, toward the technical (and unfortunately, towards blogs). So those will, of course, show up first. People don't publish complete books online, but they publish papers and articles by the droves. So, of course you're going to be pointed to that stuff first.
And frankly, anyone who types in "apple" into a search engine should know that they're going to get MANY very BROAD results. You need to be specific in your search. The more specific you are, the better results you're going to get.
Ed R.Zahurak
You know, oblivion keeps looking better every day.
To be a bit fair to the article, I kind of agree to the shopping part.
The other day, I was googling for a particular recipe and all I got was a list of restaurants. No matter how I rephrased my search, all the first 50-60 results were links to restaurants which serve that dish, but nothing of the recipe.
Believe me, if it were any other search engine, I wouldn't have even bothered, as I don't expect anything but ads from non-google search engines.
Similarly I was googling for some extra information about the song Stargazer from Rainbow, and all I got was lyrics pages or links to CD sales.
O.T. :- IF any one can provide me with a link to more information on stargazer i would be much obliged
for the last time people, I am "frodo from middle eaRTH", not "middle eaST".
... is how some people, even smart people, don't or can't get the hang of search engines.
You know, the ones on newsgroup and mailing lists who say "anyone know of a good BLAH?". Then someone whaps them with a cluestick (or rather, google link).
There just seem to be brain types or personality types that don't get it. Here's the rules I try to impart:
But there are still some, like the author of the article, on whom any of this is lost.
Let's review this statement. The author states that no one puts up books in PDF form, and therefore, Google doesn't search these books? How is this a problem? If the book isn't online in the first place, html/pdf/whatever format, then how can Google possibly give you links to that book? Google is not a search engine *AND* a library, it is a search engine. The article seems to imply that Google is taking away from something else. The only thing I can think of is using Google for research vs. traditional methods of research, i.e., library work. If that is the author's argument, then they are about 4-6 years behind the times, because the switch has already happened, or at least it has amongst my generation.
I haven't done research in a library since 5th grade (11 years ago).
Honestly, think about it. A very significant portion of websites out there are trying to sell you something. (Just check out that banner at the top of this page.) So if you estimate (low, probably) that 50% of all websites are shopping sites, then its a good chance that for a search on any given topic, you'll probably end up with around 50% of the results trying to sell you what you're looking for.
This isn't Google's fault; it's the nature of the web today.
In some cases, couldn't this be because there are just a shitload of shopping sites out there? For example, do a search for 'credit'. You get page after page of various domains from the same companies offering credit reports. Keep in mind, search engines are nothing but algorithms. Furthermore, the pagerank system that google uses takes into account how 'helpful' users found various sites, the number of times a word appears on the site, etc.
I for one don't see how any search engine can read the minds of an individual user. If I type in "apple", how is Google supposed to know what I mean? It's like going up to someone in the street and asking where the "restaurant" is. The first thing the person will ask is, "which one?". Or going to the car dealership and asking for the "car".
I think our society still has the mentality that computers should be able to do anything and everything and when they don't, something must be broken. People always complain that their computer doesn't do what they want it to do and so it must be flawed. We've all heard it..."web pages load too slowly, I can't find what I'm looking for, this program has too many menus", etc. Well, I would like my car to lift off and fly during rush hour, but it can't. Do I complain that my car is flawed? No, I just accept that there are limitations as to what certain things can do at present. Perhaps my car will one day be able to take off and fly much like someday a search engine will have a better idea of what you are looking for.
"Oh dear, she's stuck in an infinite loop and he's an idiot" -Prof. Farnsworth (Futurama)
well, a .. umm... friend of mine told me that when I sear.. when he searches for celebrities/supermodels on the internet, the first couple of pages are sites like altocelebs which give you two or three links to other sites, but they themselves have almost no content on the star. It sucks to wade through two or three pages of search results linking to these kinds of sites when I'm jacki...researching...when some other person that is not me is researching the trials and tribulations of a particular celebrity
Actually I will use google to find places to buy obscure items. If it did not return shopping sites I would lose this valuable search feature. If none of the major online retailers have what I am looking for, I just type it into google.
http://www.kubuntu.org/
So... yes, articles published in PDF format will be indexed, but if one is doing real research, one is probably conducting a comprehensive literature search (e.g., if one is a PhD). If one is a PhD, there is a growing volume of new data published online, but there are still important corpora of offline literature, both old and new.
If one is doing "research" on how to buy a new car, or "research" for one's fifth grade homework project, I suspect that PDF files are probably just fine as a source and that comprehensive literature searches are not necessary (but might still be useful).
The article states "Google is implicitly pushing you toward information stored in articles and away from information stored in books." More relevantly and accurately (and obviously), Google is pushing you towards information that is stored online. If one uses Google for research, one should understand that it is not the only tool available. If one uses Google as the only tool, well...
I think this is a vaguely interesting point that might have a lasting impression on the way online content is indexed/stored/made searchable. However, the more relevant issue here is that individuals need to learn how to search (as many have already pointed out in comments), search tools must be understood in the context of available tools and a sense of the data to be found must be developed (it does not need to be known in advance).
I also assume that the Amazon text searching of books story might put another spin on this.
While the point that more refined searches give you better results is true, that's not what the author's talking about. He's trying to tell you that Google is an aggregation of zeitgeist and how many links things have (link interdependence, which is Google's strength, also adds its own bias), and not necessarily their relevance to the 'real' world. An understanding of how Google might skew results is useful.
Here's his site: read the July 16th article. "You can make things less than equal by doing more refined searches, but that doesn't mean the skew isn't important."
I think this article is more than a little dumb. Effectively it's like saying that if you drive a car for 20 years without topping up the oil and the car fails, then it's the car's fault.
No. It's a tool, just like Google is a tool. To gain the greatest utility from a tool, you must learn how to use the tool properly. In this case it means not being utterly stupid and having at least some idea what you're going to search for.
Let's take point 2 of the article as an example. Who is going to type apple into google in the expectation of not getting 13.6 million hits? If you're going to search, what kind of apple are you looking for? Golden Delicious, Cox, whatever. This is the user's problem. If you're going to use a search engine, at least have some vague idea of what you want to look for before you start. However, in using the web as a reference, one pitfall is the principle of provenance. PageRank is not enough in itself. I'm not entirely sure how the PageRank algorithm works (and I'm fairly certain we're not going to get someone from google telling us either ;-) ) but it would be nice to allow for authorities to be defined, so that in the case of academic content, if an item appears in a certain source (like ACM journals, for example) then it is given higher weighting by the engine.
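To make the "authorities" idea a bit more concrete, here is a minimal sketch of what such a weighting could look like. It is not how Google or any real engine works: the function name, the `boost` factor, the score values and the source labels are all made up for illustration. It simply multiplies an existing relevance score when a result comes from a source on a user-defined trusted list.

```python
def rerank_with_authority(results, trusted_sources, boost=2.0):
    """Re-sort results, boosting those from trusted sources.

    results: list of (url, source, base_score) tuples, where base_score is
             whatever relevance score the engine already produced.
    trusted_sources: set of source labels treated as authorities (hypothetical).
    boost: multiplier applied to results from trusted sources (an assumption).
    """
    rescored = [
        (url, score * (boost if source in trusted_sources else 1.0))
        for url, source, score in results
    ]
    # Highest adjusted score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical example: a popular blog post versus a journal article.
results = [
    ("example.org/blog-post", "blog", 0.6),
    ("example.org/journal-article", "acm", 0.4),
]
ranked = rerank_with_authority(results, trusted_sources={"acm"})
```

With the boost applied, the journal article moves ahead of the blog post even though its base score was lower. Who gets to define the trusted list is, of course, the hard part.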
In addition, point 3 in the slate article is flawed. Google doesn't divert people from books. It is not the tool's fault if people do not choose to publish in that way. Also, if the New York Times choose to fence off their content, it's hardly Google's fault that it can't spider the stuff. Talk to the NY Times and ask them why they use the registration system in that way.
Let's just make this clear. Google spiders what is generally available. It is not the fault of google if content providers wish to publish in ways that may limit the scope of viewing. Google isn't perfect by any means but this article doesn't really say anything useful. And hey, don't MSN run an engine of their own too? What possible gain could they have from rubbishing Google? [Cynical? Me? How could you possibly think that...]
It's not you: I'm just this horrifically socially awkward with everybody.
Amazon.com Buy Linux software at the Amazon.com software store. www.amazon.com Introducing Linux Find the latest news and information on this operating system. tech.msn.com Alternatives to Linux-Apache-MySQL-PHP Learn about the Microsoft alternatives and how to move to them from open source products. www.microsoft.com/serviceproviders/migration ------- 'nough said.
Let's not forget that Applied Semantics, formerly Oingo, invented pretty good technology to perform meaning-based searching of the Internet. The Oingo site, now defunct, was actually pretty cool. You'd do a search on "apple" and it would offer you to refine the meaning of apple: Which kind? The fruit? The computer? Etc. I would not be surprised if Google were to integrate the technology they now own into the Google site, which would make MSN's article quite obsolete. Just wait for a year or so.
... or is the author of the article the biased one? Of course if I were to research flowers in terms of gardening or botany, I'd specify that. 'gardening flowers' or 'flowers life cycle'. He expects the search engine to read his mind when he writes 'apple' and means the fruit, not the computer brand? 'apple computers' and 'apple fruit' would be what I'd type into the search engine. Google's ranking means whatever is popular on the internet gets ranked up top (by whatever else links to it). If more people on the internet link to the apple website than to aunt dora's apple farm, then so be it.
Yes.
It's very obvious that this guy is reaching here when he "Digs for Holes". It looks like instead of holes, he dug for BullS#!t, and found a lot of it.
Let's examine his points:
Googlehole No. 1: All Shopping, All the Time
Hrmm, let's see. I'm looking for a Home Theater Receiver today, so I type in: "Pioneer Receiver", and I get mostly links to places selling them. But let's say I want a manual for my receiver, so I type in "Pioneer Receiver manual", and the results are much different. That's what keywords are for...moron.
Googlehole No. 2: Skewed Synonyms.
If I type in apple, OF COURSE I'm going to get results about apple computers. Just add the word fruit to your search, if that's what you're looking for. Ask and ye shall receive. I dunno about you..but most of those links seem to be relating to fruit.
Googlehole No. 3: Book Learning
Ok, I won't blame this one on google, I think this is due to the internet as a whole. Why would I go to the library, when I can poke around on the internet for 20 mins...and get just as much research done. Give me a break.
These all may be obvious points, but I was bored...so I decided to point them out.
home of the original cupholder
Google is just another tool. Like any tool, it's not responsible for the level of skill of the user. A tool may have all the whiz-bang capability in the world, but if the wielder is lacking in skill, then none of that function matters.
The MSN article was ridiculously lame. If you want to find DVD reviews, search on "DVD +review" and you'll get pages of them, starting with the very first page. In other words, in order to effectively use Google, or any search engine, one has to know how to construct the query. Expecting a single word search to discriminate down to the level of detail that any given person wants is hopelessly naive. Besides, Google has never made any pretense about having a commercial slant.
MSN's apparent expectation that google ought to accommodate those with enough skill to get on-line, but not much more than that, is just more of their corporate bias leaking through.
Apparently basic computer literacy isn't a requirement for doing technical reporting on MSN. The examples given are just silly.
They complain that a search for "flowers" mostly returns commercial sites, not information on gardening tips. What is Google supposed to do, read your mind to determine your real intention? "Flowers" might mean you want to buy some flowers, look at pictures of flowers, get information on growing your own flowers, buy some flower seeds, or be looking for a company or person with the "Flowers" name. Useless. However, if you try the crazy idea of asking for what you want, amazingly you're much more likely to get the results you want. Interested in gardening tips? Don't search for "flowers", instead search for "gardening tips". Oooh, look, lots of useful links. Interested in flower gardening tips specifically? Unsurprisingly, "flower gardening tips" returns a slightly different set of relevant links.
Searching for a product's model number doesn't return reviews? Again, if you want a review, maybe you should ask for a review. Sure enough, "apex ad1200" primarily returns places to buy the DVD player, but just adding review to the search term returns useful results. (Yes, Dealtime does jump to the top of the list, but that page does have several reviews on it.)
Oh no, a search for "apple" doesn't return any information on artist Fiona Apple for many pages. Maybe you should actually search for "fiona apple"? Don't remember her first name? Try "apple female artist" or "apple female musician", which return some good pointers (notably to her first name, which will return even better results).
"apple" doesn't get you information on the fruit? Well, step one is search refinement. Prior to Google people spent lots of time refining searches. Just because Google often does what you want doesn't mean you'll never need to refine your search. So, let's be a bit more specific. Let's try "apple fruit". Voilà, hits on the fruit. Want to learn about growing apples? How about "growing apples"? Wow, more good hits.
Google doesn't index sites with non-public archives (like the New York Times)? Well, duh. They also don't sneak into your house and index your tax returns. By requiring registration to access their archives, the New York Times has effectively declined to be indexed. Expecting Google to circumvent that decision is stupid.
On the subject of Google not magically indexing everything, we get to the extremely silly complaint that doing research using Google tends to steer people only to online sources, not books. Again, duh. Similarly, if you use your local library's card catalog (or more typically, online catalog), it will only return books and magazines, not web pages. This points to two things. First, if you're doing Real Research, limiting yourself to a single source (be it online or in a library) is just dumb. Second, the internet is rising in importance, and perhaps publishing books online is a good idea.
Google is doing so well that lots of people are interested in taking potshots at it. I'm all in favor of people challenging the status quo, but try to have some real complaints.
Search 2010 Gen Con events
Plus everyone seemed to miss this bit of the article:
You can't really hold Google responsible for these blind spots. Each of them is just a reflection of the way the Web has been organized by the millions who have contributed to its structure. But the existence of Googleholes suggests an important caveat to the Google-as-oracle rhetoric: Google may be the closest thing going to a vision of the "group mind," but that mind is shaped by the interests and habits of the people who create hypertext links. A group mind decides that Apple Computer is more relevant than the apples that you eat, but that group doesn't speak for everybody.
Which is a fair enough point. Sometimes what I'm looking for is not what Google thinks I'm looking for and I have to tailor my searches somewhat.
But if MS included an option to ignore certain sites (such as shopping, blogs etc.etc) then I'd take a look.
Avantslash - View Slashdot cleanly on your mobile phone.
Oh, nevermind.
The article does do a good job at pointing out possible improvements. For example the article mentions how biased the search can get towards particular trends on the web.
To work around this, the folks who have worked in the field of Information Retrieval offer query refinement. An example of this can be seen at work with Teoma. Teoma offers to automatically refine your search query into narrower concepts it thinks are relevant to the original search. Type in "Jaguar" and it will return results as well as a box that suggests you could search for the car or the animal by modifying the query further.
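For anyone curious what query refinement might look like in its crudest form, here is a rough sketch. It is not Teoma's method (which isn't described here); it just counts which words co-occur with the query term in a handful of result titles and offers the most frequent ones as narrower queries. The stopword list, the titles and the function name are all invented for the example.

```python
from collections import Counter

# Tiny, hand-picked stopword list (an assumption for the example).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "on"}


def suggest_refinements(query, titles, k=3):
    """Suggest narrower queries from words that co-occur with the query term."""
    q = query.lower()
    counts = Counter()
    for title in titles:
        # Unique, lowercased words in the title, with edge punctuation removed.
        words = {w.strip(".,:;!?\"'()").lower() for w in title.split()}
        words.discard("")
        if q not in words:
            continue  # ignore titles that do not mention the query term
        counts.update(w for w in words if w != q and w not in STOPWORDS)
    # The k most frequent co-occurring words become suggested refinements.
    return [f"{q} {term}" for term, _ in counts.most_common(k)]


# Hypothetical result titles for an ambiguous one-word query.
titles = [
    "Jaguar car dealers",
    "Jaguar car reviews",
    "Jaguar cat habitat",
    "Jaguar car parts",
    "Jaguar cat facts",
]
suggestions = suggest_refinements("Jaguar", titles, k=2)
```

Note that the suggestions inherit whatever skew the result set has: if most titles are about the car, the car shows up first, which is rather the point of the original article.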
Overall, I feel the article walks a thin line by associating Google with the flaws. On the other hand for the folks at Google, it has been 5 years now. Maybe the improvements are not coming along as I expected. In any case, their index rules. As far as the web is concerned, in my opinion they are the Oracle...
Santosh Dawara
Go to Google UK and enter keywords "weapons of mass destruction" and hit "I'm feeling lucky"
perl -e 'print $i=pack(c5, (41*2), sqrt(7056), (unpack(c,H)-2), oct(115), 10);'
...about these Googleholes:
"I am getting flamed to high heaven in Slate's Fray for a piece of mine they just posted talking about some of the built-in limitations of the Google PageRank system. The general critique seems to be that I don't understand how to refine a search, which I guess I should have made clear in the piece itself. (I do, for the record. I also think Google is absolutely brilliant.) But as you can see if you follow the link, it's not a piece about how to use Google more effectively; it's a piece about ways that Google's system implicitly pushes us in certain directions, which makes it less like an authoritative reference source, and more like an op-ed page. (Nothing wrong with that, just something we should keep in mind.) Normally I quote from the articles themselves in this blog, but today I think I'll quote from a followup comment that I posted in the Fray..."
http://stevenberlinjohnson.com
You too can participate in the roast by finding his e-mail address on Google.
sarchasm: The gulf between the author of sarcastic wit and the person who doesn't get it.
ever tried searching for Linux on MSN? -- oddly enough the first link you get is to amazon.com and mentions "buying linux" -- the second seems to be alright, and the third is funny altogether: 3. Alternatives to Linux-Apache-MySQL-PHP Learn about the Microsoft alternatives and how to move to them from open source products. www.microsoft.com/serviceproviders/migration
...
After having read both the article and the majority of high-ranking comments here, I must say the article is more objective than the majority of comments. Google is not perfect and the article points out some shortcomings. Of course, they are a logical result given how google works, but it can still be argued that some results are less than optimal. Of course, by changing the query you can get better results, but it is also possible that a different page rank algorithm can give better results.
Why not instead discuss algorithms that would give apple, the fruit, the same relevance in search results as it has in most people's lives? If a search engine appeared that added that knowledge to its result ranking, Google would not be on top any longer.
Let's look at a more subtle aspect, though:
Is this verification that Google is vulnerable to astroturfing? If you assume that half of all web pages with the term "apple" are talking about the computer company and the other half are referring to the fruit, then it seems like a search for the term "apple" should bring up about equal numbers of computer & fruit hits. The fact that most top hits are about the company instead of the fruit probably suggests that at least some of the "ballot stuffing" tricks that companies try to bring up their ranking are effective, even against Google's famed efforts to avoid being astroturfed.
This example is probably bogus -- the computer company seems to be more popular than the fruit, or at least there's more for internet users to say about it, so pagerank is probably doing its job well here. But in other cases, where the commercial alternative isn't as famous as Apple Computer but it still ranks higher in Google searches than non-commercial alternatives, that probably says something about astroturfing.
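The difference between "how many pages mention a term" and "how many pages link to a page" is easy to see with a toy version of the published PageRank idea. This is a simplified power-iteration sketch, not Google's production ranking; the page names, the link graph and the damping value are assumptions made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping page -> rank; the ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new[p] += share
        rank = new
    return rank


# Hypothetical graph: three pages link to the company, one links to the orchard.
links = {
    "blog1": ["apple-computer"],
    "blog2": ["apple-computer"],
    "blog3": ["apple-computer"],
    "recipes": ["apple-orchard"],
}
ranks = pagerank(links)
```

In this toy graph the company page outranks the orchard page purely because more pages point at it, whether those links are honest enthusiasm or ballot stuffing. The algorithm alone cannot tell the two apart.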
That or it just reiterates that the web went commercial a long time ago. Take your pick...
DO NOT LEAVE IT IS NOT REAL
Unfortunately, most of MSN's readers are very unlikely to understand why this article is nonsense. They will never read the reasoned counterpoints expressed in this article thread. Nor will they ever question it.
This article is exactly what the layperson craves. It's controversial and it makes some sense if you fail to do any deep thinking. The masses are going to gobble it all up (even when MSN is a demonstration of what this article complains about: example) and look to other sources to save them from the newly created Google menace. To them, Google is now not only a bad search engine, it is also damaging the future of our species by negatively impacting research (*gasp!*).
Of course, this is how all Microsoft FUD plays out. It doesn't fool any of us, but it certainly fools most of them.
Join Tor today!
I agree with people here in that the points raised by the article are somewhat FUDful. However, I do have a MAJOR problem with Google.
I develop in Perl. If you've ever seen Perl code, (as I'm sure many here have,) you know {it=>"isn\'t"} @the=("most", "friendly"); of languages, syntactically. However, with Google, searching for information is a moot point. Try searching for "$|++" (Search Link). For those who don't want to click on the link, I'll tell you what happens: Google does nothing. That's because it doesn't accept punctuation.
This was particularly annoying when I wanted to do research on URThere's (awful) PDA: the @migo. When I searched for "@migo", I got lots of Spanish sites, but nothing relevant. Google had internally stripped my "@" symbol.
Granted, I will continue to use Google, as it is the most incredible search engine available right now, but because of these flaws, searching is severely limited.
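To illustrate the kind of behavior described above, here is a tiny sketch of a tokenizer that keeps only letters and digits. How Google actually tokenizes queries isn't stated here, so treat this purely as an assumed model that happens to reproduce the two symptoms: a query made only of punctuation becomes empty, and "@migo" turns into "migo".

```python
import re


def naive_tokenize(query):
    """Lowercase the query and keep only runs of ASCII letters and digits.

    This is an assumed, simplified model of a search tokenizer, not the
    real behavior of any particular engine.
    """
    return re.findall(r"[a-z0-9]+", query.lower())


perl_tokens = naive_tokenize("$|++")   # nothing survives
pda_tokens = naive_tokenize("@migo")   # the "@" is dropped
```

Under that model, "$|++" leaves nothing to search for, and "@migo" is indistinguishable from a search for "migo".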
char sig[120] = "\0"
'apple fruit'
I skimmed the first 5 pages and there was no mention of Apple computer.
isn't like 98% of what people are actually looking for "shopping-related"? I would venture "yes". The simple fact is that while google delivers lots of shopping-related stuff, it also delivers the real meat to anyone willing to think of the "right" words to enter. It's not that hard really, and anyone that thought that entering "apple" in a search engine would bring up their momma's apple orchard home page before apple computer has a lot to learn about the internet.
stuff |
Boo freaking hoo. Google isn't perfect. Whyever would MSN be interested in making sure we know it?
This reminds me of creationists pointing out gaps in our knowledge of evolutionary biology and concluding that lack of perfection in science proves that they are right.
If Slashdot were chemistry it would look like this: Cadaverine
"Apples" probably more what the user wanted.
-WolfWithoutAClause
"Gravity is only a theory, not a fact!"
I was searching for "pirelli p3000 tires review" - there are four good links, followed by hundreds of obvious search engine spam. Doing "-sex" doesn't make much appreciable difference.
I liked Google better before it became popular and thus vulnerable to this kind of crap.
What's wrong with his analysis? Where do I start? First let me say that most of his statements are true. They just have no real merit for me.
1) Reference basis. In any scientific analysis you need a baseline. For example, if you wanted to compare the fuel economy of two vehicles, it would be good if you established that the baseline should be something like gasoline powered passenger cars. If you compared the gas consumption of a horse drawn carriage to a Ferrari F40, that's not valid. In this case, no reference baseline was established. He was comparing Google to nothing. What if all his gripes about Google were inherent to all search engines?
To make his points, he should at least have some sort of meaningful comparison between search engines: Well, Altavista doesn't do this but Google does . . . I think he omitted this part because the MSN search engine shows many of the characteristics he complains about.
2) Testing methodology. When you test anything, each test has to be narrowly designed to test as few factors as possible, and the desired result has to be achievable. In the fuel economy example, it would be silly to complain how poor fuel economy is in a Ford Explorer if you used uranium as the fuel source.
It's ironic that the subtitle is Google may be our new god, but it's not omnipotent. He was testing the terms 'apple' and 'flowers', yet he was actually looking for 'growing apples' and 'gardening flowers'. But by searching on vague terms, he assured the test would fail. Additionally, without a reference (see #1), we don't know if this behavior is normal to search engines or just Google.
3) Objectivity. I don't need to elaborate on this.
Well, there's spam egg sausage and spam, that's not got much spam in it.
Yeah, but "foo review" searches are getting less and less useful because of sites like DealTime. I was searching for hardware reviews recently, and I got sick of clicking through to sites that said something like:
"Foobar TWF-69: BUY NOW! 0 reviews posted, add your review"
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
Vivisimo is a meta search engine that does clustering.
When you search for apple, the first clusters are mostly computer-specific, too, but that simply corresponds to what is on the web (it is different both with Google and Vivisimo when you use apples instead).
But sometimes the automatic clustering can speed up the search: you don't have to find out yourself which additional (positive or negative) criteria work best (you can, of course, still add them if you want).
A few examples of the first few clusters with a few Vivisimo searches:
- apache: Project, Helicopter, Mod/Module, Apache Software, Resources, Native American, XML, Apache Tribe, New Mexico, Technology
- python: Monty Python, Language/Programming, Snake, Book, Ball Python, Active State, Resources/Tools, Python Scripts,
...
- palladium: Microsoft, Platinum, Photos, London Palladium, Element, PCPA/FAQ, Hotel Palladium Palace in Rome,
...
- blair: Tony Blair, Blair Which, Jayson Blair, Blair/Nebraska, Coupons, Clothing, Blair County,
...
I think such clusterings can be useful. Also interesting: Clustering of 2087 Microsoft patents (1996-present) (provided as a demonstration).
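As a very rough picture of what grouping results into clusters means, here is a small sketch. It is not Vivisimo's algorithm, which derives its cluster labels automatically; here the labels are supplied by hand and each result title goes into the first label it mentions. The titles, labels and function name are invented for the example.

```python
from collections import defaultdict


def cluster_by_label(titles, labels):
    """Group titles under the first label they contain, else under 'other'.

    labels are given by hand here (an assumption); a real clustering
    engine would have to discover them from the results themselves.
    """
    clusters = defaultdict(list)
    for title in titles:
        lowered = title.lower()
        label = next(
            (lab for lab in labels if lab.lower() in lowered), "other"
        )
        clusters[label].append(title)
    return dict(clusters)


# Hypothetical result titles for the query "python".
titles = [
    "Monty Python sketches",
    "Python language tutorial",
    "Ball python care",
]
clusters = cluster_by_label(titles, ["Monty Python", "language", "snake"])
```

The "Ball python care" title ends up under "other" because it never mentions the word "snake", which shows why hand-picked labels are a poor substitute for labels derived from the documents.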