Digging Holes in Google
Kurt LoVerde writes "Though google has become synonymous with searching, the folks over at MSN have written up an interesting article on our favorite search engine's pitfalls. Included among these are a tendency to skew results toward shopping, a lack of diversity for searches containing synonyms and its impact on research."
Apple
... Visit the Apple Store online or at retail locations. 1-800-MY-APPLE Find Job Opportunities
at Apple. Visit other Apple sites around the world: Choose... ...
Description: Apple's main homepage.
Category: Computers>Systems>Apple>Macintosh
www.apple.com/ - 18k - Jul 20, 2003 - Cached - Similar pages - Stock quotes: AAPL
Just last night I was looking for information on whether the Arial font is trademarked by Microsoft (it's not). Just try putting a font name into google. I hit on every page that had a font name="Arial" tag in it!
www.HearMySoulSpeak.com
Alright, let's try reading it again:
Search for "apple" on Google, and you have to troll through a couple pages of results before you get anything not directly related to Apple Computer
Hope that clarifies.
the blood has stopped pumping, and he's left to decay
the me that you know is now made up of wires
Tried again 5 minutes later. Works now.
That isn't an accident. In the early days of the USA, the dollar was pegged to the value of a Spanish silver coin. That Spanish coin could be broken into eight pieces to make smaller amounts - hence the term "pieces of eight". Each of the pieces was referred to as a "bit".
Eight bits made up a full coin, or a dollar. This wasn't lost on the people that coined the terminology of bit/byte.
Part of the problem is that the author appears unable to massage google to find the results he wants.
For example, when I search for a specific model of DVD player, I may search for "dvd A7049-34 review" - I was looking for a review so I put it in my search. Even better, add -shop to the end of my search. Now pages with the word shop in them will get filtered out.
Want to find info on apple farms? search for 'apple -"apple computer" -macintosh' and you'll eliminate a lot of mac webpages from the search.
Sometimes, typing a question into google will get you where you want... "how does thing X work?" More often than not you'll find the answer on the first page, because people post to newsgroups, web forums, and the like with questions and you are (usually) not the first person to ask that question.
The key here is to remember that you can tell Google what you want to find AND what you DON'T want to find (just put a minus in front of the word.)
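The include/exclude idea above can be sketched in a few lines of Python. This is only an illustration: the helper name build_query and its parameters are made up here, and it does nothing more than assemble the query string you would type into the search box.

```python
def build_query(terms, phrases=(), exclude=()):
    """Assemble a search string from plain terms, quoted phrases and exclusions.

    Hypothetical helper for illustration; it only builds the text of the query.
    """
    parts = list(terms)
    # Exact phrases go in double quotes.
    parts += [f'"{p}"' for p in phrases]
    for item in exclude:
        # A multi-word exclusion is quoted so the minus covers the whole phrase.
        parts.append(f'-"{item}"' if " " in item else f"-{item}")
    return " ".join(parts)


print(build_query(["apple"], exclude=["apple computer", "macintosh"]))
# apple -"apple computer" -macintosh
print(build_query(["dvd", "A7049-34", "review"], exclude=["shop"]))
# dvd A7049-34 review -shop
```

Both printed strings match the searches quoted in the comment above.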
Natural != (nontoxic || beneficial)
Ignoring all the obvious "FUD! FUD!" claims..
:o)
1) Nothing's perfect
2) Google is still the best to date
3) I'm still happy with my google search results. I can't begin to estimate how many times in work, play, and study that google has come through, again and again and again.
me 3 google
he says that he (surprise) got flamed for the article on Slate a few days ago (guess slashdot was slow in picking it up). Head over to his blog to see his response to many of the comments we're making about his inability to properly form search terms. He's also got a feedback section on there so we can flame him directly if we want.
I've pasted it below in case his blog gets /.ed:
"I am getting flamed to high heaven in Slate's Fray for a piece of mine they just posted talking about some of the built-in limitations of the Google PageRank system. The general critique seems to be that I don't understand how to refine a search, which I guess I should have made clear in the piece itself. (I do, for the record. I also think Google is absolutely brilliant.) But as you can see if you follow the link, it's not a piece about how to use Google more effectively; it's a piece about ways that Google's system implicitly pushes us in certain directions, which makes it less like an authoritative reference source, and more like an op-ed page. (Nothing wrong with that, just something we should keep in mind.) Normally I quote from the articles themselves in this blog, but today I think I'll quote from a followup comment that I posted in the Fray:
The point I'm trying to make is that all other things being equal, Google will skew results towards online stores and pages linked to by the blogging community. (And away from books towards articles, though that's a slightly different point.) You can make things less than equal by doing more refined searches, but that doesn't mean the skew isn't important. This reminds me in a way of the old debate about Microsoft controlling the desktop -- the Microsoft folks would always say, "people can install their own application icons on the desktop so what's the big deal if our icons come as part of the default setup?" The point is that default biases in widely used tools have real effects, even if there are relatively easy ways around them.
Here's a more real-world example of the bias at work, which is equally self-reflexive: search on "steven johnson emergence." The top ten results are either from blogs, Amazon product pages, or the O'Reilly Network (very big with the open source and blogging communities.) Now, Emergence was reviewed by the NY Times, the Economist, the Village Voice, the UK Guardian, and dozens of other major publications with huge readerships. But Google doesn't think those results are as relevant as blogger reviews. Now, I'm a blogger, and I love the blogging community, so I think in a way that this is not necessarily bad news. But it's hard not to see it as a kind of bias."
Let's not forget that Applied Semantics, formerly Oingo, invented pretty good technology to perform meaning-based searching of the Internet. The Oingo site, now defunct, was actually pretty cool. You'd do a search on "apple" and it would offer you to refine the meaning of apple: Which kind? The fruit? The computer? Etc. I would not be surprised if Google were to integrate the technology they now own into the Google site, which would make MSN's article quite obsolete. Just wait for a year or so.
Ya, you'd think they'd actually know how to use a search engine. "apple computer" will get you Apple's site. "tulips gardening -florist" (w/o quotes) will get you gardening tips for tulips. "dvdr880 review" will get you some reviews of the Philips dvdr880.
Out of curiosity I tried the same searches they tried with google on http://search.msn.com. The "apple" search brought up only a few actual hits for Apple, with #1 being Office Depot and #11 being apple.com. For the "tulips" search, the first 5 links were to proflowers.com, 1800flowers, etc., with the bottom half of the page being encyclopedias, yadda yadda. The "dvdr880" search brought up every link to every tech store in the known universe, with only 2 that I could tell had reviews of the model; one of them, ironically, linked to this page: http://www.google.dealtime.co.uk/xPC-Philips_DVDR
print "$insertThisWholeArticleIsATroll\n";
/* oops I accidentally made a comment, sorry */
It's very obvious that this guy is reaching here when he "Digs for Holes". It looks like instead of holes, he dug for BullS#!t, and found a lot of it.
Let's examine his points:
Googlehole No. 1: All Shopping, All the Time
Hrmm, let's see. I'm looking for a Home Theater Receiver today, so I type in "Pioneer Receiver" and I get mostly links to places selling them. But let's say I want a manual for my receiver, so I type in "Pioneer Receiver manual": the results are much different. That's what keywords are for...moron.
Googlehole No. 2: Skewed Synonyms.
If I type in apple, OF COURSE I'm going to get results about Apple computers. Just add the word fruit to your search, if that's what you're looking for. Ask and ye shall receive. I dunno about you..but most of those links seem to be relating to fruit.
Googlehole No. 3: Book Learning
Ok, I won't blame this one on google, I think this is due to the internet as a whole. Why would I go to the library, when I can poke around on the internet for 20 mins...and get just as much research done. Give me a break.
These all may be obvious points, but I was bored...so I decided to point them out.
home of the original cupholder
Err... maybe you should try searching for: "what you want to find out" -buy -commercial -shop -"some other terms involving exchange of money for goods and services"
Welley Corporation - SLM Scammers
Actually, the "flower -shop" term you gave still brings a lot of commercial links. At least the first and second pages for me were pretty full of them. So it should perhaps be refined to "flowers -order". That seems to bring up some nice results of flower information, since pretty much all commercial sites use the word "order" somewhere.
Some people (the article author?) also seem to miss that Google Directory links from search results usually give excellent results, since it's a nicely organized database of major websites. If you *do* wish to look for flower shops, use this category:
http://directory.google.com/Top/Shopping/Home_and
Otherwise, use something like this one:
http://directory.google.com/Top/Home/Gardens/Plan
Or perhaps even this one, if you're going to dig deep, as in flower biology:
http://directory.google.com/Top/Science/Biology/F
These links probably took less than 2 minutes to find, and there is so much related information that it would take days to go through. The purpose of Google Directory *is* exactly that -- to separate categories (like shopping and non-shopping) and include the major sites.
Beware: In C++, your friends can see your privates!
Thanks, but in this context "it's" is a contraction for "it is."
:-)
If you are going to be a grammar-nazi at least be good at it.
As Nietzsche famously said, "If you stare too long into the Abyss, 1d4 Tanar'ri of random type will attack you."
The article does do a good job at pointing out possible improvements. For example the article mentions how biased the search can get towards particular trends on the web.
To work around this, the folks who have worked in the field of Information Retrieval offer query refinement. An example of this can be seen at work with Teoma. Teoma offers to automatically refine your search query into narrower concepts it thinks are relevant to the original search. Type in "Jaguar" and it will return results as well as a box that suggests you could search for the car or the animal by modifying the query further.
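As a rough illustration of what query refinement means, here is a toy Python sketch. It is not how Teoma works: the SENSES table is hand-written and hypothetical, where a real engine would presumably derive such concepts from its own index.

```python
# Hypothetical, hand-written sense table; purely for illustration.
SENSES = {
    "jaguar": ["car", "animal"],
    "apple": ["fruit", "computer"],
}


def suggest_refinements(query):
    """Return narrower queries for each ambiguous word found in the query."""
    words = query.lower().split()
    suggestions = []
    for word in words:
        for sense in SENSES.get(word, []):
            # Skip senses the user has already typed.
            if sense not in words:
                suggestions.append(f"{query} {sense}")
    return suggestions


print(suggest_refinements("Jaguar"))
# ['Jaguar car', 'Jaguar animal']
```

A query with no ambiguous words in the table simply gets no suggestions back.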
Overall, I feel the article walks a thin line by associating Google with the flaws. On the other hand for the folks at Google, it has been 5 years now. Maybe the improvements are not coming along as I expected. In any case, their index rules. As far as the web is concerned, in my opinion they are the Oracle...
Santosh Dawara
Err... maybe you should try searching for: "what you want to find out" -buy -commercial -shop -"some other terms involving exchange of money for goods and services"
My parent is absolutely correct. A search engine is a tool. If you know how to use it, you'll find it if it's indexed. It's not meant to be a tool to "give you what you want", as that would require a psychic.
When I search for cheapest shopping, I include "+prices", so "-prices" is probably a good start. That's the way to handle search engines: try out different combinations, and be sure to check out the Advanced Options before giving up.
http://www.debunkingskeptics.com/
...about these Googleholes:
"I am getting flamed to high heaven in Slate's Fray for a piece of mine they just posted talking about some of the built-in limitations of the Google PageRank system. The general critique seems to be that I don't understand how to refine a search, which I guess I should have made clear in the piece itself. (I do, for the record. I also think Google is absolutely brilliant.) But as you can see if you follow the link, it's not a piece about how to use Google more effectively; it's a piece about ways that Google's system implicitly pushes us in certain directions, which makes it less like an authoritative reference source, and more like an op-ed page. (Nothing wrong with that, just something we should keep in mind.) Normally I quote from the articles themselves in this blog, but today I think I'll quote from a followup comment that I posted in the Fray..."
http://stevenberlinjohnson.com
You too can participate in the roast by finding his e-mail address on Google.
sarchasm: The gulf between the author of sarcastic wit and the person who doesn't get it.
I have also noticed an increase in shopping-related results on google lately. So I found
www.google.com/search?client=googlet&q='+escape(q)
in my personal toolbar link and added:
+'+-shop+-deal+-value'
I've had much better results since then.
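For anyone who would rather script this than edit a toolbar link, here is a minimal Python sketch of the same trick. The function name search_url and the DEFAULT_EXCLUSIONS tuple are invented for this example, and the client parameter from the toolbar link is left out.

```python
from urllib.parse import quote_plus

# The three words the poster appended to every search.
DEFAULT_EXCLUSIONS = ("shop", "deal", "value")


def search_url(query, exclusions=DEFAULT_EXCLUSIONS):
    """Build a search URL with minus-prefixed exclusion words appended.

    Hypothetical helper; it only constructs the URL string.
    """
    full = " ".join([query, *(f"-{word}" for word in exclusions)])
    # quote_plus turns spaces into '+' and leaves '-' untouched.
    return "http://www.google.com/search?q=" + quote_plus(full)


print(search_url("pioneer receiver manual"))
# http://www.google.com/search?q=pioneer+receiver+manual+-shop+-deal+-value
```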
Just wanted to note that a large number of CS articles (dead tree and otherwise) are indexed by google via Citeseer due to its use of citations.
A nice example (though not the best, probably) is this search for phonetic matching
Where I work we name our servers after types of apples (the land used to be an orchard). It didn't take but a half-second of thought to realise I should type in "apple types" to get a nice listing of different types of apples. What am I looking for? Types of apples. What should I search for? Types of apples.
'apple fruit'
I skimmed the first 5 pages and there was no mention of Apple computer.
HotBot used to do exactly this, they were using the Cyc AI engine (from Cycorp) to do it. Read more about it here
Vivisimo is a meta search engine that does clustering.
When you search for apple, the first clusters are mostly computer-specific, too, but that simply corresponds to what is on the web (it is different both with Google and Vivisimo when you use apples instead).
But sometimes, the automatic clustering can speed up the search, you don't have to find out yourself which additional (positive or negative) criteria work best (you can, of course still add them if you want).
A few examples of the first few clusters with a few Vivisimo searches:
- apache: Project, Helicopter, Mod/Module, Apache Software, Resources, Native American, XML, Apache Tribe, New Mexico, Technology
- python: Monty Python, Language/Programming, Snake, Book, Ball Python, Active State, Resources/Tools, Python Scripts,
...
- palladium: Microsoft, Platinum, Photos, London Palladium, Element, PCPA/FAQ, Hotel Palladium Palace in Rome,
...
- blair: Tony Blair, Blair Witch, Jayson Blair, Blair/Nebraska, Coupons, Clothing, Blair County,
...
I think such clusterings can be useful. Also interesting: Clustering of 2087 Microsoft patents (1996-present) (provided as a demonstration).
I find it useful to include '-free' in my searches.
Almost as a rule, a page that has 'FREE' plastered all over it is either trying to sell you something, or scam you in one way or the other.