Google's Search Results Degraded?
scrm writes "According to this Wired article, recent tweaks to Google's PageRank search algorithm have degraded rather than improved the accuracy of the results." I noticed this firsthand the other day, but only when I was searching for pictures of famous people; all my technical queries came back fine.
The article suggests that many people are saying PageRank is working badly because they have lost their previous power to affect search results. Overall, PageRank seems to have improved in this latest incarnation (IMHO).
...available in Mark Pilgrim's blog
Google still seems the best. Sometimes I use Teoma or Lycos because they give different top results. Being tied to one search engine seems bad, as you miss a lot.
I had an instructor point us to a page on networking that was amazingly good but not found on any of those three search engines, at least not in the top 30. Most of those top 30 hits weren't very good either.
Maybe Yahoo has it right: the web should be indexed by people.
We who are editors at dmoz hold a lot of power right now. It's time for you to share in some of that power. Head over to dmoz and apply to edit your favorite category.
Can't decide where to apply?
Whether the rest of the article and Google's changes are simply causing a rash of sour-grapes whining or not, I did notice one thing when I used it yesterday: for a current topic of major interest, at least to its part of the world, I got a helluva lot of dead links and blank pages ("Document contains no data", and when I checked, sure enough, it was just the HTML and /HTML tags with no content). This struck me as unusual, not to mention annoying. Stranger still, none of these "dead pages" were in Google's own cache.
:( and no, it's not pr0n :)
(I still haven't found what I was looking for.)
~REZ~ #43301. Who'd fake being me anyway?
To test whether we really are looking at a Dmoz-dominated update, we set up a small ASPseek-based search engine, a GNU search engine with a crude PageRank-like ranking system. We indexed around 1,500,000 pages, using 700 very competitive Dmoz categories as the starting point, including up to 250 pages per site and following up to 10 outside links, with up to 100 pages per outside link. We found that 59% of our top 20 results for the 100 category-related competitive keywords were also in the top 20 of Google's new index, and 26% of our top 10 were also in Google's top 10.
But we must also say that we have not been able to find a similarly compelling relationship in non-competitive categories. A 2,000,000-page index built from non-competitive regional categories, queried with non-competitive keywords, showed very little correlation among the top 10, top 20, and even top 50 results.
So our working theory right now is: yes, small changes, probably made to fight both googlebombing and PageRank commoditization, have affected the index accuracy in many different ways.
We think the index is unbalanced, or at least much more unbalanced than the last one, and as a result the weight of some previously not-so-important characteristics has been souped up, opening the door to abuse.
The main effect of all this is that small, well-managed sites in competitive categories can no longer rely on good content plus good linking to get a good listing. They will have to pay Google through the AdWords program. That is the main change here: forcing small businesses to pay to get to the top by buying advertising space.
But we do think this update and the changes made are, to say the least, unbalanced, and the new algorithm is rampantly open to easy abuse. Let's hope Google's good old PhD common sense returns soon and a new, improved update takes place as soon as possible. Let's hope they are not trying to make an easy killing by forcing small popular sites, suddenly deprived of traffic, to pay Google through AdWords.
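For anyone wanting to run a similar comparison, the overlap figures can be computed roughly as below. This is a sketch under assumptions: the post does not say whether the percentages were pooled over all keywords or averaged per keyword, so this version pools them (total shared URLs divided by total URLs in our top N). The function name `pooled_top_n_overlap` and the sample data are hypothetical.

```python
def pooled_top_n_overlap(rankings, n):
    """Pooled top-N overlap across keywords (hypothetical helper).

    rankings maps keyword -> (our_results, google_results), each a list of
    URLs in rank order. Returns the fraction of our top-n URLs, summed over
    all keywords, that also appear in Google's top-n for the same keyword.
    """
    shared = 0
    total = 0
    for ours, theirs in rankings.values():
        our_top = set(ours[:n])
        their_top = set(theirs[:n])
        shared += len(our_top & their_top)
        total += len(our_top)
    return shared / total if total else 0.0


# Hypothetical toy data: two keywords, top-4 lists.
example = {
    "kw1": (["a", "b", "c", "d"], ["b", "x", "a", "y"]),
    "kw2": (["p", "q", "r", "s"], ["p", "q", "r", "z"]),
}
print(pooled_top_n_overlap(example, 4))  # 0.625 (5 shared out of 8)
```

Averaging per keyword instead of pooling would weight every keyword equally; the two only differ when some keywords return fewer than N results.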
According to this Wired article, recent tweaks to Google's PageRank search algorithm have degraded rather than improved the accuracy of the results
Actually, according to the article, a few people who run blogs seem to think that Google has degraded, while Google itself has not seen a higher number of actual complaints.
Basically, what happened is that Google took some measures to reduce the effects of "googlebombing" by bloggers.
autopr0n is like, down and stuff.
* you can't beat the best Google spammers in the end ... they're always smarter, quicker, and harder to identify
... let's hope this isn't a reason
* worse rankings for a particular keyword mean that a company will seriously consider using AdWords to maximize its traffic from Google - so hey, it's good for Google
* the big mistake is to use a "static" relationship between websites as a measure of a site's traffic or importance - better to offer a "Google counter" (Google has the resources, I suppose)
All things considered, Google is still doing pretty well.
-lj
"I love my job, but I hate talking to people like you" (Freddie Mercury)
Google used to be pretty damn good. I have a hard time finding stuff these days, though. I think one part of it is time. Google made its debut during a time of explosive growth for the 'net. That meant most of the searchable items were roughly the same age, so temporal context wasn't really an issue. But now that we've all got a few more years around our waists, some of those documents aren't going to be much use to those of us living here in the next millennium. An aged document should get a heavy negative score unless the user is explicitly looking for old stuff.
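One way to express the "aged documents get a negative score" idea is an exponential decay on the relevance score, switched off when the user asks for old material. A minimal sketch, assuming a one-year half-life and a simple `want_old` flag; both are illustrative choices, and nothing here reflects how Google actually ranks. The result names and dates are made up.

```python
from datetime import date


def recency_adjusted(score, published, today, half_life_days=365.0, want_old=False):
    """Halve a document's score for every half_life_days of age.

    If the user explicitly wants old material, the score is left alone.
    Documents dated in the future are treated as brand new (age 0).
    """
    if want_old:
        return score
    age_days = max((today - published).days, 0)
    return score * 0.5 ** (age_days / half_life_days)


today = date(2003, 1, 1)
results = [
    ("old-bug-report", 0.9, date(1996, 6, 1)),
    ("recent-fix", 0.6, date(2002, 11, 15)),
]
ranked = sorted(results, key=lambda r: recency_adjusted(r[1], r[2], today), reverse=True)
print([name for name, _, _ in ranked])  # ['recent-fix', 'old-bug-report']
```

The half-life controls how "heavy" the penalty is: a 1996 page with a better raw score still loses to a recent one here because six and a half years of decay shrink its score to about 1% of the original.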
Was having problems with a program last night and kept finding links from 1996. I kept saying to myself, "My good man, surely those bugs have been fixed since then!"
Looking for things on the net with any search engine is a distinctly unpleasant experience. It's kind of an impotent feeling to type in an explicit search with all the trappings ("bla bla" -foo -gar, etc.) and get a bunch of nothing in the results.
And don't ask me why, but I've been getting an uneasy feeling about Google. Where do they get all the money such a company needs to stay afloat? They're in a pretty good position to learn all kinds of neat details about everybody. Pretty powerful position to hold.
...because I've noticed some odd correlations.
For a long time, a search on "Samuel Johnson" returned Frank Lynch's "Samuel Johnson Sound Bite Page" as the first hit. And, flatteringly, but mysteriously, a search on "Eyeglass Prescription" returned a web page of mine as the first hit. (I say "mysteriously" because the only page that Google reports as linking to my page is... my own home page! So it is not PageRank that accounted for its ranking).
About a month ago, Frank's page dropped to #3 and mine dropped to about #20. In Frank's case, the #1 spot went to a fine Samuel Johnson web site at Rutgers; in mine, I was edged out by a bunch of commercial sites selling eyeglasses.
The interesting thing is that two or three weeks ago both sites popped back to number 1.
And then a few weeks later, Frank's is again at #3 and mine is down around #10 or so.
I don't think there's any reason why eyeglass prescriptions and Samuel Johnson would be connected. (And, no, Frank's page and mine do NOT link to each other!) So the changes must reflect tinkering by Google.
Neither Frank nor I use any kind of "cheating" to boost our ratings. And I don't think the sites that climbed above ours did, either. Nor do I think many of the sites involved changed ANYTHING significant that would have altered their rankings.
(BTW I'm NOT giving URLs because the contents of these pages are irrelevant to my observations, I don't want them slashdotted, and this is NOT an attempt to boost the rankings of either page).
"How to Do Nothing," kids activities, back in print!
He still beats Tracey Ullman, though.
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
The service search engines provide to all of us, in the face of what can only be charitably called a glut of data (some of it actually useful to some of us), is remarkable. (And let's be clear: we're talking about Google, possibly the only thing keeping the net from collapsing under its own weight in ether.)
It's great that Google is effectively "free" to us, but should that be so? I'm personally willing to pay a nominal charge, perhaps in the form of a subscription, for access to the most effective search algorithms. It has that much value to me. If someone wants to create a free system, as Google has done, that's great. But as the web gets bigger and hairier, we might need to think about ways to support (read: fund) continued development of effective search and navigation tools.
There's some serious linear algebra in publicly disclosed portions of the PageRank(TM) algorithm, and the really good stuff is probably even funkier. This sort of thing isn't going to be hacked together on the weekend by your average penguinista. Could the brains behind Google go diving for herring in richer seas? I suspect so.
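For the curious, the publicly described core of PageRank is the principal eigenvector of a damped link matrix, usually computed by power iteration. Below is a minimal textbook sketch, assuming the commonly cited damping factor of 0.85 and spreading the rank of pages with no out-links evenly over all pages. It illustrates the published idea only, not whatever Google actually runs, and the three-page link graph is made up.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power iteration for PageRank on a dict {page: [pages it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        # Each page passes a damped share of its rank along each out-link.
        for v in nodes:
            out = links.get(v, [])
            if out:
                share = damping * rank[v] / len(out)
                for t in out:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical graph: a and b link to each other, c links only to a.
web = {"a": ["b"], "b": ["a"], "c": ["a"]}
print(pagerank(web))  # a ranks highest, c (no inbound links) lowest
```

Each iteration is one matrix-vector product; the ranks always sum to 1, and the damping factor is what guarantees convergence even on oddly shaped link graphs.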
Background: Among other things, I am always trying to discover music from independent (especially blues) artists who post MP3s of their stuff on the web. I have been boycotting the major record labels 100% for about 15 years (hooray for independents!) for several reasons: 1: CD prices have always been a rip-off, 2: most major label artists suck, 3: I have worked in the music business (artists/bands/production) and detest the industry for the way it exploits artists, 4: I have always loved discovering talented "unknowns" and turning other people on to them. I went through a new-music dry spell until the web started to become a vehicle for independent artists to promote themselves.
It's amazing what's out there now - I've found great artists from all over the Americas, Europe, Australia, Russia and even a few from Asia. I have found a lot of crap, too. The mp3 search engines are essentially useless for this purpose (I don't want major label music), and I have never used Napster or any of its offspring. Links pages are more often out of date than not, and webrings have similar problems. I have contrived several search techniques that try to exploit the strengths of search engines and the information likely to be on an artist's site. One very simple one is to search for "mp3 +(name of a well-known blues standard) -(a lot of keywords to exclude the many sites that put "mp3" on every page listing a song title just to pull in traffic) -(specific sites that pollute the searches)" to find artists who cover the song and also have their own tunes.
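The search pattern above can be assembled mechanically. A sketch, assuming Google-style operators (`+` to require a term, `-` to exclude a word, `-site:` to exclude a domain); the song title, exclusion words, and domain below are placeholders, the helper name `build_query` is hypothetical, and operator support differs between engines.

```python
def build_query(song, exclude_words=(), exclude_sites=()):
    """Build an 'mp3 +"song" -word -site:domain' style query string.

    Handles single-word exclusions only; multi-word phrases would need quoting.
    """
    parts = ["mp3", '+"{}"'.format(song)]
    parts += ["-" + word for word in exclude_words]
    parts += ["-site:" + site for site in exclude_sites]
    return " ".join(parts)


print(build_query("Sweet Home Chicago", ["lyrics", "ringtone"], ["example.com"]))
# mp3 +"Sweet Home Chicago" -lyrics -ringtone -site:example.com
```

Keeping the exclusion lists as data makes it easy to reuse the same "polluter" filters across many song titles.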
I have been a proponent of Google for many years. It came along just as I really started to dislike Altavista, and I was an almost instant convert. But I am always on the lookout for a backup or something better. I have tried Teoma several times in the last year (as recently as last night), but I'm not terribly impressed. I find its interface and the way it presents results simplistic and dumbed down, and it appears to have indexed far less of the web than Google. I got turned off by Lycos years ago, when it seemed to want to become another portal/Yahoo (as if we need another one).
The one search engine that I do use as a regular alternative to Google is Alltheweb. For one thing, IMO, its advanced search is currently better than Google's (I swear that I have brought Google to its knees by entering too many keywords - it stops responding and is inaccessible for several minutes thereafter - this has happened several times). When I've done back-to-back comparisons with Google, Alltheweb seems to fare pretty well and seems to find more international pages than Google. The difference in top rankings can also be useful. Google has some nice features that Alltheweb does not, such as the elimination of duplicate pages.
For one-stop searching, I find Google best for me, but Alltheweb is a good alternative.
Sigs are bad for your health.
I know the PageRank algorithm is patented (U.S. Patent 6,285,999). Will I get into any trouble if I write open-source software that uses a similar algorithm? Thanks.
I e-mailed the problem to Google about a week ago, but so far they don't seem to have gotten around to it. A Google search on my last name returns my personal homepage as result number one, which is no surprise, considering the last name. However, the cached version of what is supposedly my site is an entirely different site that I have never heard of. Furthermore, since Google's search results use the title and description from the cached version, the title and description for my homepage come up pointing to RhytmicPalmz.com or something of that nature. It seems to be a cache glitch; at least so far I haven't been able to come up with a valid explanation for it.
First: you'd get caught instantaneously. The ranking engine that scores you for display == the spammer detection engine.
Second: for persistent offenders, don't just delete the spamming site, delete all the sites they own or operate. Or threaten the ISP with black-holing.