Traversing the "Googlearchy"
baloney farmer writes "How much do search engines influence the availability of information online? A new study gives some surprising results. Search engines help with popularity, but not as much as you'd think: 'Traffic increased far less than would be expected if search engines were enhancing popularity. It actually increased less than would be predicted if traffic were directly proportional to inbound links. In the end, it appears that each inbound link only increases traffic by a factor of 0.8. The results suggest that the reliance of web users on search engines is actually suppressing the impact of popularity.'"
I've got to say no to this. Yes, when you search for something, you get the most popular results. But not everyone uses the same search terms, and even if you only go through the first three pages of results, you've still got 20 to 30 different sources of information, with each different but similar query returning a slightly different set.
Only 0.8? Roland will have to post an additional 25% more "stories" to get his blog rank up.
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
It means people are finding what they're looking for more directly, rather than having to gad around. This is a good thing.
So ten links would increase traffic by a factor of 0.8^10, or about 0.1. Doesn't anyone's maths education cause them to develop innumeracy filters any more? Could it possibly be that it's supposed to be "1.8"????
In the end, it appears that each inbound link only increases traffic by a factor of 0.8. The results suggest that the reliance of web users on search engines is actually suppressing the impact of popularity.'
I can agree with that. I've seen users type "yahoo.com" into the search bar in firefox... which goes to the google search results page, where they then click on the "Yahoo!" link. It's almost as if users are conditioned to use "search" as their first action, regardless of whether they can remember the domain or not.
The theory of relativity doesn't work right in Arkansas.
I would actually be all for this, the trouble is just that it would force consumers to come in sealed plastic bags...
A factor of 0.8 means that the traffic is decreased by each inbound link. Weird.
Maybe a site's popularity isn't defined by the number of inbound links because no matter how many links to your site you have, people still only want to look at things they are interested in. So by defining web popularity not by links, but as "Some internet item people want to find" that means that the more links to an individual site simply lets interested people find that site easier. It would only change the popularity if it's forced on you (like ads) and you become interested by a curious side thought... The more links to a site you have, the more likely interested people will find it.
Funnypics
The results suggest that the reliance of web users on search engines is actually suppressing the impact of popularity.
When I first read this summary, I thought, "WTF?". So I read the article. And re-read the summary. And re-read the article. And I think I finally "get" it.
Let's say you run a "popular" site like the BBC news. You get a hell of a lot of traffic, and people tend to go directly to your site rather than via a link. Alternately, you get a lot of links that only a small percent of people seeing them follow.
Now compare that with an unknown site (most personal or academic webpages, for example). They get very few visitors, but most of them come from search engines.
So what does this tell us?
Almost nothing we didn't already know - Search engines DO indeed negate the impact of popularity, because popularity has little to do with relevance, while search engines generally try to maximize relevance.
This I consider a "good" thing. When searching for info on ripping a DVD using the latest copy protection scheme, I don't care if the latest pop idol calls ripping "totally not cool". I want methods, programs, and real life examples that might only have gotten a few dozen hits ever.
Sites with more links have more visitors (as defined by Alexa ranking, a rough tool at best). Big surprise, NOT. Everyone knows that sites with more inbound links tend to rank higher on the search engines and therefore get more visitors.
TFA then tries to make a big thing out of their 'discovery' that links are not the _only_ factor in the popularity (however defined) of a website. Again, completely obvious.
Then we hear that the correlation (not defined clearly) between links and 'traffic' (presumably actually some Alexa rank) is 0.8. It's not clear what this actually means, but it's hardly surprising that the relationship between links and traffic isn't 1:1. Many factors will be causing this. For example, site-wide links off large sites make a huge contribution to the number of links but will make a smaller contribution to the target site's search engine ranking than the same number of links each from an individual site.
In theory, there's no difference between theory and practice; in practice there is.
I'm guessing that link up there in the summary had WAY more effect on their servers...
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
Bingo! When I worked in newspaper management, one metric in our readership studies was the amount of time each day the reader spent with the paper. Longer was considered better, as it indicated people were finding many things (articles, ads, crosswords, whatever) that warranted their time. The irony, however, was that one of the main points of the redesigns that were done every few years was to consolidate information (such as 'news at a glance' pages) and to make it easier for the reader to find what they were looking for. If done right, time spent with the paper would decrease -- which would show up in the next market study, and be considered a bad thing.
What do they mean by 'increase by a factor of 0.8'?
If my starting point is 1 and I put a link on it, does it mean that I now have 1.8 or 0.8 or what?
Or do they mean 0.8%? So if I start with 100, I now have 100.8 per incoming link?
Are they cumulative? E.g. is the second link (if it is a %) applied to the 100.8, or to the 100?
Also, this looks like Captain Obvious at work. Pages that have more links to them are more popular. And people who have interests in certain pages will only go to those certain pages.
Now if only a search engine company would realize that, they could use this data to get some advertisements on both their site and on the sites they link to.
Oh, wait, they reverse-engineered Google's business plan.
Don't fight for your country, if your country does not fight for you.
Just to mention a better reference (maybe the origin of the search bias idea, or at least the earliest one I know of): Junghoo Cho, Sourashis Roy, "Impact of Web Search Engines on Page Popularity." In Proceedings of the World-Wide Web Conference (WWW), May 2004.
I guess there should have been a discussion of that here, since this is not NEW and there are quite a few papers on it.
The Greek would definitely have a contracted eta for just "Googlarchy."
Personally, I "search" for purchasing info, business info, etc.
I am told about "popular" sites directly... they are um, popular.
This issue is a bit more complicated than you think.
What does this mean? Without any other reference, I would assume that each link takes 1 unit of traffic (ut) to (1 + 0.8)ut. If so, n links would take your traffic to 1.8^n ut, which is unbelievable. What's missing here?
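To put numbers on the competing readings, here is a small Python sketch of three ways "a factor of 0.8" per link could be interpreted. The function names are made up for illustration, and the third reading (0.8 as an exponent, so traffic goes as links^0.8) is only my assumption, based on the "sublinear scaling" wording quoted from the paper elsewhere in this thread. The summary itself doesn't say which one is meant.

```python
def multiplicative(n_links, factor):
    """Traffic after n_links if every link multiplies traffic by `factor`."""
    return factor ** n_links


def power_law(n_links, exponent):
    """Traffic if it scales as (number of links) ** exponent (assumed reading)."""
    return n_links ** exponent


# Compare the three readings for a few link counts, starting from 1 unit of traffic.
for n in (1, 10, 100):
    print(n, multiplicative(n, 0.8), multiplicative(n, 1.8), power_law(n, 0.8))
```

Under the first reading traffic shrinks with every link, and under the second it explodes. Only the exponent reading grows with more links while staying slower than linear: ten times the links gives roughly 6.3 times the traffic.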
To do list for Windows
I hear somebody at Google laughing: "haha, those ranking noobs did not understand anything."
Yes, the core of Google's ranking algorithms is based on incoming links, but it is far from something as simple as just counting the number of links. The _quality_ of the links is way more important. In addition, there are many signals Google takes into account beyond just pure PageRank (if this wasn't true, almost anybody could build Google). Yet, TFA uses and interchanges "# of inbound links" and "search engine score" as if they meant the same thing.
If they really are using # of links as an approximation to search engine score, then they're flawed from the beginning. If they aren't, then somebody isn't very good at conveying information.
It reminds me of the quote (not sure the origin): People who like this kind of thing will find that this is the kind of thing that they like.
You think it's bad now, imagine when Google has an AI model of what you want to find such that it tailors the search results for you alone.
Some years back, in the early 90's, I think, when there was little or no web and when advertising was done in physmail, I started to receive lots of mail about object-oriented stuff and little about other kinds of programming. "Ah, we're winning," I concluded foolishly. Later, I realized I was just pigeon-holed in a special Hell where I would never again learn about what others were doing because someone thought they had learned what I "liked".
It amazes and saddens me that a whole industry grew up around "personalized interfaces" which does not include as part of its regular practice: "ask the user what he likes". Amazon's court of last resort is to allow me to "correct" its assumptions about me by deleting records of specific purchases that are confusing its belief that I like certain things.... all substituting for an interface that just says "do you like X?" and lets me say "yes/no". And there's even some research saying they know better than I do what I want. Bleah. Personal indeed.
I'll be interested to see if this result holds up. It seems just as grim as the "personal interfaces" result. But sad or not, it does seem believable...
Kent M Pitman
Philosopher, Technologist, Writer
Not to obnoxiously plug, but lylix.net, a Linux/Asterisk VPS host that I consult for, has gone from a single-man show with few customers to nearly overflowing with incoming business as a result of an aggressive "white hat" SEO campaign - mostly just putting up good content on the site in a format that search engines like (and probably also the thousands of links from slashdot from my sig/homepage).
These results surprised me very much - I've gotten over a thousand hits on lylix.net as a result of my postings in the last month and a half, but this is easily dwarfed by lylix's position as the 3rd hit for 'asterisk VPS', first for 'linux asterisk vps', and being 4th-5th page for just "VPS".
For those who can put up quality content and carve out a decent search rank, Google is a veritable gold mine. Yes, it's possible that looking at the internet through Google's lens gives a skewed perspective, but it's still the best way to find most things. Word-of-mouth is fine for big sites, or niche sites known by your friends, but I can honestly say I do not find most things online that way.
1) The biasing effect is not hard to calculate _exactly_, for example it's done implicitly in this old paper, see p.6 the paragraph after eq.10. Of course, it's well known that Google hasn't used PageRank exclusively for years.
2) PageRank's formula is well known, and doesn't just count the number of inlinks, but uses a "boredom" probability of about 0.2 (as explained in Page and Brin's original papers at Stanford, I think they used 0.15). To be precise, PR is the weighted average of 0.2 times a uniformly random measure and 0.8 times a matrix based on the number of inlinks. See a pattern? It's not surprising that the inlinks should only account for about 0.8.
3) Judging from a couple of older papers available online by the researchers, they've spent some effort to work out an approximation to PageRank using inlinks. The idea being that inlinks is easier to estimate than PR or whatever modified PR Google uses these days. Now they're looking at the inlinks empirically, and they're finding a factor of 0.8 associated with Google. Well, duh! That would be circular reasoning.
4) If the data they're using is recent and sufficiently significant, it might suggest that Google's secret PR algorithm is only a second order modification of the original PR, i.e. that even though the real PR is secret, it can be well approximated by the original Stanford PR. That in turn is both exciting and troubling.
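For anyone who hasn't seen the formula from point 2 written out, here is a minimal Python sketch of textbook PageRank by power iteration. The toy link graph and the function name are invented for illustration, the 0.2 "boredom" probability is the figure used in point 2 (the original paper used 0.15), and none of this claims to be what Google actually runs.

```python
def pagerank(links, teleport=0.2, iterations=100):
    """Textbook PageRank. `links` maps each page to the pages it links to."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # The "boredom" part: a uniform share for every page.
        new_rank = {p: teleport / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # The link part: pass on (1 - teleport) of this page's rank.
                share = (1.0 - teleport) * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank evenly.
                share = (1.0 - teleport) * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" has three inlinks, "d" has none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
print(ranks)
```

The page with no inlinks still ends up with the uniform share (0.2 / 4 here), and everything above that comes from the 0.8-weighted link term, which is the weighted average described in point 2.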
This is pretty obvious.
If links were the only way to find new web content, then the number of links (and the popularity of the linking sites) would totally determine a website's popularity (modulo a bit of advertising).
Now if you believe that, at least occasionally, people find sites through search engines that weren't linked to from any of the sites they normally visit, then the search engine reduces the impact of popularity. All you need is one example of someone searching for "f22 raptor cost overruns" who doesn't browse military/political websites, and the search engines have reduced the impact of popularity.
I always thought the criticism of google was that their choice in search algorithm did less to reduce the influence of popularity than it could. I don't find this a compelling criticism, wisdom of the masses and everything, but it is at least a cogent point.
If you liked this thought maybe you would find my blog nice too:
I'm astounded that they think the correlation should be 1:1. Using some arbitrary figures:
If you have a large web page with 4 million inward links, and you put the link in a million more places, you're 25% more visible. But some of the people who can now see the link in the new places will have known about the site before, and those people don't add to the figure even though they've been targeted by the new advertising.
If you have a small specialised web page with only 40 incoming links, you're only being found by people who have criteria that fit your particular company. Assuming here that it's not just down to being a web fledgling, you've only got a small userbase inside the specialisation who will come to your web page, and chances are they already know about it. If you add 10 more links, then sure, you'll get more people, but the people who are your target audience are likely to know about the site anyway, whether via magazine, word of mouth or forum discussion.
Unless your company is special, and is in the startup phase of getting to the relevant people, where the target audience hasn't found out about the site yet, and adding 25% to the links, by being in the right place, reaches that audience. You might get a return of greater than 1 if you do it in the right way there. Where you were previously known by only a fraction of the target audience and can, via Google AdWords or whatever, suddenly reach a far, far wider audience, you'll get a good improvement in your visitor numbers.
A major assumption in the whole thing is that each company assessed considers the entire marketable world as a potential customer base. By targeting 25% more people you'll get 25% more interest? Even if we assume that the extra people don't know about the site already, that'll only work if your product is interesting to 100% more people, which in the world of the web seems fairly unlikely.
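A back-of-the-envelope version of the overlap argument, in Python with completely arbitrary numbers: if some share of the people reached by the new links already knew the site, the gain in visitors falls short of the gain in links. The function name and the 20% overlap are made up, and the model assumes every link reaches an equally sized audience, which is a big simplification.

```python
def visitor_gain(link_gain, already_aware):
    """Fractional gain in visitors when `already_aware` of the newly
    reached audience knew the site before (simplified, assumed model)."""
    return link_gain * (1.0 - already_aware)


# 25% more links, with a hypothetical 20% of the new audience already aware.
print(visitor_gain(0.25, 0.2))
```

With any overlap at all, the return per added link drops below 1:1, and in the startup case above (almost nobody aware yet) it gets close to 1:1.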
Browsing with +2 to insightful posts and a higher threshold makes the average post seen seem a lot more ingenious
... the article for you:
The desirability of a website is not given by how search engines rank it but by its actual content.
Well ... yeah!
Tie two birds together: although they have four wings, they cannot fly. (The blind man)
I had a 'vision' of an article discussing google on slashdot and hah! behooold! Maybe all these digits are getting to me after all. Eerie ... Anyway, the equation that came to mind was a bit as follows.
A scientist who makes an unusual discovery is almost certain to get critics all over him. Yet, in time, his discovery will be recognised as the result of an intellectual effort, an achievement. This scientist will become known as 'a smart person'.
Discarding the percentile of scientists who succeed at setting such a milestone and looking at people with scientific capacities (for the sake of argument, 5% of the googlers), one can only argue that the search results in google will become more and more irrelevant to the intellectual part of our society. So the results of google will become increasingly insignificant to the more educated part of the population, maybe even plain scholars.
This is of course not true for most specialisms and so on, but even now the results are sometimes quite insignificant.
The signs are already here.
free dom(inion) - free energy - free your mind - whee!
There are other possible interpretations for the sublinear scaling observed in the data. For instance, the quality of search engines might decrease the motivation for linking to already popular sites, whereas people may feel more motivated to link pages that do not appear among the top hits returned by search engines. Our search model, however, presents a very compelling explanation of the data because it predicts the traffic trend so accurately using a minimal account of query content and making strong simplifying assumptions, such as the use of PageRank as the sole ranking factor.
Further, the search engines themselves allot page rank by the number of inbound links and the keywords found in the "a" tag of originating pages. So more inbound links will raise your page rank, get you ahead in the search listings and get you more traffic. But that traffic will be counted as "search engine" generated traffic, not as traffic originating from a referring site. With this much interdependence between page rank and the number of inbound links, how did the study control for it?
The number of inbound links is already reflected in the search engine generated traffic, or to use Wall Street parlance, it is fully discounted. There is nothing to see here. Move On.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
When I'm doing engineering research I often have to repostulate what I am looking for to Google several times before getting the results I need.
For example, last night I was working on a calculation involving "k", the heat capacity of air, in English units.
"heat capacity of air" would give me the answer in metric
... that's great and all, but I didn't have my handy conversion table in front of me (and didn't want to spawn a second search... which is part of the point, here)
"heat capacity of air english units" mostly returned specific heat results, not heat capacity results.
Finally, after 5 iterations, I resorted to finding the conversion factor...
cmd-k for searching, cmd-l for typing an address. It's not that hard, really.
Substitute ctrl for cmd if you're a Linux or Windows user. All this assuming you're using Firefox.
www.clarke.ca
First I'll admit I'm a little confused by the article. Are they measuring a page's popularity in a search engine by its number of inbound links? So they're saying that as the number of inbound links increases (i.e., in their opinion, the site's ranking in the search engine), the number of page visits increases? Maybe I'm missing something, but if that's the case this research raises an eyebrow here at least. If they have page ranking data from Yahoo, why not use that instead of inbound links? Or maybe by "page ranking" they mean "number of inbound links".
I guess people need to study something, and sometimes one will come up with surprising results. But this study reminds me of the "Long Tail" discussion that was all the rage for a couple weeks. "Wow, with the internet we can find niche information!" Who didn't know that? So now some information pundit (don't remember his name) gets to make a bunch of money for putting a name on a self-evident truth so that business types can sit around and discuss it like it's something revolutionary.
If people didn't use the internet to find things, then why would Google be worth billions of dollars? If people didn't use the internet to find things, then why would companies be paying Google huge sums of money for page rankings? Those who track ROI will usually tell you that it makes them money (and if not they'd stop doing it, if they were tracking ROI).
I don't follow professional sports, but a lot of people do. So a lot of people are going to search for "Toronto Blue Jays". Good for them. There are ~6 billion people in the world, each with some common interests, and each with some less common interests. If you're making a web site to sell iPods then likely you'll be lost in the crowd and have a hard time gaining traction. If you're selling refurbished vintage Massey Ferguson tractors with patent leather seats and Corvette LS1 engines, then likely you'll end up #1 on Google pretty quickly. Do you want 0.0000000001% of a billion dollar market or 100% of a $0 market? Your choice.
www.clarke.ca
This seems to be another one in a string of papers by Fortunato et al.; the previous ones were The egalitarian effect of search engines (from arXiv, which never seems to have been published properly), and Googlarchy or Googlocracy (from IEEE Spectrum).
It was even featured on slashdot before: Search Engine Results Relatively Fair, posted by Zonk on Sat Nov 19, '05 04:29 AM.
But they seem to have improved their reasoning this time: they finally cite Donato et al.'s work (Large scale properties of the Webgraph), which explicitly contradicts their claim that there is a correlation between in-degree and pagerank.
Regards, Sebastian.