Slashdot Mirror


The Real Problem With Alexa

Alexa drives me nuts. It uses a broken methodology to measure the internet and is, for reasons unclear to anyone, regarded as somehow definitive simply because it allows you to compare two sites with a single simple number. Its sampling methodology is flawed and the numbers it produces are meaningless. And if you want to help me prove this, please install their toolbar. Of course since most of you are Slashdot readers, most of you won't and that only helps prove my point. Read on for what I mean by all of this, and why it matters.

As the de facto 'Guy in Charge' of a reasonably large web site, I am routinely asked questions by a variety of people that lead inevitably to Alexa. It might be a question from my boss at SourceForge about traffic. Or it might be a sales guy asked by a possible advertiser why some other random website is bigger or smaller than Slashdot. Most often it's a random reporter doing background for a story that has nothing to do with Slashdot. Why I'm considered an expert is very confusing, but why they always regard Alexa rankings as meaningful is even more so.

Here's the problem: Alexa doesn't work because of who will install it, and perhaps more importantly, who won't. Let's start with a place I'm very familiar with: Slashdot readers. Until recently Alexa didn't work on Firefox... instead only IE users participated. On the internet as a whole that's fine: like 80% of users run IE. But on Slashdot only like a quarter of you do.
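A rough way to size that gap, as a minimal sketch only: assume (hypothetically) that the toolbar is installed at the same rate among IE users everywhere and never by anyone else. The function name and that install-rate assumption are illustrative, not anything Alexa publishes; the 80% and 25% figures are just the ballpark numbers above.

```python
def relative_visibility(site_ie_share: float, web_ie_share: float) -> float:
    """Fraction of a site's audience an IE-only panel can see,
    relative to a site whose audience matches the web average.

    Assumption (illustrative): toolbar installs are spread evenly
    across IE users and are zero for every other browser.
    """
    if not 0 < web_ie_share <= 1 or not 0 <= site_ie_share <= 1:
        raise ValueError("shares must be fractions between 0 and 1")
    return site_ie_share / web_ie_share


# Ballpark figures from the paragraph above: ~80% IE on the web, ~25% on Slashdot.
slashdot = relative_visibility(site_ie_share=0.25, web_ie_share=0.80)
print(f"Visibility relative to a typical site: {slashdot:.2f}x")
```

Under those assumptions the ratio comes out at roughly 0.31, meaning a site with this audience would show up with about a third of the traffic an otherwise identical "average" site would get credit for.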

What about re-installing the plug-in after you update your browser? When Firefox 2.0 came out, almost a third of Slashdot readers upgraded within a few days. You upgrade minor Firefox releases overnight. Even IE users of Slashdot update relatively fast, from 6 to 7 or even minor revisions. New versions often break old plug-ins. When you get that alert that a plug-in is out of date do you just forget about it? I know I do. And that's not even counting clean OS installs. But when I look at random non-technical friends' and family members' installations, I frequently see versions of software so dated it makes me cringe.

And that's not even talking about the fact that Alexa's toolbar is pretty much spyware. How many Slashdot readers are giddy to install spyware? You either? Big surprise. Because of who we are, and what it is, our population will self select out of consideration.

Did you know Alexa excludes SSL? How many etrade users do you think there are? Now personally I'm glad that they aren't tracking my browsing at my credit card company, but it's just another factor reducing accuracy.

Equally perplexing is the accounting of iframes. Let's look at someone like DoubleClick's Alexa rating. Now it's hard to say, but I don't think I've ever visited their website. Have you? But according to Alexa, they have nearly a 1% share of the internet. I'd tend not to believe it... but they have iframes on zillions of web pages and counting those sure would account for this huge ranking. What about all those badges for the popular social networking websites? What influence are those iframes having on Alexa rankings? Alexa's FAQ says they don't count, but I'm skeptical.

In fact, Alexa KNOWS that it is a flawed metric. Have you ever tried actually looking up Alexa on Alexa? Unsurprisingly, it is unavailable. Why? Visitors to Alexa.com would be the most likely of any user population on-line to have installed their plug-in. I don't know what their 'Rank' would be, but I bet it clearly would be an apples to oranges comparison against ANY other site on-line.

Of course who do you think actually will go out of their way to install something like this? I have a good guess... if you are obsessed with acronyms like SEO or terms like PageRank you are very likely to care very much about these things. I spend a real percentage of my week dealing with people flooding my systems with garbage content designed to screw with these ratings. And you know they all have the toolbar installed so their zillions of worthless spam websites are being counted.

This problem has parallels elsewhere of course: The Nielsen ratings struggle to account for PVRs. Since you got a TiVo, when was the last time you watched "Live" TV? This is part of why Science Fiction shows struggle on TV... scifi fans are early adopters. So we stopped getting counted and our favorite genres are butchered by networks and lost to the void. PVR users tend to be wealthy (those boxes are expensive) and educated. Now I'm not saying that the dumbing down of TV is exclusively the fault of TiVo, but it sure didn't help that we weren't being counted as excellent "Smart" TV shows get canceled while we keep getting more seasons of Survivor. Who we are and how we live causes us to not be counted, and this has unintended consequences.

So what do we do? I wish I had a good answer to this. My first suggestion would be that if anyone mentions Alexa to you that you freak out and go on a 5-minute rant about how Alexa is stupid and anyone who is using it to seriously make a business decision should be fired. It doesn't actually help, but I estimate that every time I do this, I burn the same number of calories as I might on an elliptical trainer. I assure you the beer gut ain't getting smaller on its own.

Alternatively you could just install the toolbar on every machine you can find and skew the numbers ridiculously towards people that are likely unrepresented. Of course, the conspiracy theorists amongst you will just bitch that I'm trying to fudge Slashdot's own rankings in a system I'm claiming to hate. But that only helps prove my point... the conspiracy theorist is a demographic strongly represented on Slashdot that is unlikely to trust this software. We all ignore a broken status quo "Gold" standard that would fail a 100 level college science class on the grounds of flawed methodology. And this only leads to us not being counted.

76 of 372 comments

  1. Spyware? by xXenXx · · Score: 2, Insightful

    Isn't Alexa considered spyware?

    It baffles me how people actually look to them for information, considering how they get it.

    1. Re:Spyware? by TheSHAD0W · · Score: 2, Informative

      From the above article.

      And that's not even talking about the fact that Alexa's toolbar is pretty much spyware. How many Slashdot readers are giddy to install spyware? You either? Big surprise.

      I'd mod your article redundant, but I believe the point does need emphasizing.

    2. Re:Spyware? by Miseph · · Score: 2, Informative

      Parent's link is to a photo of an enlarged and strangely external male anus, and while not the classic goatse, is certainly a related image.

      Just in case anyone was wondering what the -1 Troll actually meant.

      --
      Try not to take me more seriously than I take myself.
    3. Re:Spyware? by Darby · · Score: 2, Funny


      Parent's link is to a photo of an enlarged and strangely external male anus, and while not the classic goatse, is certainly a related image.


      You type quite well for a person who just gouged their own eyes out with a fork.

  2. Do it to ourselves, and that's what really hurts by Control+Group · · Score: 5, Insightful

    That's all true, but unless someone's got a better alternative, it doesn't matter.

    It isn't surprising that people who spend money on advertising want to have some metric by which to predict (estimate, guess, what-have-you) the impact of each dollar spent on web advertising. Assuming the people spending the money are, as a class, either stupid or ignorant is a mistake. Odds are good that many of them know that Alexa is flawed, but also consider any information better than nothing. If nothing else, Alexa rankings demonstrate the relative popularity of a web site among Alexa participants - which is at least a concrete demographic, and the stats are inarguable on that basis.

    What's being missed is that there's a fundamental problem, here. Populations which refuse to share information with such aggregators will always self-select against representation. It's no different, really, than stating that populations who do not vote self-select against being represented in government. That doesn't stop us from using elections as a way to select people into government.

    In the specific case of slashdot selecting against itself, it's debatable whether we're a demographic many organizations would even want to target (with web advertising) if they could. How many comments on how many stories have included someone claiming that he's either unaffected by or negatively affected by advertising? That he's less likely to buy a product he sees advertised? Broader yet, how do you suppose the median number of lifetime banner ad clicks for the slashdot user compares to that of the web-using population at large?

    I posit that we pose a particularly galling challenge to marketers. On the one hand (if you'll allow me a bit of net-cultural hubris), we're a demographic of above-average intelligence, above-average income, with an above-average tendency to spend money on brand new technology, and who have an above-average impact on what other people will buy. On the other, we refuse to share our habits with "big brother," we're easily offended (eg, we hate proprietary formats solely because they're proprietary), comparatively hard to bamboozle, and have a cultural predisposition towards "free" (both beer and speech). That is, on the one hand, we're a fantastic demographic to succeed with, but on the other, we're a tough nut to crack.

    The point is that Alexa is flawed, without a doubt. But it seems more flawed from the point of view of a group which deliberately makes itself all but impossible to measure. And frankly, if we're not willing to provide the information necessary for advertisers to make informed choices, we're going to continue to be ignored, both on the web and on television. (Yes, I do realize that Nielsen is specifically flawed with respect to DVRs - but even if they weren't, how many members of this site would voluntarily install habit-tracking software on their TiVo? How many members of this site would call for a boycott of TiVo if it installed it for them?)

    --

    Reality has a conservative bias: it conserves mass, energy, momentum...
  3. Re:Rant as news by tabacco · · Score: 5, Insightful

    That's probably why it's filed under 'Editorial'.

  4. I must be stupid... by nonos · · Score: 2

    ... but what is Alexa ?

    1. Re:I must be stupid... by mmxsaro · · Score: 4, Informative

      Alexa is a ranking system to measure how popular a certain website is on the Internet. A user, however, must have the Alexa toolbar installed for Alexa to measure site rankings accordingly. As of right now, Slashdot is ranked 558 out of 1 million+ sites that Alexa tracks.

      Note: you don't need to install the toolbar to figure out Alexa rankings. Check out the Search Status extension for Firefox. I have mine sitting at the bottom right corner of the browser to show me PageRank and Alexa rankings.

    2. Re:I must be stupid... by mmxsaro · · Score: 2, Informative

      I first heard of Alexa when it came out in the mid-90s (think early 1997). I recall many people adopting it for its search engine capabilities and ranking features ("What's popular on the web today?"). Think of this as the pre-Google era, when searching through millions of pages was a daunting task and you didn't know where to surf when you first connected to the Internet via dial-up. You'd install the toolbar through word-of-mouth (or you saw some flash banner ad...) thinking that it would help you find what you're looking for on the web. I guess the toolbar just stuck around and webmasters/SEO 'experts' picked it up a while back thinking it's great data to judge website traffic. True SEO experts will never worry about Alexa data, as it's very easy to manipulate. To some extent, you COULD say that Alexa's rankings are semi-accurate (although not precise in any way). If you have 1-2 million toolbars active and you want to see what's hot out there, Alexa isn't a bad place to start, but it's just like Slashdot's own poll system: "This whole thing is wildly inaccurate. Rounding errors, ballot stuffers, dynamic IPs, firewalls. If you're using these numbers to do anything important, you're insane."

  5. Re:Rant as news by suv4x4 · · Score: 5, Funny

    Will have to reread this, but it doesn't come off as news but a rant. And no I won't install the toolbar.

    "Rant" ?

    CmdrTaco is being a rebel, anti-establishment, rage against the machine, fuck the system! This is what he's done here, and he deserves *respect* old man.

    Back in the days, when we were pissed about religion, wars and social injustice, we dressed like goths and sang bad rock and roll and emo music.

    But today, thanks to the world wide web, we take the next level, and all this unrelenting energy in today's youth comes in the form of a rant against a toolbar that rates sites. And I say, bravo.

  6. Re:Rant as news by networkBoy · · Score: 4, Informative

    Of course it's a rant, it's an editorial.
    The tags were there before TFA.
    Furthermore you will need to re-read it, because in your race to FP you likely only read the front page blurb. /rant.
    -nB

    --
    whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
  7. *I* figured out why Taco's on a rant! by everphilski · · Score: 5, Funny

    ...because digg.com is beating slashdot.org :)

    1. Re:*I* figured out why Taco's on a rant! by ColdWetDog · · Score: 2, Insightful

      Well, digg has been beating slashdot for a year now, and is nearly an order of magnitude higher in rank.

      No, I guess the most recent event is 4chan passing slashdot...

      So idiots in general and pornography obsessed idiots in particular are more common than whatever-it-is-that-lurks-about Slashdot?

      Somehow, I feel better already.

      --
      Faster! Faster! Faster would be better!
    2. Re:*I* figured out why Taco's on a rant! by metlin · · Score: 4, Interesting

      Oh sure, and YouTube is beating Digg, but that doesn't mean that we'll all move over to YouTube.

      No, like another poster said, it is quality over quantity.

      If you think some of the arguments on Slashdot are asinine, wait until you read the ridiculous ones on Digg. And give everyone the power to moderate and you have people burying others' comments because they disagree with them.

      Add bad grammar, spellings and l33t speak and you have a ridiculous combination of utter rubbish that only a bunch of emo sixteen year-olds can spew forth. Give me Slashdot any day.

      At least some of you trolls have character. ;-)

  8. Alexa's Spiders by Garridan · · Score: 3, Interesting

    When I used to administer a website (b2b, you've never heard of it) my boss loved Alexa. I told him time and again to uninstall it, and even did so myself a number of times... but he'd put it back every time. Then, one day, all dynamic content on the main page just vanished. I brought it back from backup, and chalked it up to a bug. Then, it happened again a little while later. I started snooping around our logs.

    Turns out, Alexa's spiders were ignoring the robots.txt file, and capturing usernames and passwords. It logged into the administrative area, and followed the "delete" link for every entry. My dumbass boss still didn't want to uninstall Alexa. Could have strangled the man.

    1. Re:Alexa's Spiders by captnitro · · Score: 5, Informative

      I'm just ragging on you unnecessarily here -- but was Alexa following POSTed form actions or something? This is why there's a completely different verb for the alteration or deletion of a URI object (POST) vs reading one (GET). (And shame on somebody for sticking usernames and passwords in GET variables, if that was the case.) /nitpick

    2. Re:Alexa's Spiders by _xeno_ · · Score: 2, Interesting

      The HTTP spec clearly says that GET requests should only be used for idempotent actions. Technically, deleting an entry is an idempotent action, so using a GET link for a delete entry is - well, brain-dead stupid. But it doesn't break the spec.

      See, an idempotent action is simply an action which has the same outcome the second time you attempt it. Deleting an entry twice doesn't change the final state of the system - the entry is still deleted. That makes it idempotent.

      Of course, anyone with an ounce of sense would realize that what they really meant was that GET requests shouldn't change state and that POST requests should be used to change a system's state. (Or PUT, or DELETE. But no one ever uses those.) Which was the point of the parent poster in any case.

      But before someone pulls out the "GET is supposed to be idempotent" part of the HTTP spec, remember that deletes are, technically, idempotent. They're safe to attempt multiple times, and leave the system in the same state afterwards.

      --
      You are in a maze of twisty little relative jumps, all alike.
    3. Re:Alexa's Spiders by someone300 · · Score: 2, Insightful
      From Wikipedia:

      a single call or multiple calls produce the same result and the same side effects to the entire system as a whole.

      It says here, not just the same effect on the system as a whole, but also the same result.

      If you call delete twice on the same record, the second time will have a failure result, rather than a success result like the first. Doesn't this make deletion non-idempotent?
    4. Re:Alexa's Spiders by Jah-Wren+Ryel · · Score: 2, Informative

      If you call delete twice on the same record, the second time will have a failure result, rather than a success result like the first.

      You are misinterpreting "result" as "return value" -- idempotent is a mathematical term, and in math there are no "return values", just end results. Deleting always produces the same end result.

      Here's what the W3C says about it, although they also talk about a specific DELETE request on the same level in the HTTP protocol as GET - http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
      --
      When information is power, privacy is freedom.
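The distinction this thread keeps circling can be sketched in a few lines. RFC 2616 section 9.1 lists two separate properties: safe methods (GET, HEAD) should not do anything beyond retrieval, while idempotent methods (GET, HEAD, PUT, DELETE) may change state as long as repeating them changes nothing further. The `EntryStore` class and its method names below are hypothetical, an in-memory stand-in rather than anyone's actual admin system.

```python
class EntryStore:
    """Toy in-memory store (hypothetical) for comparing safe vs. idempotent operations."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def get(self, entry_id):
        """Safe: only reads, never changes state."""
        return self.entries.get(entry_id)

    def delete(self, entry_id):
        """Idempotent but NOT safe: changes state the first time only.

        Returns True if something was removed, False if it was already gone.
        """
        if entry_id in self.entries:
            del self.entries[entry_id]
            return True
        return False


store = EntryStore({1: "front page story", 2: "poll"})

first = store.delete(1)
state_after_first = dict(store.entries)
second = store.delete(1)
state_after_second = dict(store.entries)

print(first, second)                            # True False
print(state_after_first == state_after_second)  # True
```

The return value differs between the two calls, but the state of the store does not, which is the sense in which deletion is idempotent. It is still not safe, and a crawler that follows every link assumes links are safe, so a delete action hidden behind a plain GET link will get triggered either way.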
  9. Re:Rant as news by Anonymous Coward · · Score: 2, Funny

    Don't worry, you'll be able to re-read it tomorrow when Zonk makes a dupe post.

  10. Been complaining for years by truthsearch · · Score: 4, Interesting

    My first suggestion would be that if anyone mentions Alexa to you that you freak out and go on a 5-minute rant about how Alexa is stupid and anyone who is using it to seriously make a business decision should be fired.

    I've been doing this for years. The problem (or actually just what marketers perceive as the problem) is that there is no generic public way to compare web site traffic. The only true way to get traffic metrics is from the web site owners. And they could easily make it up to take in more advertisers. So people in advertising look to Alexa as the only third party source.

    The biggest sites don't have as much of a problem because they can work closely with advertising partners. Medium and small sites, however, don't get as much personal attention. So proving themselves as worthy web space for ads is more difficult.

    The only people I've heard of that install the Alexa toolbar are web site owners because they want to see their rank often. Ironically so few people have the toolbar installed that they drastically boost their own rank.

    We need to convince marketers that Alexa is pointless. But I'm afraid that without a good replacement they'll keep using it.

  11. Re:So... by eln · · Score: 2, Insightful

    He says install it, and then in the very next sentence says that he knows you won't, because you're a Slashdot reader. The entire rest of the post is about why Alexa is flawed and shouldn't be used for anything by anyone.

    Sure, from a purely mercenary point of view he'd like you to install it so that advertisers stupid enough to use Alexa will see Slashdot's traffic represented, but he acknowledges that you almost certainly won't. Taco has never had the kind of relationship with his readership to where he could tell them to do something and they would go out and do it, unless that "something" was "post goatse links to Slashdot," and I'm pretty sure he knows that.

  12. Count me in! by customizedmischief · · Score: 5, Funny

    Come on folks, it's time to be counted!

    Now where can I download the Alexa plugin for lynx?

    --
    Oops.
  13. Re:Do it to ourselves, and that's what really hurt by poetmatt · · Score: 3, Interesting

    I hate to say it, but that doesn't so much prove that "the only way advertisers can get accurate data is for people to opt in" as it proves that they have not chosen to find new methods to track data properly and independently. If you were able to develop a way to get honest and accurate data on the number of hits on a site-by-site basis, would even that be more accurate? (assuming you started to collect an enormous list of sites). Say check all the news aggregator websites language by language (I'm sure there's thousands in each), but rank them by who is getting the most unique hits in a day, etc? Of course a site could skew their own results which creates its own problem but would this not at least be more valuable than alexa data?

  14. Proprietary Software by saibot834 · · Score: 4, Insightful

    So Alexa says they are not spying on the user. Big surprise.

    How can I verify what this toolbar is really doing unless I have the source code? IMHO the problem lies there: There is no trust for Alexa because nobody can really say for sure how it works and that it doesn't harm the user.

    1. Re:Proprietary Software by Atlantis-Rising · · Score: 2, Insightful

      While you cannot verify what the software is actually DOING, you can monitor/verify what the software is saying.

      In many cases, not only is the latter more effective, from a cost/time/benefit perspective, it's also easier and provides far more useful information.

      --
      "It is possible to commit no errors and still lose. That is not a weakness. That is life." -Peak Performance
    2. Re:Proprietary Software by whitehatlurker · · Score: 2, Insightful

      Agreed, the best way to see what they're collecting is to watch the stream and actually see it. However, the real question isn't what they collect, but what do they do with it? It's fairly obvious that they send back all of your browsing habits.

      --
      .. paranoid crackpot leftover from the days of Amiga.
  15. And we care...why? by Itninja · · Score: 3, Insightful

    How is Alexa different than any other selective-survey system? The Nielsen ratings are acquired via 'diaries' (or occasionally set-top boxes). Radio 'listener share' is determined similarly by Arbitron. The NY-Times bestseller list is based on books sold to distributors, not books sold to the public (millions of unsold 'bestsellers' get pulped or donated to libraries every year).

    Just come to terms with the fact these organizations are in bed with advertisers and move on with your life.

    --
    I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
  16. From the summary by UbelievablyLame · · Score: 5, Funny

    "Of course since most of you are Slashdot readers..."

    hm... given the context I would say 'most' is an understatement

    1. Re:From the summary by eln · · Score: 4, Funny

      He's hoping to get crossposted on digg.

  17. Re:Do it to ourselves, and that's what really hurt by trolltalk.com · · Score: 4, Interesting

    "It isn't surprising that people who spend money on advertising want to have some metric by which to predict (estimate, guess, what-have-you) the impact of each dollar spent on web advertising."

    There are several easy ways:

    1. as an advertiser, host the ad on your own server, and just look in your logs
    2. as an advertiser, get access to the server's banner administration system for your ad account (postnuke allows this on a per-advertiser basis)
    3. as an advertiser, just be skeptical as all hell and don't believe 99% of the stuff you hear - it's all BS anyway

    If you're so naive as to not insist on hard numbers for actual views (the log files are best), you deserve to get hosed. You can analyse the log files and factor out multiple views per host ip to get the actual number of real views, and reduce fraud; ditto with geolocation of ip addresses to factor out bots in 3rd world countries; ditto for bots that crawl every link on a page; ditto for pages that are loaded then immediately dumped for another page.

    As an advertiser, I'd want unique eyeballs - real human eyeballs - that can be verified.
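The "factor out multiple views per host ip" step might look something like the sketch below. It assumes the server writes Apache-style common or combined log lines; the ad path, the sample lines (documentation-range IPs, made up for illustration), and the crude "bot" user-agent check are all hypothetical stand-ins, not a real fraud filter.

```python
import re

# Apache common/combined log format; referer and user-agent are optional.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def count_ad_views(lines, ad_path):
    """Return (raw_hits, unique_hosts) for successful requests to ad_path.

    Raw hits count every 200 response; unique hosts drop repeat views
    from the same IP and anything whose user-agent admits to being a bot.
    """
    raw_hits = 0
    hosts = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("path") != ad_path or m.group("status") != "200":
            continue
        raw_hits += 1
        agent = (m.group("agent") or "").lower()
        if "bot" in agent:  # naive heuristic, for illustration only
            continue
        hosts.add(m.group("host"))
    return raw_hits, len(hosts)


# Made-up sample lines for illustration.
SAMPLE = [
    '203.0.113.5 - - [10/Apr/2007:10:00:00 +0000] "GET /ads/banner.gif HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Apr/2007:10:00:09 +0000] "GET /ads/banner.gif HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Apr/2007:10:01:00 +0000] "GET /ads/banner.gif HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
    '192.0.2.44 - - [10/Apr/2007:10:02:00 +0000] "GET /ads/banner.gif HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/4.0"',
    '192.0.2.44 - - [10/Apr/2007:10:02:05 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
]

print(count_ad_views(SAMPLE, "/ads/banner.gif"))  # (4, 2)
```

One IP is not one human (NAT, proxies, and dynamic addresses all blur that), so this gives a rough estimate of unique viewers rather than a verified count of eyeballs.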

  18. Business? by 19061969 · · Score: 3, Informative

    In my experience, a lot of PHBs are only too happy to have information. They don't really care if it's valid information or not, just so long as it's there and that it sounds good.

    It was a massive wake-up call to realise how many middle-managers and the like will quite happily swallow any old crap as long as they perceive that it's authoritative. Has anyone ever tried to tell them about how bad the information is? (real question btw - I'm interested in seeing if other readers' experiences were as bleak as mine).

    --
    bang goes my karma... again...
  19. Alexa is useful by mbone · · Score: 2, Informative

    It clearly has biases, and (worse) these seem to change slowly with time, but for the web sites I host, there is a nice correlation between their Alexa reach and their hit count.

    It is certainly good for a crude ranking of sites - Slashdot's rank right now is 558, and that clearly means a lot more traffic than some site with a rank of 5 million.

    So, like many other measures on the Internet, it is flawed, but it has value.

  20. Re:Do it to ourselves, and that's what really hurt by zarkill · · Score: 4, Interesting

    And frankly, if we're not willing to provide the information necessary for advertisers to make informed choices, we're going to continue to be ignored, both on the web and on television.

    This is one reason that I actually like Amazon's recommendation system. I can provide information about what I like and don't like, and the site will then suggest items that I may be interested in based on that. If it suggests something that I'm not interested in, I can click "not interested" and it never presents that item to me again.

    I would LOVE to have a similar scenario for other ad-driven media. Imagine if I could flag TV commercials with "not interested" and then never see that commercial again, or any commercial for a similar product. Once it got a good feel for what I really like and don't like, I probably wouldn't feel the need to skip commercials. The same could be said of web ads. If I could cherry-pick which ads I was interested in and which I wasn't I might not be so inclined to block ALL of them.

    Ads are useful to me sometimes, but picking the signal out of the noise is usually such a hassle that I'd rather just skip the whole process. If everyone could make a very personal statement about what they want to see ads for and what they don't, I think the benefit for both parties would improve.
  21. Spyware yup. by crabpeople · · Score: 4, Interesting

    Symantec corporate flags the alexa toolbar as spyware, so I couldn't run it if I desired to.

    http://www.symantec.com/security_response/writeup.jsp?docid=2004-062410-3624-99

    --
    I'll just use my special getting high powers one more time...
    1. Re:Spyware yup. by mmxsaro · · Score: 2, Informative

      Not true. If you're using the Corporate version of Symantec Antivirus, you can allow Alexa on your computer by simply excluding it from your searches (see exclusions). There will be Alexa listed as "adware/spyware". Enable the ignore option on it and you should be fine.

  22. Stupid is as stupid does by Anonymous Coward · · Score: 5, Insightful

    Alexa targets a demographic that is more likely to click on banner ads and buy the junk which they advertise. So for the advertisers targeting those demographics I'm sure it works out ok.

  23. Asked and answered by Control+Group · · Score: 2

    Of course a site could skew their own results which creates its own problem but would this not at least be more valuable than alexa data?

    No, it wouldn't, and you've already stated why. Everyone knows that web site logs are the single most accurate way of measuring web site traffic. And no one uses them anyway - not because they think Alexa collects better data, but because Alexa doesn't have a vested interest in making a given site look better than it is.

    A system which counts on the person selling to give you an honest evaluation of the worth of their product is never going to be more accepted than one involving a third party.

    You're right, however, in that what's really needed is a better way to track visits to web sites. The problem is that we can't trust the buy-in of the owners, because they're (obviously) biased. Also, we can't trust the opt-in of the visitors, because so many of them don't opt-in. So the question becomes, what sources of information do we have?

    I don't have an answer to that question. And, based on the lack of third-party ratings systems other than Alexa, I don't know that anyone else has that answer, either.

    --

    Reality has a conservative bias: it conserves mass, energy, momentum...
    1. Re:Asked and answered by Mandrake · · Score: 4, Interesting

      On larger sites, doing things like collecting / reading web site logs (like your apache log files) is completely unrealistic. We don't even have them turned on here anymore, because they generate so much disk i/o and flood so much disk space (each of our web heads when we last had logging enabled over a year ago produced over 8 gb of apache logs every day - multiply that times 30 and that's a hell of a log parse every single day...) - so we tend to gauge traffic more in megabits per second than anything else.

      I am not saying that Alexa is good for looking at traffic trends either - their numbers vary WILDLY from what our actuals are. Oddly enough, Hitwise does a much better job, but I suspect that is a lot of blind luck on their part as I think they take data in a similar fashion.

      I'm not sure I had a point, except that web logs aren't really feasible when your traffic crosses a threshold - I'm sure /. has similar logging problems.

      --
      Geoff "Mandrake" Harrison
      Some Random UI Hacker
    2. Re:Asked and answered by jamie · · Score: 5, Informative

      Hi Mandrake.

      Slashdot still logs every pageview (plus ajax). We drop them into MySQL and once a day run a data-massaging script on them then delete the oldest portion. We do have a pair of dedicated servers for this, but generally speaking the I/O is pretty low. It's very doable.

      One of the main reasons is detecting abuse in real-time (done by more scripts that run more frequently). I wrote a journal entry about one of those scripts, a while back.

    3. Re:Asked and answered by Mandrake · · Score: 2, Interesting

      I might take this up with the next generation of a system we're working on here potentially, if you guys don't have a problem - we have a workaround system going live shortly that does a certain amount of logging via syslog to dedicated syslog hosts (god bless syslog-ng) but we don't look at every pageview in order to lessen the load, we look at and log specific events (ones ripe for abuse - payments, signups, email, etc).

      -Mandrake

      --
      Geoff "Mandrake" Harrison
      Some Random UI Hacker
    4. Re:Asked and answered by Mandrake · · Score: 2

      We DO have visitor logging, we just don't use apache log files, and don't necessarily log every single action a user takes. And the site in particular I'm talking about runs at over 100 mbit/sec outbound traffic on slow days (considerably higher during peak traffic times on busy days).

      -Mandrake

      --
      Geoff "Mandrake" Harrison
      Some Random UI Hacker
    5. Re:Asked and answered by araemo · · Score: 2

      Similar database-backed systems could significantly reduce the amount of data generated. If the Apache log output were parsed into events relating websites/servers, pages/documents, and actions, you could reduce the incremental log data per page view to the minimum necessary and still keep high-granularity logs. Or you could record it all up front, then throw away all data that matches (or doesn't match) some criterion before committing it to long-term storage. For visitors who show up and never log in, and for search engine crawlers, just note the number and types, and that's all you need to know, versus saving the entire visit history of anyone who actually bought anything, for troubleshooting/CYA/abuse prevention.

      Sure it would take some extra CPU, but as others noted, dedicated syslog hosts aren't unheard of either. ;) (I'd be really curious to see if this has been done already somewhere.. otherwise it might make a marketable product.)
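
      A minimal sketch of that kind of reduction, assuming Apache's "combined" log format. The crawler check is a hypothetical substring heuristic on the user agent, not a reliable detector, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"')

# Hypothetical heuristic: treat any agent containing these words as a crawler.
CRAWLER_HINTS = ("bot", "crawler", "spider")

def reduce_log(lines):
    """Keep compact event tuples for ordinary visitors; only count crawler hits."""
    kept = []
    crawler_counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not look like combined-format entries
        agent = m.group("agent")
        if any(hint in agent.lower() for hint in CRAWLER_HINTS):
            crawler_counts[agent] += 1  # note the number and type, nothing more
        else:
            kept.append((m.group("host"), m.group("time"), m.group("method"),
                         m.group("uri"), int(m.group("status"))))
    return kept, crawler_counts
```

      The same split could be applied to visitors who never log in; that would need session information the raw log line does not carry.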

  24. Re:whine, whine, whine by Erskin · · Score: 2, Interesting

    This is no different than the Nielsen ratings

    I'd argue it is rather different. TV is one way. Your television browsing habits are slightly less revealing than say, your banking activities or the blog entries you post.

    Also, Alexa claims to give you some value in exchange for letting them piggy back on your browsing. Nielsen is more public and more respected. This helps mitigate the sampling problems.

    Suck it up and find a better metric for your boss.

    If his "boss" (or any of the other scores of people who accost him about the popularity of websites) would let him pick the metric, he wouldn't have this problem.

    The point of the article is that he has to defend someone else's choice of metric.

    Or perhaps, the point is more of an "Ask Slashdot" sort of thing...

    As in, "Hey all you /. geeks, what's a better way to do this?" Taco's comments on the flaws in Alexa's system and Control Group's comments on some of the particular challenges against this demographic in general support that.

    Heck.. it seems like an interesting enough problem to me, but then again, I don't have a sig like yours:

    /.: "Anti-Microsoft Rants, Apple and Google d*ck sucking." Pathetic.

    If you hate it that much, why are you hanging out here?
    (Sorry, I really need to stop feeding the trolls...)

    --

    Erskin
    geek.

  25. The Rant and the Slashdot problem. by LWATCDR · · Score: 3, Insightful

    Slashdot is an extremely popular website with great demographics. It should be a huge money maker but it probably underperforms.
    It doesn't show up all that well in Alexa because very few people that go to Slashdot use or would use the Alexa toolbar.
    It probably doesn't show up all that well with the advertisers because Slashdot readers are technically very sophisticated.
    What percentage of Slashdot users are blocking the ads on Slashdot? 80%? Slashdot should be the "Myspace" of the technical crowd. Heck, it had the friends list long before Myspace was around. We have our Journals, "aka" blogs, so yeah, it is a little Myspace full of bright people with money to spend. But it doesn't make that much money. Slashdot should be worth many millions but it isn't. The real problem isn't Alexa but how Slashdot can live up to its potential for that evil word: profit. After all, I am sure the Slashdot crew would like to make the big bucks.

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    1. Re:The Rant and the Slashdot problem. by heinousjay · · Score: 4, Funny

      There's no better path to profit than courting an audience of people whose explicit goal is to destroy the monetary value of software.

      --
      Slashdot - where whining about luck is the new way to make the world you want.
    2. Re:The Rant and the Slashdot problem. by LWATCDR · · Score: 2, Interesting

      The church of RMS probably does have something to do with it. It really is a shame, because a lot of people on Slashdot buy a lot of software and hardware. I think part of Slashdot's problem comes from using Doubleclick to serve ads. What Slashdot user doesn't have *.doubleclick.net* in their ad blocker?

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  26. Re:Do it to ourselves, and that's what really hurt by RetroGeek · · Score: 2, Interesting

    That is, on the one hand, we're a fantastic demographic to succeed with, but on the other, we're a tough nut to crack.

    And add to this mix that we collectively HATE advertising. So we all use ad blockers, flash blockers, script blockers, image blockers, and anything else we can find which reduces or eliminates advertising which gets in the way of reading the content of a web site.

    So even if we do get "counted" and the advertisers can determine what it is that we browse, the current method of "in your face" ads will quickly push us towards a way of either blocking the ads, or simply not going there any more.

    And I DO click on ads, but only if they are:
    - NOT in the way of the content
    - NOT blinking, flashing, moving
    - NOT trying to distract my eye towards them

    If ANY of the above happen, I am gone from the site, and will NEVER go there again.

    (Hey, this is my 1,000th post. Woo Hoo!)
    --

    - - - - - - - - - - -
    I am a programmer. I am paid to produce syntax not grammar. Deal with it.
  27. Re:Rant as news by sinner6 · · Score: 4, Funny

    Yes, the reason digg has a higher page rank on Alexa is because the average digg user is almost universally less technically savvy than the average slashdot user. No, I am not being sarcastic, they are dumber.

    A slashdot debate on bush and the war, for example, will use complete words and sentences and sometimes include facts. A crazy rant it might be, but a crazy READABLE rant.
    The same debate on digg...
    bush = leet haxor
    STFU, WAR IS BAD, OBAMA 08 WOOT.

  28. Re:Do it to ourselves, and that's what really hurt by Control+Group · · Score: 3, Insightful

    Sure, but that all presupposes you've already bought ad space on the site in question. When you're trying to select which web sites to purchase ad space on in the first place, you don't have access to any of those metrics. If we were talking about a handful of key sites, that wouldn't be a problem - test the waters, go with what works.

    But given the huge number of web sites out there that run ads, you need some way of doing an initial selection of which ones to pay. Hence Alexa.

    --

    Reality has a conservative bias: it conserves mass, energy, momentum...
  29. MOD PARENT UP by bluej100 · · Score: 2

    Quite right. Not Alexa's fault.

    1. Re:MOD PARENT UP by neoform · · Score: 2, Insightful

      It's not alexa's fault that they're logging in as you and spidering pages that robots.txt says not to spider? Seems to me that it's very much their fault.

      --
      MABASPLOOM!
  30. Re:Do it to ourselves, and that's what really hurt by Opportunist · · Score: 3, Insightful

    As a statistician, I can reassure you that the only thing that's worse than no data is flawed data. When you have no data, you know something is wrong and you start correcting that. When you have flawed data, you don't. Instead you use that data and build on it, never knowing that what you measure, calculate and estimate has nothing to do with reality. In other words, it can be dangerous, to your job and the company you're working for.

    Imagine the (flawed) data you have tells you that almost 100% of the people visiting your geek-gadget page are fans of some rock group. Why? Because they use a proxy that was written by some fan of said rock group whose proxy subtly alters the meta information sent by your browser to tell everyone you surf to how much you like said rock group. You analyze it and invest heavily into marketing crap from said group, hoping that your customers will buy it since they all apparently love that group.

    Result? Big disaster. Nobody buys it. Nobody even knows that group. They just all used the same proxy/plugin/younameit, not even knowing that whoever wrote it wanted to advertise his favorite band.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  31. Re:Rant as news by Random+BedHead+Ed · · Score: 5, Funny

    You're telling me - rants drive me absolutely nuts, especially on this site. They don't make good reading, they pointlessly waste your time, and they use up valuable screen real estate that could be occupied by other, more interesting stories. The methodology behind rants is usually utterly broken but, for reasons unclear to anyone, rants are regarded as 'postable material' on all too many sites. I mean, let's not draw the line at Slashdot. Rants show up on:

    • Slashdot
    • Digg
    • Kuro5hin
    • Wired
    • People's stupid blogs
    • ... and like a zillion other sites I have to put up with.

    That we obviously need to abandon rants is clear, because they're almost always pointless, but there are so many of them these days that it gets to the point where the only metric you're using to compare sites is the quality of its rants. This is entirely flawed and meaningless, and leaves me wanting a stiff drink. Still, don't get me started on their frequency on /. You're all Slashdot readers, most of you just go ahead and prove my point anyway.

    So say you go to some random site and end up reading a rant. What have you learned? After you close your browser, are you any more complete as a person? Have you grown intellectually? Let me think: no ... no. I'm not some expert on rants and why I'm writing about them is very confusing, but I think I have as much to say about the dumb things as anyone. And if that bothers people, at least I got the point across.

    Here's the problem: rants don't work. If you RTFA, and start with a place I'm very familiar with (namely Slashdot) like a quarter of you write rants anyway. And that's not even talking about the fact that any rant, and not all posts are rants, is going to take up people's time and not get modded very well anyway. How many Slashdot readers would mod a rant up? You either? Big surprise. Because of who we are, and what it is, our population will self select out of consideration.

    Did you know rants can get posted by ANYONE? How about Anonymous Cowards? Now personally I'm glad of that, free speech and all. But anyway, those are my (heavily edited) thoughts on this.

  32. Pfft, screw that. by oGMo · · Score: 2, Interesting

    If digg is "beating" slashdot, let it win. Maybe the YouTube popularity blog can suck away the idiots from slashdot.

    --

    Don't think of it as a flame---it's more like an argument that does 3d6 fire damage

  33. Blame anything but your system? by a16 · · Score: 2, Insightful

    I don't think that your story is a very good indicator of how rubbish Alexa is, it just highlights issues with your own system.

    This is why you shouldn't use HTTP GET for 'delete links'. Anything that changes content should be POST, which will stop bots crawling your site just by following links from breaking things. We have standards for a reason..
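
    A rough sketch of the idea, with a hypothetical URL layout (/comments/<id>/delete) and a plain dict standing in for the site's data. It is not tied to any particular web framework; the point is only that a link-following GET cannot change anything.

```python
def handle_request(method, path, comments):
    """Return an HTTP status code; only POST is allowed to delete a comment."""
    parts = path.split("/")
    # Hypothetical route: /comments/<id>/delete
    if len(parts) == 4 and parts[1] == "comments" and parts[3] == "delete":
        comment_id = parts[2]
        if method != "POST":
            # A crawler or prefetcher following a plain link ends up here.
            return 405  # Method Not Allowed
        if comment_id not in comments:
            return 404
        del comments[comment_id]
        return 303  # See Other: redirect after a successful POST
    # Everything else is treated as a read-only page in this sketch.
    return 200 if method == "GET" else 405
```

    Real delete actions would also need authentication and some protection against forged requests; this only shows the GET/POST split.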

    As for alexa crawling your site as a logged in user, what? As far as I know the toolbar itself doesn't do any crawling, only reporting. Maybe it was providing links to Alexa that later got indexed, but if they were properly secured then you wouldn't have any issues. The fact that you seem to be relying on a robots.txt for security indicates bigger issues. The only time I've heard of a 'Toolbar' doing this kind of thing is when Google released their proxy service (which they later withdrew), as it automatically preloaded all pages - and again poorly designed pages using GET to modify data encountered problems just like yours.

  34. more specifically, by everphilski · · Score: 2, Informative

    trackware, not spyware, from your link: "is a program that installs a toolbar and gathers Internet browsing and search information." which is EXACTLY WHAT IT IS SUPPOSED TO DO in order to aggregate site popularity.

  35. Re:Do it to ourselves, and that's what really hurt by Control+Group · · Score: 3, Insightful

    That's true if the person using the data is unaware that it's flawed. But an educated decision can be made to use data that's known to be flawed, if one evaluates what those flaws are, and what they'll mean to whatever it is you're doing.

    In fact, as I think about it, I'm not sure "flawed" is the right word. The information is incomplete; whether that's a flaw depends on whether or not you recognize that you don't have all the information.

    I think that assuming all the people using Alexa rankings to make purchasing decisions are stupid is misguided. I think it's a much safer assumption that the distribution of stupid, average, and intelligent people among that population is fairly close to that of the population at large. Many of them are making decisions based, in part, on having information that they know to be incomplete, which they judge to be preferable to making decisions based on having no information.

    --

    Reality has a conservative bias: it conserves mass, energy, momentum...
  36. Re:Rant as news by sherpajohn · · Score: 4, Funny

    My irony meter just broke, and you owe me a new one.

    --

    Going on means going far
    Going far means returning
  37. Stop whining. Learn how to manage your boss. by gru3hunt3r · · Score: 5, Funny

    Let me save you some breath, I deal with non-technical small online business owners all day, every day, and I have for the last 7 years - they are obviously concerned with Alexa rankings.

    I *HAVE* been telling them that the stats are bullshit, not only for the reasons listed above but a few others - but eventually I gave up and developed a better strategy:

    Don't bother explaining highly technical concepts to a monkey, it frustrates you and annoys the monkey.

    If your pointy haired boss wants your Alexa ranking to improve I would suggest you:
    1) Call a meeting, invite as many department heads as you can.
    2) Make the problem your own, and phrase it as *MASSIVE*, *DIRE*, *EXTREME* (e.g. if we don't fix this, we could all be out of a job soon)
    3) Suggest IMMEDIATE ACTION be taken, suggest hiring an offshore team of workers (China $0.37/hr) to install the Alexa toolbar and surf around your site.
    4) Recommend that the company consider an immediate payout to a Ukrainian hacker with mob ties named "Ivan" who will pwn machines and install alexa and then randomly pop your site on his botnet for a reasonable fee.
    5) Finally tell them that bribes to key employees in Alexa may be necessary - tell them you may have a contact and tell them to be ready to authorize six digit sums of money in a 24 hour period if necessary. [this can be useful for other reasons]

    Trust me - as soon as the first mention of money (and specifically whose budget it will come out of) is made the general attitude toward how important Alexa is will change. They'll backpedal, claim you're being overly-proactive. They'll produce some rant they found on a website called dot-slash saying how Alexa rankings aren't important.

    Tell them it's all propaganda, proceed to ignore whatever they say -- pronounce your undying love for Alexa - and its relevance to the web.
    DEMAND THEY RESPECT YOUR AUTHORITY.
    IDENTIFY YOURSELF AS THE BIG DOG OF TECHNOLOGY.
    ASK WHO ELSE GRADUATED FROM DEVRY LIKE YOU DID?
    WHO ELSE IN THE ROOM IS A CERTIFIED NOVELL ADMINISTRATOR?
    IF CHALLENGED BY ANYONE TAUNT THEM AND SAY THEY PROBABLY DON'T EVEN UNDERSTAND BIG "NETWORKING" CONCEPTS LIKE SECURE SOCKETS LAYER, TRANSPORT CONTROL PROTOCOL, AND .NET FRAMEWORK.
    Then proceed to tell them that (in your professional opinion) your company won't be able to recruit good people because of your poor Alexa ranking. Tell them that search engines will stop spidering your site, and eventually your traffic will drop to zero. Without a good alexa ranking your email will get caught in more spam filters and you'll appear on blacklists and phishing filters more frequently. That means the SSL locks won't show up on browsers anymore. This will cause packet loss on your routers to increase. If it's not fixed immediately it's possible eventually your domain won't even work if somebody enters it directly into their browser. ALEXA IS THE MASTER OF THE INTERNET THEY ARE ALL KNOWING WE MUST SERVE THEM WITHOUT QUESTION.

    ps> I *seriously* did have one customer who hired an offshore Indian firm to boost their rankings (no bullshit) - feel free to mention that your competitors are already doing this, and the clock is ticking. WE NEED A DECISION NOW.

    The next topic: PAGE RANK (umm.. wash, rinse, repeat)

    1. Re:Stop whining. Learn how to manage your boss. by gru3hunt3r · · Score: 3, Funny

      Oh.. almost forgot to mention -- to respond to Page Rank

      First tell them the SEO consultant they hired is an idiot (did he graduate from DeVry and have his CNA? - I don't think so) and he is most likely trying to defraud the company and that they should stop payment on his check.

      Changing your page rank # is easy there are lots of articles on the web how to do it, but basically you can simply do it with Meta tags ex: .. if you want a page rank of 12 then just change the 7 to a 12 - it's easy.

      If they don't believe you they can look it up on their Inter-web. There are lots of websites which explain that Google's spider crawls meta-tags to index the site and determine page rank.
      (At this point their head will hurt from all the technical mumbo jumbo)

      In 60 days when it doesn't work, tell them that it's because your website is too slow and Google probably can't crawl it fast enough, use that to justify an OC48 to your desktop so you can make faster and more frequent site updates. Now yur l337 bcuz u can pwned newb's in PvP huh?

      When the OC48 doesn't work, suggest the problem could be that Google found out you're spending too much time with Alexa (a Google competitor) and traffic isn't being seen by Google's routers and so Google is penalizing you.

      At that point I suggest using similar tactics to Alexa.

      By the time you get a couple of those six digit payouts to bribe key employees in Alexa/Google, then you won't need to work there anymore. Leave and start your own company.

      Have FuN!

  38. Alexa ratings by evildogeye · · Score: 2, Interesting

    I have gotten numerous sites into the top 75k of Alexa ratings by simply installing the toolbar on a couple of machines and regularly browsing through the entire site. On the other hand, I have sites that receive 3000 unique hits a day ranked around 300,000 on Alexa. That being said, I still use Alexa all the time to figure out which sites are well trafficked, and I imagine it is far more accurate than the author is giving it credit for. If you eliminate obvious exceptions (sites that cater to SEO folk and sites that cater to certain audiences such as Linux users) I think you will find that Alexa makes for a useful although not 100% accurate tool.

  39. Quality, not Quantity that matters by DigiShaman · · Score: 4, Insightful

    It's the quality, not the quantity of your audience that matters. Despite the occasional trolling and flaming that goes on at Slashdot, it still upholds its audience as the most informed and highly intelligent. I can't say that for Digg.com.

    --
    Life is not for the lazy.
  40. Re:Rant as news by IceCreamGuy · · Score: 3, Interesting

    I dislike pointless rants as much as the next person, but I feel like you can at least give a little credence to posts like this. I'd imagine it's extremely frustrating dealing with this type of thing; the general reader doesn't really have any idea that this is a problem (I've been reading /. on an hourly basis for the past 4 years, but it's always possible that I just never paid enough attention to hear about this), but apparently it's something that he has to deal with on a regular basis. If he's making a post on the main page, it's obviously something that he feels is a serious issue and he's looking to the community for support and feedback. Rag on Slashdot all you want, but if you're posting on this then you obviously read it a good deal and hopefully get useful/interesting information from it. Why can't a founder of such an excellent (in my opinion) site complain and ask for feedback on an issue that's obviously important and causes serious problems in a way that most likely the users and the admins never anticipated or know how to deal with? Maybe if we all were running 640x480 or browsing with Lynx we could legitimately complain about the post taking up important news space, but in the majority we're not, and in addition Slashdot caters to such a wide variety of readers that you're never going to be interested in every single news post on the front page, even with the customizations. (if you are then I want your job).
    On that note, I don't actually have anything to say about the topic at hand, but then again, neither did the parent.

  41. There are alternatives by sitarah · · Score: 2, Informative

    Then don't use Alexa. You can measure traffic in ways beyond toolbars. Try the following:
    1) Compete. It is free. It uses toolbars AND panels AND ISPs. It's not that accurate compared to Comscore.. but maybe Comscore is wrong.
    2) Buy something. Comscore uses a panel method with a careful demographic spread so they can extrapolate from their sample with a small percent-error.
    3) Buy Hitwise for percentages. It doesn't give you unique visitors, but it can give you comparisons and ranks and whatnot. They lay on top of ISPs and use a few panels. It is 20-30K for a year or so.
    4) Wait awhile until the IAB's audit leads to some common definitions and standards among the aforementioned companies. The Interactive Advertising Bureau and Media Rating Council are auditing Nielsen and Comscore to make sure there is more transparency into what defines all these metrics, how they are counted, and how they should be counted, forever after. In a few years, there might be some consistency in the industry, which will at least stop you from comparing apples and oranges once you get beyond over-counting SEO spammers.

    If you are concerned about the demographics and unselfconscious web surfing, you need to go with a company that looks at ISP data. That's right, everyone -- your service contracts with larger ISPs allow them to anonymously watch your traffic and sell it to companies like Hitwise with your demographic information. Suddenly, the 35-54 white male demographic with an 80K income in the south can be fully represented in the balloon-popping video site genre, until they start hiding behind a proxy. Because it is anonymous, it is even better than a panel, because they don't know they are being watched and don't change their behavior.

  42. Re:Rant as news by JWSmythe · · Score: 3, Insightful

    There's a neat thing in journalism. Editors retain the privilege of being able to commandeer any space they want in their publication, and say just about anything they want. In the format of Slashdot, the editorial would take the top most position on the page, until a newer story filled the position.

    I have been known to do the same thing on my site. It may be a "thank you" to our users. It may be a birth, death, or wedding announcement. It may just be that a particular topic has infuriated me to the extent that I needed to put my opinion in big bold letters on the front page, because no matter how much we may report on the topic, people still don't have a clue about the meaning.

    If Cmdr Taco had posted a news story on the poor metrics used by Alexa, would that have received the same attention that his own personal account did?

    Unfortunately, he's echoing what many of us already feel. Boss type people feel the need to rank high with Alexa. If the ranking goes down for any reason, they want it brought back up. Even on some of the less technical sites, discussions start about spyware, and people start removing these utilities. When that happens, the scores for those sites drop, and the scores for AOL.com and Disney.com go up. (When's the last time you de-spywared the kids' computer?)

    I know from reliable information that my news site is read heavily by those in the intelligence services around the world. I sincerely hope that they wouldn't have the Alexa toolbar on their machines. Most of our users are very aware of what's happening around them. They're the ones that are careful to keep their machines clean of viruses and spyware. That leaves us with the random users who follow links from other news sites, or find us in search engines. Maybe they'll stick around. Maybe they'll even learn something.

    --
    Serious? Seriousness is well above my pay grade.
  43. Re:Do it to ourselves, and that's what really hurt by Strilanc · · Score: 4, Insightful

    That won't work well with ads in general, because your desires change over time. For example, you'll be interested in car ads only when you're considering buying a new car.

    Also, people like me would just vote every ad down until we didn't have to see anything. If I want to see advertisements for a product I'LL GO LOOK FOR IT.

  44. Advertising is a huge crapshoot by Dracos · · Score: 4, Insightful

    Always was, always will be. After decades, there is no agreed upon methodology for tracking the effectiveness of marketing dollars in the real world. The internet should make it easier, right? Perhaps, until people learn how to filter the internet. Doubleclick never sees me, because I have

    0.0.0.0 *.doubleclick.net

    in my hosts file, along with 37,000 other crap sites. I also add "*urchin\.js" to my custom filters in FilterSetG, so AdSense doesn't see me. I suspect other Slashdotters take similar measures.

    If a good click through rate on a banner ad is less than 1%, and only about 1% of clicks result in a sale, then the value of that banner to the advertiser is only .01% of its cost (yes, I know AdSense works differently, but it has its own pitfalls). Pathetic, isn't it?
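
    To make the arithmetic concrete: multiplying the two rates gives sales per impression (1% of 1% is .01%, or one sale per 10,000 views), which is not quite the same thing as value relative to cost. A small sketch, where the $5 cost per thousand impressions is a made-up figure for illustration only:

```python
def sales_per_impression(click_through_rate, conversion_rate):
    """Fraction of ad views that end in a sale."""
    return click_through_rate * conversion_rate

def cost_per_sale(cost_per_thousand_impressions, click_through_rate, conversion_rate):
    """Ad spend needed, on average, to produce one sale."""
    cost_per_impression = cost_per_thousand_impressions / 1000
    return cost_per_impression / sales_per_impression(click_through_rate, conversion_rate)

if __name__ == "__main__":
    rate = sales_per_impression(0.01, 0.01)
    print(f"sales per impression: {rate:.4%}")
    # Hypothetical $5 CPM; substitute a real rate to get a real answer.
    print(f"cost per sale at $5 CPM: ${cost_per_sale(5.0, 0.01, 0.01):.2f}")
```

    Whether that cost is pathetic depends on the margin of what is being sold, which the rates alone do not say.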

    It makes you wonder how poorly traditional media ads actually perform.

    Banner ads, I'm pretty sure, are the first time advertisers have ever been able to measure the returns on ad dollars. Some company spends $20k for a full page ad in a magazine, how much of that came back in sales? No one knows. So just to be sure they don't lose sales, the company continues to buy ads, following some rough percentage of revenues. Demographics is the closest thing marketers have to concrete data... it basically says not to buy ads in Ladies' Home Journal if you're selling vintage car parts. Even then, demographics measures potential returns before the fact, not actual returns after the fact. So, advertising is a wild goose chase based on assumptions, and no one does, or can, really know what's going on.

    The internet should be a wake up call for advertisers to the fact that their marketing budgets are being overinflated by... (wait for it) the ad agencies and marketing firms. Sadly no one will realize this, because the foxes are in charge of the henhouse, and claim everyone will fall to ruin otherwise.

    Generally, people don't want the crap in the ads, and would rather not even see the ads. Horrible conversion rates prove this. The scariest part of Minority Report, other than the nanny-state concept of "pre-crime", is the level of advertising present everywhere in the film, targeted at individuals with laser-like precision. It got that way because the public allowed it to happen.

    The simplest way to fix advertising is to remove all imperative and presumptuous statements from them. No more "Call now!", "You need...", "But wait, there's more!" obnoxious mind games. I'm not calling, I don't need your shit, and I'm not waiting for you to yell at me some more.

  45. Re:Do it to ourselves, and that's what really hurt by badasscat · · Score: 3, Insightful

    As a statistician, I can reassure you that the only thing that's worse than no data is flawed data. When you have no data, you know something is wrong and you start correcting that. When you have flawed data, you don't.

    This is a huge assumption that I'd say is incorrect more often than not.

    Your entire argument as it stands now presupposes that the advertiser doesn't know the data is flawed. But what if he does?

    My company buys lots of web ads. We use Alexa as one of our data sources (not the only one) to determine ad buys, both because it's free and because in our experience, its data is no more or less accurate than that of paid vendors like Nielsen. Do we expect 100% accuracy? No. Do we think we can learn anything if, for example, it tells us that two directly competing sites have traffic that's different by about 200% in every metric? Probably.

    Buying ads is not an exact science. It doesn't really matter if we get accurate traffic down to the individual click. All we're looking for is relativity - a site's size and reach compared to its competitors. We look at the sites themselves, we look at Alexa and we look at research that we commission and pay for. Usually these sources all agree and we go ahead and buy. In the event that they don't agree, we use our own critical thinking and our own judgment to determine what to believe - that is part of any marketer's job, after all.

    It seems to me that this whole article here is missing the point. Alexa's a tool. A free tool. It is useful at what it does, but it is not, nor was it ever intended to be, some sort of accurate measure of site statistics for the entire internet. Nobody who uses it as part of their decision-making process is using it that way.

    I think this is a case where somebody looked at Alexa, figured out that it wasn't perfect, and therefore determined that it's utter crap. That's basically what your argument boils down to also. But the point is we don't need perfection, and we don't expect perfection, and this lack of perfection is taken into account in our decision making process. We're not flying to the moon here; we're buying ad space. It's something of an organic process regardless of how good your data is.

    If you're talking about somebody using Alexa for their own site, then that's just ridiculous. Even cheap hosting accounts (like I have for my personal site) come with their own log-based stats, and if not, there are plenty of free services like Statcounter out there. I don't think this is what many people use Alexa for, though; it's used more by small to mid-sized companies looking for sites on which to buy ads, or by curiosity seekers who just want to see how big their favorite sites are. I would think most sites would know what their own internal numbers are one way or another, without Alexa.

  46. Re:Rant as news by FST777 · · Score: 3, Insightful

    When talking about technological things like page-ranking and Alexa's use on that, yes. Yes they are.

    It's not that they are dumb in the broad sense of the word, but in the tech field, Digg is arguably "dumber" than Slashdot. Try the same argument on Slashdot vs. MySpace.

    When I talk about my hobby or profession, I like to single out the 99% that doesn't understand a word of what I'm saying.

    --
    Free beer is never free as in speech. Free speech is always free as in beer.
  47. DoubleClick by Mike_K · · Score: 2, Insightful

    CmdrTaco wrote:

    Equally perplexing is the accounting of iframes. Let's look at someone like DoubleClick's Alexa rating. Now it's hard to say, but I don't think I've ever visited their website. Have you? But according to Alexa, they have nearly a 1% share of the internet. I'd tend not to believe it...

    That's not surprising to me at all. I don't think this is because of all the iframes that pop up on pages, or they would have a much higher percentage than 1%. I think it's actual ad clicks. When you click an ad, you go to a DoubleClick link which will redirect you to the advertiser's page. If all those ad clicks are counted as actual traffic, 1% is actually a very believable figure.

    And I've never heard of Alexa until now :)

    m

  48. Alexa by evildogeye · · Score: 2, Funny

    If you really want good Alexa ratings, just put a link to the toolbar at the top of slashdot.org. Soon you'll probably be in the top 20.

  49. Firefox... problem solved by jadm · · Score: 2, Interesting

    Sparky. It's called Sparky. http://www.alexa.com/site/download

  50. It's the paradigm that's flawed, not Alexa by macraig · · Score: 3, Insightful

    This is for CmdrTaco and anyone else who wants to read it.

    Dude, it's the paradigm that sucks, not Alexa per se. Consider Nielsen ratings: would you or any self-respecting Slashdotter actually be so foolish as to agree to be a "Nielsen family"? I doubt it. It's the same dynamic at play. I blogged about the relative stupidity of Nielsen families in particular a while back; those people are ruining my ability to enjoy quality programming like Firefly, Space: Above and Beyond, Keen Eddie, and countless others because of their mindless plebeian tastes.

    These are also the same people who often cause unreasonable pricing for consumer items, because they're too stupid to know when to vote with their dollars and just say "no". "$70 for a set of warmed-over LucasFilm Star Wars films that already turned a profit three times over? No problem, I simply *must* have them!"

    As a result, manufacturers set prices based on this same mindless demographic; those of us who are "smart" consumers, who could wrangle a better, fairer price, are dragged along for the ride kicking and screaming.

    That's kinda what has happened here: you (CmdrTaco) are being dragged along kicking - and screaming - by all the Alexoids, and you don't like that any more than I like having Firefly yanked off the air.

    I'm quietly of the suspicion that national and especially online advertising is only a fraction as profitable as corporations think it is. I suspect if someone could do a truly objective cost-benefit analysis of mass advertising, like car commercials on TV, we'd find that it's actually costing money that is never rewarded in equivalent sales, and for which we're all ultimately footing the bill in the form of higher prices to pay down all that pointless advertising.

    Solving the "Alexa dilemma" just might require eugenics or some other speciation event.

  51. dealing with http logs on busy sites by ger · · Score: 3, Interesting

    At W3C we log almost everything as well, and we end up with way too much data as a result.

    But we use the logs to detect and prevent certain classes of abuse as well (e.g. too many requests in a short time interval or re-requesting the same resources over and over), and we also want to be able to track trends over time, so we have been reluctant to just throw that data away.

    I have a plan that I have yet to implement, which is to log only 0.001% of the requests for certain very popular resources (e.g. HTML DTDs and valid-HTML icons), which would allow us to monitor trends without logging tens of gigs of data per day; we'd just need to compensate for it when calculating stats later.
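
    A minimal sketch of that sampling idea, assuming requests are sampled independently at random. The names should_log, estimate_total and SAMPLE_ONE_IN are made up for illustration and are not taken from any W3C tooling; 0.001% is treated as 1 request in 100,000, and the "compensation" is simply multiplying the sampled count back up.

```python
import random

# Assumption: 0.001% of requests means 1 request in every 100,000.
SAMPLE_ONE_IN = 100_000


def should_log(rng, one_in=SAMPLE_ONE_IN):
    """Return True for roughly one request out of every `one_in`."""
    return rng.randrange(one_in) == 0


def estimate_total(sampled_count, one_in=SAMPLE_ONE_IN):
    """Scale a sampled count back up to an estimate of the true total."""
    return sampled_count * one_in
```

    The estimate gets noisier as the sampled count gets smaller, so this probably only makes sense for the very popular resources mentioned above, where even 1 in 100,000 still leaves a usable number of log lines.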

    Then I planned to monitor for abuse by also logging every request to a script that watches for abusive traffic patterns, an easy adaptation from the current script that wakes up and skims the logs every 10 mins.
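
    One way a watcher like that could flag "too many requests in a short time interval" is a per-client sliding window. This is only a sketch under assumptions: the RateWatcher class, its parameters and the thresholds are hypothetical, timestamps are plain seconds, and each client's requests are assumed to arrive in time order.

```python
from collections import defaultdict, deque


class RateWatcher:
    """Flag clients that exceed `max_requests` within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of each client's recent requests, oldest first.
        self._recent = defaultdict(deque)

    def record(self, client, timestamp):
        """Record one request; return True if the client is over the limit."""
        times = self._recent[client]
        times.append(timestamp)
        # Drop requests that have fallen out of the window.
        while times and timestamp - times[0] >= self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests
```

    Because it only keeps timestamps inside the window, the memory used per client stays bounded by the request limit plus whatever burst triggers the alarm.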

    (In your journal entry, when you say you are MD5ing IP addresses for privacy reasons, are you adding a random bit of data to the IP address before calculating the MD5? If not, it's pretty easy to find out which IP address corresponds to a given MD5 sum.)
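
    To illustrate the point about adding random data first, here is a rough sketch using Python's hashlib. The helper names and the secret SALT are assumptions for the example, not anything from the journal entry, and the brute force only walks one /24 (the 192.0.2.x documentation range) to keep it short; the full IPv4 space is still only 2**32 candidates.

```python
import hashlib
import os


def unsalted(ip):
    """MD5 of the bare IP address, as a hex string."""
    return hashlib.md5(ip.encode("ascii")).hexdigest()


def salted(ip, salt):
    """MD5 of secret random bytes followed by the IP address."""
    return hashlib.md5(salt + ip.encode("ascii")).hexdigest()


def reverse_unsalted(digest, prefix="192.0.2."):
    """Recover an IP from its unsalted MD5 by trying every last octet."""
    for last in range(256):
        candidate = f"{prefix}{last}"
        if unsalted(candidate) == digest:
            return candidate
    return None


# Hypothetical secret; it has to stay private and stable for as long as
# the same visitor should map to the same hash.
SALT = os.urandom(16)
```

    reverse_unsalted(unsalted("192.0.2.77")) hands back the original address, while the same search against salted("192.0.2.77", SALT) comes up empty unless the salt leaks.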

  52. Re:Rant as news by plover · · Score: 3, Funny

    No problem. You can use my goldy meter or my bronzy meter. They're just like your irony meter, only they're made of gold and bronze.

    --
    John