Slashdot Mirror


Why Good Data Can Be Hard to Find Online

WSJdpatton writes to mention that Carl Bialik has an interesting look at why good data can be hard to find, much less understand, online. He cites two examples: Google's first-quarter performance numbers and Alexa's revamp of its number-tracking process. "Now Alexa is incorporating other sources of data -- though it says the prior ranking 'wasn't wrong before, but it was different.' Some sites saw big changes in their rankings following Alexa's move: The tech blog TechCrunch said it fell far from its prior position in Drudge Report territory (rarefied air in Web-traffic terms). On Friday afternoon, Drudge Report ranked 545th, compared with TechCrunch's ranking of 1,784th, according to Alexa's new math."

10 of 39 comments

  1. Alexa? No. by Slashdot+Suxxors · · Score: 4, Informative

    This isn't exactly on topic, but I think you should give it a read before you form a final opinion on what the article is trying to say.

    1. Re:Alexa? No. by jd · · Score: 5, Insightful

      The article and the Slashdot story seem to say the same thing: the numbers produced are just numbers out of a hat. They don't represent anything meaningful, and indeed can't, because the participants are self-selecting and therefore not a random sample of the population. This is obvious and always has been. The popularity of a site (or a TV show or anything else) cannot be measured by any simple means, if it can be measured at all.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:Alexa? No. by TubeSteak · · Score: 4, Interesting

      "The article and the Slashdot story seem to say the same thing: the numbers produced are just numbers out of a hat. They don't represent anything meaningful, and indeed can't, because the participants are self-selecting and therefore not a random sample of the population."

      Even with a random, statistically relevant sample size... the saying "lies, damned lies, and statistics" still applies.

      "The popularity of a site (or a TV show or anything else) cannot be measured by any simple means, if it can be measured at all."

      TiVo and other DVRs would suggest otherwise.

      --
      [Fuck Beta]
      o0t!
    3. Re:Alexa? No. by Firehed · · Score: 4, Informative

      Maybe relative tracking can't be done by simple means, since it requires participation on everyone's part, but absolute local tracking is trivially easy on any server that supports server-side scripting and has some sort of database access. Add a couple of lines of code at the bottom of your page to insert a new row on each page load, and you've got nearly perfect visitor logs that can easily go beyond your standard server logs.

      Again, useless for relative popularity unless you have everyone's data. But it still tells you how popular your site is which is great for ego boosting and advertiser stats if nothing else.

      (I'd suggest that Google Analytics is going to be a lot more useful in the long run and at least has the potential to provide relative data in addition to the absolute, but anything that relies on client-side scripting is going to give less accurate numbers, since clients can disable or screw around with scripting.)

      --
      How are sites slashdotted when nobody reads TFAs?
  2. 70% of good data...? by JadeAuto · · Score: 5, Funny

    I read online somewhere that 70% of statistics online are made up. This article seems to prove the point. 4 out of 5 slashdotters agree! ;)

    1. Re:70% of good data...? by evanbd · · Score: 4, Funny

      Four out of five Slashdotters? You must be mad. The only things Slashdotters can agree on are that they want to marry CmdrTaco and what concert to see on the honeymoon.

  3. Wow, astoundingly obtuse by zappepcs · · Score: 5, Insightful

    Just observing the Internet and then reading this ... just wow.

    Good data is HARD to find ANY FUCKING WHERE, never mind limiting your search to just online. Seriously!

    News online? Read the same story from 8 sources and form your own opinion. MSM sucks worse.

    Scientific data? Well, unless it's peer reviewed, you know it's probably suspect and need to verify it with other data. Damn, even peer reviewed scientific data should be compared to other data these days.

    How about encyclopedic data? There is Wikipedia, but make sure to corroborate the data, right?

    Read it in a blog? Check the data before you make up your mind.

    Hmmmm, this sounds a lot like trying to find good data before the Internets were active. Damn, all that data is proffered up by humans... Humans are not infallible, so I'm guessing that data provided by humans is going to be a bit 'not infallible' also.

    Where does the assumption that data online should be good data come from? wtf?

  4. No, really. by v(*_*)vvvv · · Score: 4, Funny

    "The company tracks the Internet habits of users of its browser toolbar ... These rankings have long been criticized ... because Alexa users may not behave like the Internet as a whole."

    Ya, who in the world uses the Alexa toolbar!?
    1. Re:No, really. by choongiri · · Score: 4, Insightful

      Nobody here. That's the point.

  5. A Good Date by GalacticLordXenu · · Score: 3, Funny

    I initially read this as being, "Why a Good Date Can be Hard to Find Online". Hell, I could have told you that! But alas...
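jd's point about self-selecting participants can be made concrete with a toy calculation. The following is a minimal Python sketch under made-up assumptions: two hypothetical user groups, two hypothetical sites (`TechSite`, `NewsSite`), and invented visit and toolbar-install probabilities. It is not how Alexa actually computes its rankings. It only shows how a panel weighted by who chooses to join can reverse the order a full count would give.

```python
# Hypothetical population: two user groups with made-up sizes and habits.
# None of these numbers come from Alexa or the article; they only illustrate
# how a self-selected panel can distort a ranking.
POPULATION_SHARE = {"tech": 0.20, "general": 0.80}

# Probability that a user in each group visits each site (hypothetical).
VISIT_PROB = {
    "tech": {"TechSite": 0.50, "NewsSite": 0.30},
    "general": {"TechSite": 0.05, "NewsSite": 0.40},
}

# Probability that a user in each group installs the tracking toolbar
# (hypothetical). This is the self-selection step.
INSTALL_PROB = {"tech": 0.30, "general": 0.02}


def reach(weights):
    """Expected fraction of the given group mix that visits each site."""
    total = sum(weights.values())
    sites = VISIT_PROB["tech"].keys()
    return {
        site: sum(weights[g] * VISIT_PROB[g][site] for g in weights) / total
        for site in sites
    }


def ranking(reach_by_site):
    """Sites ordered from most to least reach (rank 1 first)."""
    return sorted(reach_by_site, key=reach_by_site.get, reverse=True)


# Everyone counted: what a truly random sample estimates on average.
true_reach = reach(POPULATION_SHARE)

# Only toolbar users counted: each group's weight is scaled by how
# likely its members are to install the toolbar.
panel_weights = {g: POPULATION_SHARE[g] * INSTALL_PROB[g] for g in POPULATION_SHARE}
panel_reach = reach(panel_weights)

if __name__ == "__main__":
    print("true :", ranking(true_reach), true_reach)
    print("panel:", ranking(panel_reach), panel_reach)
```

With these invented numbers the full population favors `NewsSite` (38% reach vs. 14%), while the toolbar panel, dominated by the group more likely to install the toolbar, puts `TechSite` first (about 41% vs. 32%). Enlarging the panel would not fix this, because the distortion comes from who joins, not from how many.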
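Firehed's "couple lines of code" approach to absolute, local tracking can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch, assuming a single SQLite database and a server-side request handler (not shown) that calls `record_page_view` on each page load. The table name, its columns, and the helper functions are hypothetical illustrations, not taken from the comment.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) a page-view log. The schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS page_views (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               viewed_at TEXT NOT NULL,
               path TEXT NOT NULL,
               client_ip TEXT,
               user_agent TEXT,
               referrer TEXT
           )"""
    )
    return conn


def record_page_view(conn, path, client_ip=None, user_agent=None, referrer=None):
    """Insert one row per page load (the 'couple lines of code' step)."""
    conn.execute(
        "INSERT INTO page_views (viewed_at, path, client_ip, user_agent, referrer) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), path, client_ip, user_agent, referrer),
    )
    conn.commit()


def views_per_path(conn):
    """Absolute page-view counts for this site only."""
    rows = conn.execute(
        "SELECT path, COUNT(*) FROM page_views GROUP BY path"
    ).fetchall()
    return dict(rows)


def unique_visitors(conn):
    """Distinct client IPs: only a rough proxy for distinct visitors."""
    return conn.execute(
        "SELECT COUNT(DISTINCT client_ip) FROM page_views"
    ).fetchone()[0]


if __name__ == "__main__":
    # Example addresses come from the reserved documentation ranges.
    conn = open_log()
    record_page_view(conn, "/", "203.0.113.5", "ExampleBrowser/1.0")
    record_page_view(conn, "/about", "203.0.113.5")
    record_page_view(conn, "/", "198.51.100.7")
    print(views_per_path(conn), unique_visitors(conn))
```

As Firehed notes, counts like these are absolute and local: they describe one site's traffic but cannot rank it against sites whose logs you don't have. Counting distinct IP addresses is also approximate, since shared and changing addresses blur the line between visitors.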