Why Good Data Can Be Hard to Find Online
dpatton writes to mention that the WSJ's Carl Bialik has an interesting look at why good data can be hard to find, much less understand, online. He cites two examples: Google's first-quarter performance numbers and Alexa's revamp of its traffic-ranking process. "Now Alexa is incorporating other sources of data -- though it says the prior ranking 'wasn't wrong before, but it was different.' Some sites saw big changes in their rankings following Alexa's move: The tech blog TechCrunch said it fell far from its prior position in Drudge Report territory (rarefied air in Web-traffic terms). On Friday afternoon, Drudge Report ranked 545th, compared with TechCrunch's ranking of 1,784th, according to Alexa's new math."
This isn't exactly on topic, but I think you should give it a read before you form a final opinion on what the article is trying to say.
Besides Alexa, another example of "readjustment" is Hitslink. Last November, they revised their OS-share figures for March through October 2007; Linux went from a reported 0.81% share in October to 0.50%. Their only explanation was a brief allusion on their site to filtering out "unrepresentative" hits from their data. Recently, they again revised their Linux share for January 2008, from the original 0.67% to 0.64%. Even though Hitslink seems to have trouble deciding how many Linux users there are, that doesn't stop people (like Westlake, who keeps posting Hitslink numbers on Slashdot) from citing them.
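To put those Hitslink revisions in proportion, here is a quick back-of-the-envelope sketch using only the figures quoted above. The function and variable names are just for illustration. It distinguishes the change in percentage points from the change relative to the original figure. By that measure the October 2007 revision cut the reported Linux share by roughly 38%, while the January 2008 one was about 4.5%.

```python
def relative_change(old_pct: float, new_pct: float) -> float:
    """Return the change from old_pct to new_pct as a percentage of old_pct."""
    return (new_pct - old_pct) / old_pct * 100


# Linux share figures quoted above (percent of hits): (originally reported, revised).
revisions = {
    "October 2007": (0.81, 0.50),
    "January 2008": (0.67, 0.64),
}

for month, (before, after) in revisions.items():
    points = after - before                  # absolute change in percentage points
    rel = relative_change(before, after)     # change relative to the original figure
    print(f"{month}: {points:+.2f} points, {rel:+.1f}% relative")
```

A shift of a fraction of a percentage point looks trivial, but for a platform with under 1% share it can be a large fraction of the whole figure.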