Slashdot Mirror


On Counting Website Traffic

Logic Bomb writes: "The San Francisco Chronicle has an interesting article about measuring website traffic. This is kind of an obnoxious issue, but it means everything to commercial websites seeking investors. Apparently the figures reported by the sites themselves through analysis of server logs are often much higher than the ones given by firms like Media Metrix (whose numbers I see all the time in articles from Cnet and the like). The basic dispute is over whether sampling, a la Nielsen, is appropriate for the web. It seems counterproductive to purposely use an inaccurate statistical measure when exact counts are readily available, but I can't imagine many things easier to fake than a server log. Anyone have a good idea about how to approach this?"

16 of 145 comments

  1. Fraud by Kaa · · Score: 3

    Well, falsifying server logs in order to get better rates for banner ads would probably count as fraud, which happens to be a criminal offense in the US. A couple of show trials followed by public hangings should solve this little problem.

    Besides, banner ads are typically served from a server NOT controlled by the company which owns the page. So people like DoubleClick know for sure how many times their ad was ignor^H^H^H^H^Hseen.

    Kaa

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
  2. Faking stats by Fervent · · Score: 3
    I can't imagine many things easier to fake than a server log.

    Are you kidding? When I worked at my last internship, the boss would take the server stats from WebTrends, plop them in a Word file (to look good for investors) and then sometimes "moderately improve" some of the stats before printing the document.

    Fact is, most investors don't get a verbatim server log with all the technical "mumbo-jumbo". They get a simplified version with only the information the CEO wants them to hear.

    --

    - I don't care if they globalize against free speech. All my best free thoughts are done in my head.

  3. Re:Why bother? by Mtgman · · Score: 4

    If someone is willing to take the hosting site's word at face value with regard to eyeball real-estate, then I've got some banner ads (and a bridge) to sell them.

    And this is the really sad part. The information age has created a new type of cyber-criminal: the false information broker. Society is moving away from products and building multi-purpose machines. As a whole we're more service oriented than we used to be. This means all our assets and business transactions are on paper. Nothing tangible is being exchanged. And typically we have such a high volume of data being transferred that it can't be checked for 100% accuracy. I signed up for one of those "saver" cards at a local grocery store (part of a national chain) and totally faked the information on the signup sheet (I get enough spam as it is, thank you very much). No one caught it, even though an application with an address of 1600 Penn Ave in Ft. Worth, Utah with a completely made up Zip code and a Texas DL number showing up at a store in Tennessee _should_ have raised an eyebrow or two.

    So now we have the buyers and the sellers. A buyer can't always trust a seller and a seller can't always trust a buyer. Enter the middleman who keeps both parties honest. Am I the only one saddened by the necessity of a service like this?

    Steven

    --
    -- I have marked myself unwilling to moderate-- I don't have other accounts to artificially inflate the karma of
  4. Re:More importantly, demographics by crisco · · Score: 3

    The server logs don't tell you who is coming to the site. Sure, you know that 201.189.67.109 (completely made up) stopped here and you can even do a reverse DNS on it, but the advertisers that pay for banner ads and the corporate marketing types want to know how much disposable income is behind that IP and what they might spend it on. That is why DoubleClick and all want to track you and even correlate you with a name and address: that info lets them classify you and sell your eyeballs to the advertisers. Have you seen the higher prices that they get for targeted ads? Nearly double their normal rate last time I looked.

    --

    Bleh!

  5. Most Downloaded Woman by iCEBaLM · · Score: 3

    I was thinking about this when I heard on Entertainment Tonight about Guinness crowning the "Most downloaded woman on the internet". And when I heard her astronomical number of 800 million downloads I thought it was incredibly inaccurate. Every man, woman and child in the US would have to download 4 of her pictures. How does Guinness come up with the final numbers? Do they even check the logs themselves? Are thumbnails viewed on a page included in the final numbers?

    When I eventually went to her site (I can't even remember her name, for God's sake) she had almost no pictures of herself on it, though lots of other girls. I tried in vain to find some of her, and I was thinking to myself that the numbers were severely inflated.

    While this might be an "obnoxious" question, I think a standard way of evaluating just how many hits and downloads a site gets needs to be determined, especially for awards like the Guinness Book.

    -- iCEBaLM

  6. Re:Lies, damned lies, and proxies by technos · · Score: 5

    NOTE: By reading this post, you have agreed to run around the room which you are currently in, flapping your arms, and squawking like a chicken.

    Okay, I did it. Unfortunately, I was reading your post at the same moment my boss was entering the cube, and I've been fired. Under the terms of the 'technos' AUP (As amended September 12, 2000), and UCITA, you are hereby notified that you owe me $28,941,285.42.

    Referencing clause two of the AUP, this number reflects the sum of my maximum earnings potential until retirement age, as well as the cost of obtaining said employment (six years of college at a major University), as well as an additional 34% transgressive penalty and a 9% compounded cost-of-living increase.

    You have ten business days to remit the sum, in whole, or I will be forced to submit a class B lien request against both your holdings and those of your employer in the State of Maryland.

    Clause six clearly states you indemnify me against any legal malfeasance or action, so don't even try to get cutesy with a countersuit. It has a binding compensation clause of $2,000,000.

    --
    .sig: Now legally binding!
  7. Web server statistics are NOT for marketing! by komet · · Score: 5
    Where the fuck does the idea come from that you should show your web server stats to marketing/sales people? Current programs really just report measurements of technical data, useful for planning server loads and Internet uplinks, but not for demographic data. PHBs want something like this:
    • Yesterday, 1308 people visited your site. Of those:
    • 183 weren't paying attention at all anyway.
    • 22 were your competitors.
    • 318 were poor college students drooling over, rather than contemplating buying, your products.
    • 139 were actually looking for pornography and left your site immediately.
    • 38 were webdesigners stealing your HTML code.
    • 133 were here to compare your prices with the competitors. Of those, 29 decided to buy your product.
    • 84 were in your target demographic, but were so stoned at the time that they didn't read your sales pitch.
    • 12 people actually bought something online.
    • 18 people liked your product and went out and bought some offline.
    • Of those 30 people who bought something, 28 sent the URL to a total of 56 friends to show off what they had just bought. Of those friends, 3 subsequently bought something.
    Ok, so where's the software which can get that data out of your server logs?
    --
    Any technology which is distinguishable from magic is not sufficiently advanced.
  8. From the advertising point of view... by oliverk · · Score: 3

    I work for a major ad agency that produces the full spectrum of work, online banners and applications, broadcast and print spots, etc., so really from our perspective it's about comparable measurability. We deal in a world where the media mix can contain any number of mediums, and right now the online space is the most difficult to measure and justify to our clients. This isn't so much about what

    I come from a good (read: more than five years :) ) background in the interactive territory, and I've gotten pretty used to the issues of measurability on the internet. The reality is that, for those of us creating work online, we've gotten overly accustomed to the nuances of online and forget too often to explain it all over again. There's also no major player that will admit that measurability across sites and users is nothing more than a statistical crap-shoot. I don't know why none of them will admit this -- certainly the polling that's done by Nielsen and the like is nothing more than statistical projections, and really it's a lot better to have something imperfect rather than nothing at all.

    In reality, our clients still really don't understand why these numbers are so different and then question our recommendations based on what they read. It challenges our reputation and affects the trust the clients typically feel in our creative or media teams. Broadcast and print, as well as the other "offline" mediums, really then have one big advantage: those mediums have been in use long enough that our clients no longer ask the questions of "how can we justify those reach numbers" or "sure I see what you're saying, but my other consultant says that you're only reaching half that audience with that commercial."

    So, maybe the challenge really lies with each of these "measurement" firms not admitting that they could be wrong. Maybe it's that the sites that are polled are financially incented to inflate their numbers to justify acquisition or second-round financing. Maybe it's that the technology exists to perfectly track a user's path anywhere, anytime but one of the first "features" in the browser was anonymity. Maybe it's the convergence of all of these different pieces at the same time (which is most likely the case).

    Sad. The interactive space has such opportunity to get around lofty advertising and blink-tag style direct marketing. But unless we can justify the funds, apportioned largely based on reach to the market, we won't end up with the type of experience marketing that actually adds value to those of us online.

    --
    ---- Please be nice in case my Slashdot karma ~= my real life karma.
  9. Problems with measuring traffic. by jd · · Score: 3
    Measuring web traffic accurately is a complex science, and not for the faint-of-heart. Why? Let's start with:

    • European users, especially, use web caches, rather than direct-through connections. So there isn't a 1:1 correspondence between server accesses and users.
    • Connection freezes, time-outs, etc, will end up showing more connections than actual users. (The user has to reconnect, which is a fresh "access".)
    • Framing, deep-linking, etc, will "smudge" the access count between any number of arbitrary sites in an unpredictable manner.
    • Browser caches don't refresh on every access.
    • Dynamic IP allocation means that there is an n:n correspondence between addresses and users.
    • Network Flooding != Popular Site
    • Search Engines != Users
    • You don't control how the content is used. For all you know, Joe Bloggs, down the road, has linked his web browser to Internet Conference (a whiteboard from VocalTec that lets you send stuff via OLE to other machines).

    In the end, the only way to gauge how many people have read your site is to place unique or unusual information on it, and then find out who knows it.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  10. Dealing With It Now by waldoj · · Score: 3

    I've got a problem with that right now. A site that I operate, nancies.org, serves up about 600,000 pageviews each month. But we're regularly credited by 24/7 Media (aka ContentZone) for just over 400,000. But they don't give two shakes for our logs, and say that we just have to trust them. That's like the U.S. government saying, regarding Carnivore, "trust us."

    BS. So I applied to Engage (formerly Flycast) last night to get our ads through them. Are they any better? I have no idea. But I do know that ContentZone is screwing us over, and that's incentive enough for me.

    -Waldo

  11. three types by jafac · · Score: 3

    Gee, I guess somebody finally figured out what the third kind of lie is!

    I think that if you're investing in a web company, you should IGNORE the statistics. Go to the site. If it's lame, don't give them your money. If it rocks, go for it. How hard could that be?

    --

    These are my friends, See how they glisten. See this one shine, how he smiles in the light.
  12. Why bother? by Mike+Schiraldi · · Score: 4

    But when you advertise on the web, you can look at your web logs to gauge the audience - you don't need to trust their logs, or Media Metrix', or anyone else's.

    In fact, by looking at your own logs, you can say, "Well, Yahoo sends 10,000 people a day to my site, but only 10 of those people buy anything. Meanwhile, Slashdot sends 1,000 people, but 500 of them end up buying stuff."
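
    To put numbers on that comparison, here is a minimal Python sketch. The `referrals` table and the `conversion_rate` helper are hypothetical names made up for illustration, and it assumes you have already matched referrers in your logs against completed orders, which is the hard part and is not shown.

```python
# Hypothetical per-referrer tallies, using the figures from the example above.
# In practice these would come from matching the Referer field in your own
# logs against completed orders; that matching step is assumed, not shown.
referrals = {
    "Yahoo": {"visitors": 10_000, "buyers": 10},
    "Slashdot": {"visitors": 1_000, "buyers": 500},
}


def conversion_rate(visitors: int, buyers: int) -> float:
    """Fraction of referred visitors who went on to buy something."""
    return buyers / visitors if visitors else 0.0


rates = {
    site: conversion_rate(tally["visitors"], tally["buyers"])
    for site, tally in referrals.items()
}

# List the referrers from best to worst conversion rate.
for site, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{site}: {rate:.1%} of referred visitors bought something")
```

    On these figures the smaller referrer converts far better, which is the point: raw audience size alone says little about what an ad placement is worth to you.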

    So why are such ratings needed?
    --

    1. Re:Why bother? by Erasmus+Darwin · · Score: 3
      So why are such ratings needed?

      They're needed because they have to have numbers to show to their advertisers. An ad that's being viewed by 4 million people has significantly more value, and thus has a higher cost, than one that's only being viewed by 2 million people. If someone is willing to take the hosting site's word at face value with regard to eyeball real-estate, then I've got some banner ads (and a bridge) to sell them.

  13. Carnivore by Anonymous Coward · · Score: 4
    Doh!

    Carnivore is the answer. Let the feds provide accurate and unbiased information!

  14. Questions on making your own stats by 198348726583297634 · · Score: 4
    I've been put in charge of producing the stats for my company's websites. I'm using Wusage, which is plenty configurable, scriptable, very well-priced for its functionality, etc., and I've set up a number of exclusion-filters.

    What I'm blocking out so far is:

    our company's internal IP traffic

    images

    funky robots like Keynote-Perspective that the old webmaster had let loose on our sites.

    This gives us some numbers I have confidence in (even though they're 10x lower than the numbers the old guy was producing through WebTrends), but I'd like to find out what others are doing for making their own web stats.
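
    For anyone who wants to try the same three exclusions without a stats package, here is a rough Python sketch. It is not how Wusage does it. It assumes the logs are in Combined Log Format, and the internal network range, image extensions, and robot markers are placeholders chosen for illustration, as are the sample log lines.

```python
import ipaddress
import re

# Assumptions for illustration only: the internal range, image extensions,
# and robot markers are placeholders, not values from the post.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")
ROBOT_MARKERS = ("keynote", "bot", "spider", "crawler")

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def is_countable(line: str) -> bool:
    """True if the line looks like a page view from an outside human visitor."""
    m = LINE_RE.match(line)
    if m is None:
        return False  # unparseable lines are not counted
    try:
        if ipaddress.ip_address(m["host"]) in INTERNAL_NET:
            return False  # internal company traffic
    except ValueError:
        pass  # host was logged as a name rather than an IP address
    path = m["path"].split("?", 1)[0].lower()
    if path.endswith(IMAGE_EXTENSIONS):
        return False  # image request, not a page view
    agent = m["agent"].lower()
    return not any(marker in agent for marker in ROBOT_MARKERS)


# Made-up sample lines: one outside page view, one internal hit,
# one image request, and one monitoring robot.
sample = [
    '203.0.113.7 - - [12/Sep/2000:10:00:00 -0700] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.7"',
    '10.1.2.3 - - [12/Sep/2000:10:00:01 -0700] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.7"',
    '203.0.113.7 - - [12/Sep/2000:10:00:02 -0700] "GET /logo.gif HTTP/1.0" 200 900 "-" "Mozilla/4.7"',
    '198.51.100.9 - - [12/Sep/2000:10:00:03 -0700] "GET /index.html HTTP/1.0" 200 5120 "-" "Keynote-Perspective"',
]

page_views = sum(is_countable(line) for line in sample)
print(f"{page_views} countable page view(s) out of {len(sample)} log lines")
```

    Even in this toy sample three of four lines drop out, which shows how a filtered count can come out far below an unfiltered one.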

    Thanks,
    Steve

  15. Lies, damned lies, and proxies by Webmonger · · Score: 4

    Excuse me while I go "Grumpy old man". This is an old, old problem. It goes back to the days when I first started using the web. See "Why web statistics are (worse than) meaningless." It's an old article. That's the point.

    In short, spiders, proxies and caches make it impossible to be accurate in measuring traffic. But everyone else is affected the same way. So your relative stats are relevant -- they just aren't hit-for-hit accurate.

    What your server logs are really for is resource planning. They'll help you find out how much traffic your server is serving, which should help you plan bandwidth and hardware upgrades as needed.
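
    As a rough illustration of that resource-planning use, the Python sketch below totals bytes served per hour and picks out the busiest hour. It assumes Common Log Format, the `bytes_per_hour` helper is a hypothetical name, and the sample lines are made up.

```python
import re
from collections import Counter

# Common Log Format:
# host ident user [dd/Mon/yyyy:HH:MM:SS zone] "request" status bytes
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]+\] '
    r'"[^"]*" \d{3} (?P<bytes>\d+|-)'
)


def bytes_per_hour(lines):
    """Sum the bytes field of each parseable line, keyed by (day, hour)."""
    totals = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m["bytes"] == "-":
            continue  # skip unparseable lines and responses with no body
        totals[(m["day"], m["hour"])] += int(m["bytes"])
    return totals


# Made-up sample lines for illustration.
sample = [
    '203.0.113.7 - - [12/Sep/2000:10:00:00 -0700] "GET /index.html HTTP/1.0" 200 5120',
    '203.0.113.7 - - [12/Sep/2000:10:00:02 -0700] "GET /logo.gif HTTP/1.0" 200 900',
    '198.51.100.9 - - [12/Sep/2000:11:15:00 -0700] "GET /index.html HTTP/1.0" 200 5120',
    '198.51.100.9 - - [12/Sep/2000:11:16:00 -0700] "GET /index.html HTTP/1.0" 304 -',
]

totals = bytes_per_hour(sample)
peak_hour, peak_bytes = max(totals.items(), key=lambda kv: kv[1])
print(f"Busiest hour: {peak_hour[0]} {peak_hour[1]}:00, {peak_bytes} bytes served")
```

    Unlike visitor counts, this figure does not depend on guessing who is behind a proxy or cache: it only measures what your own server actually sent.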