Slashdot Mirror


On Counting Website Traffic

Logic Bomb writes: "The San Francisco Chronicle has an interesting article about measuring website traffic. This is kind of an obnoxious issue, but it means everything to commercial websites seeking investors. Apparently the figures reported by the sites themselves through analysis of server logs are often much higher than the ones given by firms like Media Metrix (whose numbers I see all the time in articles from Cnet and the like). The basic dispute is over whether sampling, a la Nielsen, is appropriate for the web. It seems counterproductive to purposely use an inaccurate statistical measure when exact counts are readily available, but I can't imagine many things easier to fake than a server log. Anyone have a good idea about how to approach this?"

145 comments

  1. webmetrics by DarkbladePDX · · Score: 2

    On a serious note, I work at a fairly busy web site in the data warehousing/business reporting section. We simply don't have _time_ to fake enough server log entries to make it worth our while, we're too busy processing the _legit_ stuff and filtering out such non-reportables as crawler hits. As to the discrepancies between inside and outside numbers, well, I can _guess_ about what Mars looks like from some telescopic photos, but nothing beats going and looking. Understand I'm not shooting at those outside companies for the differences in said numbers, but one must be aware that different methods can produce different numbers, and that using statistical methods to arrive at metrics.... well, that's how the US Census works.... and if you live in the right neighborhood you're going to find an awful lot of _dark_-skinned 'Caucasians' ;>

    1. Re:webmetrics by Anonymous Coward · · Score: 1

      I just urinate on the corner of the website and 'mark' my territory in a way familiar to all canines and trolls.

      You mean they prefer post-its? I was wondering why they always had the shotgun ready when I returned....

  2. Revenue Not Hits by Dragthor · · Score: 1

    Revenue is more important than "hits". I just wish that some of these content sites could pay their employees and back up all the hype.

    --

    - kk
  3. Black boxes ? by MarcoAtWork · · Score: 1

    Short of having the measuring company installing a sealed black box that counts the request and bounces it right back to the website I don't see many other ways to do it if the server operator is malicious.

    Of course some sanity could be put in the data if the usage bandwidth from the site's upstream provider is taken into account (5 million hits and you transferred only 2 megs ? hmmmm) pity that obviously this data is usually not publicized or available to third parties for obvious reasons. (so you pay 5$/gig upstream and you charge me 10$/gig if I go over my quota eh ?)
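The bandwidth sanity check described above can be sketched in a few lines. This is only an illustration: the function names and the 200-byte threshold are assumptions made for the example, not figures from any measurement firm.

```python
def mean_bytes_per_hit(hits: int, bytes_transferred: int) -> float:
    """Average number of bytes the upstream provider carried per claimed hit."""
    if hits <= 0:
        raise ValueError("hits must be positive")
    return bytes_transferred / hits


def looks_plausible(hits: int, bytes_transferred: int,
                    min_bytes_per_hit: float = 200.0) -> bool:
    """Flag claims whose average response is smaller than a bare HTTP header.

    The 200-byte floor is an assumed threshold chosen for illustration.
    """
    return mean_bytes_per_hit(hits, bytes_transferred) >= min_bytes_per_hit


# The example from the comment: 5 million hits, but only 2 megabytes moved.
claimed_hits = 5_000_000
upstream_bytes = 2 * 1024 * 1024

print(mean_bytes_per_hit(claimed_hits, upstream_bytes))  # well under 1 byte per hit
print(looks_plausible(claimed_hits, upstream_bytes))     # False
```

A check like this only catches crude inflation; a site that pads its responses as well as its logs would pass it.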

    I am sure, though, that through the wonders of reverse engineering, IP spoofing etc. it could be possible to foil even black boxes, I mean, how much would it take to just take a machine in the office, connect it to the black box and send http requests crafted so that they appear to be from other IPs ? If the site owner has physical access to the site's hardware this would be really easy to pull off.

    It would be more complicated if one had just a colocated box or a virtual host, but even then with some 3l33t h4x0r skills one could make the black box's head spin in whatever direction one wants...

    The web, though, is very interesting in this respect, because unlike in the TV case (where you have to count the viewers at the viewers' location) it is theoretically feasible to do a precise, repeatable and cost-effective analysis by monitoring only the point of origin. The money that would otherwise go into finding an acceptable demographic group, providing them with set top boxes that analyze their habits, etc. could instead be invested in creating one single monitoring device installed at the site's location.

    On a related note, I have a digital cable set top box, and I am sure that behind my back the cable company is collecting my viewing habits (I mean, how hard could that be? The digital set top box already connects to their network to download programming information and it has a unique ID). Does anybody know more about this? My cable company is heavily pushing the digital set tops even towards people interested only in basic cable, and I don't think they are doing it for charity...

    --
    -- the cake is a lie
  4. Re:Why bother? by Luguber123 · · Score: 1

    Strange to categorize it like that; don't you ever buy anything? Actually, 2 of the 3 things I've bought off the net during the previous 2 months have been from shops that were linked from Slashdot. Most of the people I know who are interested in Slashdot have more money than those who are not.

  5. Re:even radio... by Hanzie · · Score: 2

    Radar detectors aren't reliable, because many troopers use only stopwatches and a known distance. The radar is off.

    The police radios, however, are always on.



    --
    ********* sig: If you don't like the law, get filthy stinking rich, and buy a better one.
  6. Re:Web server statistics are NOT for marketing! by mikpos · · Score: 2

    You could probably set up a Perl script in a few minutes to make up the numbers for you.

  7. It's precisely the same problem with democracy by Froid · · Score: 2

    In this country (I needn't specify which), we elect representatives. We don't directly vote on most issues, even though it would be technologically feasible for us to do so (especially with the advent of the internet). Why? Because we're not just looking for an accurate measure of what people want. We want what they ought to want, and we hope that representatives will better reflect that than their actual choices.

    Advertisers don't just want to know what the most visible piece of real estate is in the world so they can erect a billboard on it. They want to know what the next upcoming innovation is so they can be the first to ride the upsurging wave of popularity. It doesn't help that altavista is the most popular search engine in the world today if placing a big banner ad on google tomorrow will catch the as-yet unseen mobs.

    Take Netcraft and server operating systems. You don't just want to know what people are actually running. You want to know what they dare to tell you they're running. This is why it's ok for Netcraft to base its statistics on what servers tell each other they're running, rather than on some complicated fingerprint of their tcp/ip stacks.

    It comes down to this: Adam Smith had it wrong with his theory of the invisible hand of market forces. It's not just what the markets do that's interesting; for that tells you nothing more than what, empirically, they do. If you pretend otherwise, then you're behaving no differently from all the Linux bandwagoners or Microsoft bandwagoners who base their decisions only on the herd. Herd mentalities are antithetical to proper advertising, and advertisers are finally waking up to this fact.

    Cheers,
    Froid

  8. Richard Fromage by Darby · · Score: 1

    Now, correct me if I'm wrong, but isn't "Fromage" cheese in some language?
    Coupled with the common short version of "Richard" that is pretty funny.

    My personal favorite fake name which is on my Fake ID (I'm over 21, I have it just in case) is Justin Case ;-)
    ---CONFLICT!!---

  9. Re:Perhaps a less annoying alexa? by Ruzty · · Score: 1

    There are other companies doing exactly this.
    One example is Raydium
    I'm sure there are others as well.
    -Rusty

    --
    The Master (Angelo Rossitto) in Mad Max Beyond Thunderdome, "Not shit, energy!"
  10. measuring web traffic by Keegs · · Score: 1

    If anyone is interested, we've found a unique way of measuring the web by tapping into non-confidential data at the backbone (ISP) level. You can see the results at www.hitwise.com.au (7 day trial available) or www.hitwise.co.nz (currently free). ISPs get a revenue from the sale of this info to marketers, advertisers, etc. If there are any ISPs out there in Europe, Asia or the US who are interested in partnerships please contact me. brendan@sinewave.com.au

  11. Re:Fraud by Kaa · · Score: 1

    The punishment in this country for fraud is not public hanging.

    No? Really? [shakes his head in wonder] Those are strange times we live in...

    The punishment is that you have to give all your money to a lawyer.

    I was under the impression that having to give all your money to a lawyer was the punishment for needing (or thinking you need) a lawyer.

    Kaa

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
  12. Re:Why bother? by Mtgman · · Score: 2

    Silly po boy! Of course I didn't let them have my dl number. I put down some random number which matched mine for about four digits. Just in case they asked.

    steven

    --
    -- I have marked myself unwilling to moderate-- I don't have other accounts to artificially inflate the karma of
  13. Re:Dealing With It Now by DarkSparks · · Score: 1

    WEB BANNERS DON'T WORK! Sorry to shout - but I really believe that. I've run banner ad programs for several companies...and at the same time website analysis software that I wrote myself...so I know the results are solid. I'd buy a block of 10K clickthrus, and at the same time I'd watch and count the clickthrus on my side. In *all* cases, I'd get to just over 50% of my purchased clickthrus when I'd be notified by the banner provider that I'd reached my limit...and did I want to buy more. Bullsh*t! They overcharged me by double...and I dealt with many of the same ad companies that you've listed. Run like hell while you can...

  14. A use for DoubleClick? by weatherwax · · Score: 1
    Seems to me the most accurate count could be had from the advertising services a major site uses. DoubleClick could get an actual count, without requiring sampling, by counting referrals to their ads. Their obnoxious cookies would make an estimate of unique visitors quite good, too. So they could give the same statistics as the audited site, with some measure of third-party independence.

    'Course, DoubleClick can be fooled by having cookies disabled, a JunkBuster proxy, or whatever, but I'd imagine at this point only a tiny percentage of users are sufficiently clued to use JunkBuster or Cookie Pal. Certainly too few to make the count less accurate than sampling.

  15. Re:Why bother? by Coward,+Anonymous · · Score: 1

    An ad that's being viewed by 4 million people has significantly more value, and thus has a higher cost, than one that's only being viewed by 2 million people

    Unlike a television ad, in which an advertiser pays a large amount for one ad, banner ads are charged per impression. So whether a banner is served up 4 million times or 2 million times, the advertiser is charged the same per impression (well, assuming it's on the same site, and assuming no volume discount for the additional 2 million impressions, but you get my point).

  16. Fraud by Kaa · · Score: 3

    Well, falsifying server logs in order to get better rates for banner ads would probably count as fraud, which happens to be a criminal offense in the US. A couple of show trials followed by public hangings should solve this little problem.

    Besides, banner ads are typically served from a server NOT controlled by the company which owns the page. So people like DoubleClick know for sure how many times their ad was ignor^H^H^H^H^Hseen.

    Kaa

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
    1. Re:Fraud by Darth · · Score: 1
      well, you could still fake the data for something like doubleclick too. you'd just have to write a script that simulates clicks on the link and you could "stuff" your total clickthrus.

      the trick would be in writing a script to do it that would be subtle enough to not be caught by someone analyzing the logs and yet get enough clickthrus to make it worth the profit.


      Darth -- Nil Mortifi, Sine Lucre

      --
      Darth --
      Nil Mortifi, Sine Lucre
    2. Re:Fraud by jafac · · Score: 2

      Are you kidding? The punishment in this country for fraud is not public hanging. The punishment is that you have to give all your money to a lawyer.

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
    3. Re:Fraud by null-loop · · Score: 1

      Teehee. Sounds like something I did against Linkexchange when I realised they were "inserting" paid-for adverts into the banner cycle. There's a short summary at my old site. Please ignore all rantings on this site, I was young.

      Basically, it used Perl scripts on various servers to flood Linkexchange with false banner views. Worked quite well for a while, till they realised.

      --
      "If you unscrew Bill Gates' navel will the bottom fall out of the software market?"
    4. Re:Fraud by Erasmus+Darwin · · Score: 1
      I think the trick would require controlling (or at least being able to spoof requests from) a large number of IP addresses. While I am not an expert at TCP/IP, I'm mostly sure there's enough handshaking overhead in establishing a TCP connection that I don't think you'd be able to spoof connections from an arbitrary IP address. However, someone with access to routers that are responsible for a large range of IP addresses, on the other hand...

      Anyone wanna take bets on how long before we see a random technician at a random backbone provider get in trouble for spoofing ad hits?

  17. Honesty by bnitsua · · Score: 2

    The idea of hiring a company to generate web statistics to test for commercial viability seems impractical.

    If a company truly wanted to, they could easily obtain numerous IPs to forge the logs. And think about a script kiddie exploiting java, perl, or whatever-- that would certainly make a website's statistics look better. The list of ways to inflate a website's usage goes on.

    I think the only way to get this done fairly is to post a raw log, and let the investors (or whoever the target is) decide for themselves. Apache logfiles are fairly straightforward, and require little to no effort in deciding what is an actual hit and what is not. Of course, this would require honesty on the part of the company, which seems to be the real issue.

    1. Re:Honesty by Alioth · · Score: 1
      I know someone (via the newsgroups) who had one of these "pay for clickthrough" deals at his site. So like any good script kiddie, he wrote a VB app to do lots of clickthroughs to get paid.

      He got caught. He deserved to be caught. I thought it was damned hilarious!

      Basically, he had just enough knowledge to be dangerous, but not enough to make his hits look like real, unique hits. The best thing about it is the guy has a massive ego and it took a huge deflationary hit ;-)

  18. Perhaps a less annoying alexa? by C-ThrU · · Score: 1

    It's really too bad that the Alexa software is so annoying and invasive. Although it would be more akin to the Nielsen type deal. Maybe if someone made a less visible type of "Alexa", they could give voluntary users "coupons" or discounts for online businesses in exchange for running their software.. then sell their stats. Or it could be all open source style with stats posted freely and no compensation for using the software... blah

  19. Who Cares?? - Advertisers, investors by bmongar · · Score: 1

    If you are out trying to get advertisers to buy space on your site, or looking for someone to invest in your site for expansion they may very well care how many hits you have and who counted them.

    --
    As x approaches total apathy I couldn't care less.
  20. Faking stats by Fervent · · Score: 3
    I can't imagine many things easier to fake than a server log.

    Are you kidding? When I worked at my last internship the boss would take the server stats from WebTrends, plop it in a Word file (to look good for investors) and then sometimes "moderately improve" some of the stats before printing the document.

    Fact is, most investors don't get a verbatim server log with all the technical "mumbo-jumbo". They get a simplified version with only the information the CEO wants them to hear.

    --

    - I don't care if they globalize against free speech. All my best free thoughts are done in my head.

    1. Re:Faking stats by CMiYC · · Score: 2

      Fact is, most investors don't get a verbatim server log with all the technical "mumbo-jumbo". They get a simplified version with only the information the CEO wants them to hear.

      Good point...and it's not like web logs are the only thing that gets treated like this.

      ---

    2. Re:Faking stats by SmartyPants · · Score: 1

      Working for a large web site, I know that: a) web logs get audited b) each line in the log gets a magic number added to it, so auditors can check to see if the line is faked, or if lines are missing/added to it
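The comment does not say how the per-line "magic number" is computed. One possible scheme, shown here purely as an assumption, is a keyed hash chain: each line's tag is an HMAC over the previous tag plus the line itself, so editing, inserting, or deleting a line breaks every later tag. The function names and the sample key are hypothetical.

```python
import hashlib
import hmac


def _tag(key: bytes, previous_tag: str, line: str) -> str:
    """HMAC-SHA256 over the previous tag concatenated with this line."""
    message = (previous_tag + line).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def seal_lines(lines, key: bytes):
    """Return (line, tag) pairs, each tag chained to the one before it."""
    sealed = []
    previous_tag = ""
    for line in lines:
        tag = _tag(key, previous_tag, line)
        sealed.append((line, tag))
        previous_tag = tag
    return sealed


def first_bad_line(sealed, key: bytes):
    """Return the index of the first entry that fails the chain, or None."""
    previous_tag = ""
    for index, (line, tag) in enumerate(sealed):
        if not hmac.compare_digest(_tag(key, previous_tag, line), tag):
            return index
        previous_tag = tag
    return None


audit_key = b"example-key-held-by-the-auditor"  # hypothetical key
log = ["GET /index.html 200", "GET /logo.gif 200", "GET /news.html 200"]
sealed_log = seal_lines(log, audit_key)
print(first_bad_line(sealed_log, audit_key))  # None: chain is intact
```

The scheme only helps if the key, or the sealing step itself, is outside the site operator's control; an operator holding the key could simply re-seal a forged log. It also does not detect lines trimmed from the end unless the auditor separately records the final tag or a line count.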

  21. Re:Why bother? by markt4 · · Score: 1

    In fact, by looking at your own logs, you can say, "Well, Yahoo sends 10,000 people a day to my site

    Yes, but you can only look at your own logs to see that Yahoo sends you 10,000 people a day after paying them substantial money for what they said should average 20,000 people a day, based on their (or Media Metrix's) logs.

  22. Re:Questions on making your own stats by Xerithane · · Score: 1

    What we do is have each server here (here being where I work, not nerdfarm, which I lose money to operate *G*) completely ignore, for the most part, what's going on.
    We've hacked out a packet sniffer that runs on the network picking up good data (ie, where is the hit coming from) and then between that and our dns load balancing software we get a really good monitoring package to find out exactly how busy our servers are. And BTW, our servers serve approximately 2000-6000 hits a second. yes, a second. Yeah.. it is a lot. And it all runs apache.

    nerdfarm.org

    --
    Dacels Jewelers can't be trusted.
  23. Who do you trust? by davejenkins · · Score: 1
    The matter at hand here is not the mathematical accuracy of web logs vs. the statistical consistency and ability to predict behavior, such as those produced by Nielsen, Metrix, and the rest.

    It's a matter of trust

    Sponsors and advertisers simply need to rely on a consistent, reliable, and _reputable_ company to provide the numbers by which they purchase advertising. It is the same reason that makes large corporations with huge accounting departments hire Ernst & Young to run their 10-Ks before posting them to the SEC: reputation and consistency.

    Everyone, including the site managers, advertisers, and Metrix, know that the numbers can be faked anywhere along the chain. The most honest option, and least culpable in terms of liability, is to have a measuring company run its analysis for all, even if that analysis is statistical predictions. If a site manager fakes numbers, he's a little untrustworthy; If Metrix screws up, they're outta business.

  24. Re:Most Downloaded Woman by iCEBaLM · · Score: 2

    It was just an example of the size of 800 million downloads. Still I think it was quite inaccurate. You must also remember, most people on earth don't have and don't care to have internet access, and of those people who would have heard of this woman's site? People in North America.

    -- iCEBaLM

  25. Re:three types by theholyboot · · Score: 2

    The problem is that when people invest $10+ million in a web company, not only do they want numbers, but they want EVERYONE to know those numbers. I work for a website that gets ~8 million hits/day and has many regulations to conform to. The accuracy of our logs is what keeps our company alive. I've seen Nielsen and Media Metrix report numbers for us, and they're all off from what we get. That's to be expected from sampling. What *really* matters from a marketing perspective is how much granularity you can get from these numbers. If your logs show 20% less than what Nielsen shows, but you can drill down and get demographic/session/referrer/etc. data, then you're in a much better position. Raw hit counts are useless nowadays, but being able to break up this number into geographic location, time of day, site path, avg. session length, etc. is what makes logs useful.

  26. Re:Why bother? by Steveftoth · · Score: 1

    One reason that we are not exchanging anything REAL, nothing that is tangible, is that we live in the INFORMATION age. If you want something tangible, go back to the bronze age or something. People don't sell things, we sell mindshare. This is what web ratings (and TV) are all about. Selling a piece of people's minds. Why do you think AllAdvantage even had an inkling of a chance? Because they thought that they could buy people's minds at a cheaper rate than they could sell them...
    But then a lot of people just got around that, and just got to freeload.

  27. Routers, hosting providers, and stats by ShannonClark · · Score: 1

    A thought that I have had, though since I am busy working on another business opportunity at the moment I am not pursuing it, is that there are some obvious places to gather very accurate data about website interest.

    These are the main routers for the hosting providers hosting the website in question. At some level these machines "know" not just how much traffic is requesting the given website, but how many different IP addresses are requesting, how many of the requests are short/quick Cache refreshes vs. long "real" sessions etc.

    Capturing this data would require some sort of sniffing, which could have a performance hit and does raise security implications but could be overcome. Additionally the main routers for most hosting companies are outside of the control of the client companies so this data could be seen as more trustworthy than the server logs from a machine to which the client company has root.

    Anyway, just my thoughts. A fair amount of work would need to take place to turn this into a new profitable line of business for the hosting companies, but given the market need for accurate data I think it would be worth pursuing.

    Shannon

    --
    -- Join us in Chicago May 1-4th for MeshForum -- writer, historian, tech geek, entrepreneur, internet junky since '91 --
  28. Meaningful Web Stats by Enormous+Cow+Turd · · Score: 2
    It is obvious that the basic methodology behind the collecting of web stats is flawed. For one thing, most web sites count requests, when in fact it is individual sessions that should be counted.
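As a rough illustration of counting sessions rather than requests, the sketch below groups requests by visitor and starts a new session after a period of inactivity. The 30-minute timeout and the use of an IP address as the visitor key are assumptions made for the example; proxies and shared addresses make IP a crude identifier.

```python
from collections import defaultdict


def count_sessions(requests, timeout_seconds=1800):
    """Count sessions in an iterable of (visitor_id, unix_timestamp) pairs.

    A visitor's first request opens a session; any gap longer than
    timeout_seconds between consecutive requests opens another one.
    """
    times_by_visitor = defaultdict(list)
    for visitor, timestamp in requests:
        times_by_visitor[visitor].append(timestamp)

    sessions = 0
    for times in times_by_visitor.values():
        times.sort()
        sessions += 1
        for earlier, later in zip(times, times[1:]):
            if later - earlier > timeout_seconds:
                sessions += 1
    return sessions


sample = [
    ("10.0.0.1", 0),
    ("10.0.0.1", 60),
    ("10.0.0.1", 4000),  # more than 30 minutes after the previous request
    ("10.0.0.2", 30),
]
print(len(sample), "requests,", count_sessions(sample), "sessions")
```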

    Now if a company is interested in gathering web statistics in order to steer corporate decision making, then they should really look at collaborative filtering as a means to do this. No matter what else you have to say about Amazon.com, their implementation of the Net Perceptions collaborative filtering engine is incredibly accurate at analyzing and predicting their customers' needs/desires.

  29. Fun with fake names/addresses by bee · · Score: 1

    This reminds me of the fun I have with Radio Shack asking for my name/address/etc. When I lived in Indiana I would regularly tell them I'm "Richard Fromage, 1060 W. Addison, Chicago IL, 60613"-- if that address doesn't ring a bell, go rent the (original) Blues Brothers movie.

    Makes me wonder how much junk mail the Chicago Cubs get and routinely dispose of...

    ---

    --
    At least mafia-owned pizzarias make excellent pizza. Compare to Bill Gates.
  30. Re:Why bother? by Mtgman · · Score: 4

    If someone is willing to take the hosting site's word at face value with regard to eyeball real-estate, then I've got some banner ads (and a bridge) to sell them.

    And this is the really sad part. The information age has created a new type of cyber-criminal. The false information broker. Society is moving away from products and building multi-purpose machines. As a whole we're more service oriented than we used to be. This means all our assets and business transactions are on paper. Nothing tangible is being exchanged. And typically we have such a high volume of data being transferred that it can't be checked for 100% accuracy. I signed up for one of those "saver" cards at a local grocery store (part of a national chain) and totally faked the information on the signup sheet (I get enough spam as it is, thank you very much). No one caught it, even though an application with an address of 1600 Penn Ave in Ft. Worth, Utah with a completely made up Zip code and a Texas DL number showing up at a store in Tennessee _should_ have raised an eyebrow or two.

    So now we have the buyers and the sellers. A buyer can't always trust a seller and a seller can't always trust a buyer. Enter the middleman who keeps both parties honest. Am I the only one saddened by the necessity of a service like this?

    Steven

    --
    -- I have marked myself unwilling to moderate-- I don't have other accounts to artificially inflate the karma of
  31. Re:Lies, damned lies, and proxies by fraggedtroller · · Score: 1

    The only useful info that you may get from a server log is the number of UNIQUE visitors to your particular page, not page views or hits. This is why sampling may be useful! Sampling lets us see in general what people are going to see and allows a more accurate count of unique users.

  32. This is just the warm up... by haus · · Score: 1
    Long term, how much traffic a traditional web page gets, even if it is a powerhouse, will not matter that much. Traditional webpages are somewhat dull and are not a very attractive target for advertising dollars.

    Where it starts to matter is measuring audience for streaming media. And neither server logs nor sampling will give an accurate picture of what is actually happening. This is where real audience management comes in, from the likes of companies such as Reliacast: the ability to get exact counts of the number of participants in a streaming event, regardless of whether it is a unicast or multicast event.

    all persons, living and dead, are purely coincidental. - Kurt Vonnegut

  33. Hits vs Page Views by O'Really · · Score: 1

    I wonder about this every time I hear some no name website say that they get X,000,000 hits per month.

    I've seen some stats pages, and there are usually over ten times as many hits as there are page views.

    I'm assuming that the average visitor views more than one page. So when they say "We get X million hits a month" they're only getting roughly X hundred thousand visitors or so? Or are they using "hits" as a term for visitors?

    Do these people even know their own real stats?
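To make the hits versus page views distinction concrete, here is a small sketch that tallies both from a list of requests. The rule used for a "page view" (any request that is not for an image, stylesheet, or script) is an assumption for illustration; as noted elsewhere in the thread, stats packages let operators define it differently.

```python
import os

# Assumed list of file types that count as hits but not as page views.
ASSET_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico"}


def summarize(requests):
    """Tally hits, page views, and distinct visitors.

    requests is an iterable of (visitor_id, path) pairs.
    """
    hits = 0
    page_views = 0
    visitors = set()
    for visitor, path in requests:
        hits += 1
        visitors.add(visitor)
        # Ignore any query string before looking at the file extension.
        extension = os.path.splitext(path.split("?", 1)[0])[1].lower()
        if extension not in ASSET_EXTENSIONS:
            page_views += 1
    return {"hits": hits, "page_views": page_views, "visitors": len(visitors)}


sample_requests = [
    ("A", "/index.html"), ("A", "/logo.gif"), ("A", "/style.css"),
    ("A", "/news.html"), ("A", "/logo.gif"),
    ("B", "/index.html"), ("B", "/logo.gif"),
]
print(summarize(sample_requests))
```

In this toy sample, seven hits shrink to three page views from two visitors, which is the kind of gap the comment is asking about.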

    --


    __No, that is not my real address__
    1. Re:Hits vs Page Views by Tappah · · Score: 1

      Those that are pointing out that "sessions" are a more accurate measure overlook the fact that banner ads are typically delivered per page pull - and users who view 10 pages may see 10 completely different banner ads in that time.

      These numbers are important for both advertisers and web operators alike, because many banner ads pay on a cost per thousand (CPM) basis. Advertisers need to see how many people are actually seeing their ads, and operators need accurate numbers to be sure they are paid fairly.

      Any web operator you meet can tell you that the page impressions counted by the banner ad company *never* match the page impressions reported by the logs, and often are on the order of 1/3 the number WebTrends or Webalizer reports. Add that to the idiotic way companies like Media Metrix are "projecting" traffic based on relatively small sample sets, and the result is that website operators always manage to get screwed in the deal.

    2. Re:Hits vs Page Views by Darth · · Score: 1
      yeah hits are inflated because a lot of hits come from an individual. page views are supposed to take that into account. the problem is that page views isn't exactly defined anywhere. the statistical package i'm most familiar with (NetTracker) allows you to define what constitutes a page view.

      Personally, i think that's good because how many pages deep someone goes on a single view will change depending on the site and its content so it should be a flexible definition, but it does give people who want to sound impressive the ability to tweak their stats however they like.

      i guess the real answer is that nonbiased 3rd parties should run the stats on the raw logs and determine how to calculate a view.

      internally, i would think the company would want those statistics as accurate as possible so they can make business decisions, but for advertising or venture capital hunting, they might inflate their numbers. i certainly wouldn't trust the numbers a company gave me, without verifying them myself, if they were trying to get me to invest millions in them.


      Darth -- Nil Mortifi, Sine Lucre

      --
      Darth --
      Nil Mortifi, Sine Lucre
  34. Re:even radio... by Hanzie · · Score: 2

    I would like to get one of those detectors. Police frequencies aren't hard to determine.

    Wonder what kind of range it has?

    --
    ********* sig: If you don't like the law, get filthy stinking rich, and buy a better one.
  35. Never give out your logs! by DarkSparks · · Score: 2

    If you're turning your logs over to someone else for analysis, you might as well post your savings account number, PIN, and SSN to the Internet. The information contained in your logs is, IMHO, some of the most proprietary data an Internet company owns. DarkSparks

  36. Re:Why bother? by po_boy · · Score: 2
    ...and a Texas DL number...

    You let them have your DL number? Seems kind of pointless to lie about the rest of the stuff when your DL number is on there.

    I guess I am assuming that you didn't lie on your drivers license (about more than your height and weight)

  37. pardon me.. by Kabloona · · Score: 1

    but I'm really stoned right now, could you repeat that?

    peas, -Kabloona

  38. Re:More importantly, demographics by crisco · · Score: 3

    The server logs don't tell you who is coming to the site. Sure, you know that 201.189.67.109 (completely made up) stopped here and you can even do a reverse DNS on it, but the advertisers that pay for banner ads and the corporate marketing types want to know how much disposable income is behind that IP and what they might spend it on. That is why DoubleClick and all want to track you and even correlate you with a name and address, that info lets them classify you and sell your eyeballs to the advertisers. Have you seen the higher prices that they get for targetted ads? Nearly double their normal rate last time I looked.

    --

    Bleh!

  39. mod_log_spread to an auditing host? by vallee · · Score: 1

    You could configure George Schlossnagle's mod_log_spread to multicast apache log entries to a third party audit host. That would be realtime, very hard to fake, and transparent to your config.

    --
    The real Paul Vallee is slashdot userid 2192, and, what do you mean it's not cool to point out your low userid?
  40. Re:Questions on making your own stats by po_boy · · Score: 1
    I'm using Wusage...

    Wussgage? Is that some kind of measurement of the tendency to give up in a fight, or complain or something? Exactly what kind of scale does one use to gauge the amount of wuss in a person? Is this different than measuring the amount of wussy in a person?

    Or is this the thing I keep hearing in that Budweiser commercial: "Wusaaaaage?", "yeah, wusage.", "Wusssaaaaaggeee!".

    hmmm.

  41. URL fixed: mod_log_spread to an auditing host? by vallee · · Score: 2

    You could configure George Schlossnagle's mod_log_spread to multicast apache log entries to a third party audit host. That would be realtime, very hard to fake, and transparent to your config.

    --
    The real Paul Vallee is slashdot userid 2192, and, what do you mean it's not cool to point out your low userid?
  42. Overlooked? by The+Queen · · Score: 2

    Yes, you're right, but it seems nobody has mentioned what is painfully obvious to me from dealing with my own clients. "What's a server log? Do we have one of those? How much is that?" Most businesspeople (suits) need somebody to translate the tech for them, and whenever there's a translation, there's the opportunity for deceit.

    The Divine Creatrix in a Mortal Shell that stays Crunchy in Milk

    --

    The House Between - Original Sci-Fi Series
  43. Re:Carnivore by trolebus · · Score: 1

    Whoever ranked that funny is on the ball

  44. I know its off topic..but huh? by CMiYC · · Score: 2

    It is virtually impossible for a device to detect what radio station you are tuned to. There's no way for anyone to tell what station you're listening to, short of getting into your car and looking at your radio. If you still have an all-analog radio, then maybe you could detect harmonics caused by the filters at the local oscillator, but I don't see that as being reliable from any distance. If you've got a recent stereo, then it's probably DSP-driven anyway. So how could you possibly tell then what station the person is tuned to? Telnet to the proc and do a ps aux | grep LO-RF?

    I'm sorry... I just don't buy it... a device that could detect what radio station you are listening to? Nope. Don't buy it.

    ---

    1. Re:I know its off topic..but huh? by smileyy · · Score: 2

      I would guess the device works by picking up the audio coming from your car, then comparing it to the output of known radio stations in the area.

      Just gotta mic each parking space.

      --
      pooptruck
    2. Re:I know its off topic..but huh? by sfbanutt · · Score: 1

      Actually, every superheterodyne radio receiver (which almost all are these days) generates a radio signal of its own. This is mixed with the incoming signal to generate a signal at 455 kHz IIRC. The generated signal can be detected and used to determine which frequency your radio is tuned to.....

      --
      I've wrestled with reality for 35 years and I'm happy to say, I finally won out - Elwood P. Dowd
    3. Re:I know its off topic..but huh? by CMiYC · · Score: 2

      Good luck. That's why we have audio amplifiers in our radios. After the IF stages you're left with a filtered signal that is extremely weak. The purpose of the local oscillator (at 455 kHz) is to bring the frequency you're listening to down into the audio range. Once you pass the local oscillator, the signal has to be amplified. You aren't going to be able to detect the broadcast frequency being mixed with the LO... if you could, then you wouldn't have needed an audio amp to begin with.

      ---

  45. Get the ISP to give up stats by lanner · · Score: 1


    Get bandwidth statistics from their ISPs if you can.

    The ISP that I work for generates stats on almost every interface on our network, save a few odd pieces of hardware that do not support it or are not worth supporting it. You cannot count the hits, but you can count the proverbial p0rn that they are pushing... or pulling.

  46. Most Downloaded Woman by iCEBaLM · · Score: 3

    I was thinking about this when I heard on Entertainment Tonight about Guinness crowning the "Most downloaded woman on the internet". And when I heard her astronomical number of 800 million downloads I thought it was incredibly inaccurate. Every man, woman and child in the US would have to download 4 of her pictures. How does Guinness come up with the final numbers? Do they even check the logs themselves? Are thumbnails viewed on a page included in the final numbers?

    When I eventually went to her site (I can't even remember her name for gods sakes) she had almost no pictures on it of herself, lots of other girls however, I tried in vain looking for some of her and I was thinking to myself that the numbers were severely inflated.

    While this might be an "obnoxious" question I think a standard way of evaluating just how many hits and downloads a site gets needs to be determined, especially for awards like the Guinness Book.

    -- iCEBaLM

    1. Re:Most Downloaded Woman by Darth · · Score: 1
      i agree that the 800 million is probably calculated in whatever way is most favourable to her goal. (which is to say horribly inflated)

      of course, that's for the total amount of time she's had the site. i have no idea how long that is, but i seriously doubt it existed before '96.

      most people on earth don't have internet access or don't care about the internet, but that still leaves a lot of people who do. on a quick look, i couldn't find any numbers to supply. considering she's backed by playboy, i would expect her audience to be worldwide.

      (not to mention it's porn....it probably comes up in every search done on the net.)


      Darth -- Nil Mortifi, Sine Lucre

    2. Re:Most Downloaded Woman by Darth · · Score: 1
      that would be Danni Ashe. i just read a story about the guinness thing at The Register.

      story link

      apparently she sent them her server logs. according to the story she got 121 million downloads of her image from her free site and 120 million downloads of her image from her pay site.

      by the way, that 800 million shouldn't just be broken up into how many US citizens would have had to download her image. there are a few billion people in the non-US citizen population who could have downloaded her image too.


      Darth -- Nil Mortifi, Sine Lucre

    3. Re:Most Downloaded Woman by kindbud · · Score: 1

      Oh yes, you're so right. We can't have any fooling around with Guinness records, why, people might think they don't have to take it seriously...

      --
      Edith Keeler Must Die
  47. Re:even radio... by CMiYC · · Score: 1

    This is entirely different. Police re-broadcast... your car's radio doesn't.

    ---

  48. I'm surprised larger companies are so un-educated by Gazoomba · · Score: 1

    For example, on one of my web sites I keep track of many different statistics on a daily basis. I count each individual page on my site by itself and also all pages as a whole. For each of these I have two counts: one is the # of times the page is requested, and the other is the # of different IPs that requested it. This allows me to do two things: count the # of raw page servings, and the # of unique visitors on a daily basis. Of course, if I really wanted to go crazy I could decide that a unique visitor is each unique IP on a per-month basis, or perhaps count a unique visitor as each hit from an IP address as long as no visits have been made by that IP address in the last hour....personally I just keep a count of everything and ask sponsors which count they prefer :)
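
    A rough sketch of the three counts described above (raw page servings, distinct IPs per page per day, and visits with a one-hour inactivity rule), in Python. The (timestamp, ip, page) tuple format and the name count_traffic are assumptions for illustration, not the poster's actual setup, and an IP is only a stand-in for a person.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_traffic(hits, session_gap=timedelta(hours=1)):
    """Tally hits given as (timestamp, ip, page) tuples, in any order.

    Returns (page_views, unique_ips, visits):
      page_views[(day, page)] -> raw requests for that page on that day
      unique_ips[(day, page)] -> distinct IPs that requested it that day
      visits -> an IP counts as a new visit only after session_gap
                with no hits from that IP
    """
    page_views = defaultdict(int)
    ips_seen = defaultdict(set)
    last_seen = {}  # ip -> timestamp of its most recent hit
    visits = 0
    for ts, ip, page in sorted(hits):  # process in time order
        key = (ts.date(), page)
        page_views[key] += 1
        ips_seen[key].add(ip)
        previous = last_seen.get(ip)
        if previous is None or ts - previous >= session_gap:
            visits += 1
        last_seen[ip] = ts
    unique_ips = {key: len(ips) for key, ips in ips_seen.items()}
    return dict(page_views), unique_ips, visits
```

    Keeping all three numbers side by side makes it easy to hand a sponsor whichever definition they ask for, and to see how far apart the definitions are.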

  49. Simple solution by Fat+Rat+Bastard · · Score: 1

    When the FBI has carnivore attached to every ISP and RIAA / MPAA / DC / AnyOtherNameForADMCALovin'Company / UCTIA backed spyware company I'm sure they'll have LOTS of reliable statistics about your website (and shopping patterns, and bank accounts, and how much porn you download off of newsgroups)... and you'll be able to get it... for a price of course (unless you do something the Gvm't doesn't like, then you'll get it free under the rules of discovery. Too bad you'll be in jail soon after) Nathan "They're watching me, I swear" Cento....

    --

    If you don't have anything nice to say, say it often.
    - Ed the Sock

    1. Re:Simple solution by Fat+Rat+Bastard · · Score: 1
      When the FBI has carnivore attached to every ISP and the RIAA / MPAA / DC / AnyOtherNameForADMCALovin'Company / UCTIA backed spyware companies are collecting all of that data on you I'm sure they'll have LOTS of reliable statistics about your website (and shopping patterns, and bank accounts, and how much porn you download off of newsgroups)... and you'll be able to get it... for a price of course (unless you do something the Gvm't doesn't like, then you'll get it free under the rules of discovery. Too bad you'll be in jail soon after)

      Nathan "They're watching me, I swear" Cento....

      (Yeah yeah yeah... I should learn to use PREVIEW)

      --

      If you don't have anything nice to say, say it often.
      - Ed the Sock

  50. Re:Why bother? by wnissen · · Score: 1

    A television ad, like a magazine ad, is guaranteed a certain viewership. The Super Bowl has such expensive ads because it has a huge rating, and you get lots of unique impressions that are impossible to get otherwise. Four ads on a 15 share program like the Olympics aren't worth anywhere close to one ad on a 60 share program. That's because TV ad buyers would rather have 60 million people see their ad than 15 million people see it four times. So a banner ad served to 4 million unique people is worth more than a banner ad shown twice to 2 million people.

    Walt

  51. Re:Lies, damned lies, and proxies by technos · · Score: 2

    Please, send her over! I'll gladly give her triple what she received for her last album gratis, in the name of continuing art.

    [technos begins scrawling in the checkbook.. Pay to the order of: Courtney Love, Date: September 25, 2000, Amount: $3,000 and no cents]

    --
    .sig: Now legally binding!
  52. Re:three types by Aphelion · · Score: 1

    I strongly disagree with this.

    I work for a company that recently made its way into the Media Metrix top 20, and I know that we built our name by focusing on popular, yet niche, content. Some of it didn't "rock," but that's all subjective, and we invested in the numbers.

    Once we have the numbers, we can develop the investment and provide quality assurance in creative and informational aspects of the network. In fact, that happens to be what more than a few people do around here.

  53. Re:Why bother? by Sodium+Attack · · Score: 1
    The information age has created a new type of cyber-criminal. The false information broker.

    Nope, that sort of criminal has been around for quite awhile. The classic example is the "blue book" purporting to give average market values of used cars. In fact, the blue book is put out by the used car industry with higher-than-market prices, solely for the purpose of allowing used car dealers to advertise that their prices are "below blue book", and/or convincing consumers to agree to artificially high prices for used vehicles. (There are other used car price guides which are more accurate in their values.)

    --

    Never take moderation advice from sigs, including this one.

  54. Re:Lies, damned lies, and proxies by technos · · Score: 5

    NOTE: By reading this post, you have agreed to run around the room which you are currently in, flapping your arms, and squawking like a chicken.

    Okay, I did it. Unfortunately, I was reading your post at the same moment my boss was entering the cube, and I've been fired. Under the terms of the 'technos' AUP (As amended September 12, 2000), and UCITA, you are hereby notified that you owe me $28,941,285.42.

    Referencing clause two of the AUP, this number reflects the sum of my maximum earnings potential until retirement age, as well as the cost of obtaining said employment (six years of college at a major University), as well as an additional 34% transgressive penalty and a 9% compounded cost-of-living increase.

    You have ten business days to remit the sum, in whole, or I will be forced to submit a class B lien request against both your holdings and those of your employer in the State of Maryland.

    Clause six clearly states you indemnify me against any legal malfeasance or action, so don't even try to get cutesy with a countersuit. It has a binding compensation clause of $2,000,000.

    --
    .sig: Now legally binding!
  55. Web server statistics are NOT for marketing! by komet · · Score: 5
    Where the fuck does the idea come from that you should show your web server stats to marketing/sales people? Because current programs are really just some measurements of technical data, useful for planning server loads and Internet uplinks, but not for demographic data. PHBs want something like this:
    • Yesterday, 1308 people visited your site. Of those:
    • 183 weren't paying attention at all anyway.
    • 22 were your competitors.
    • 318 were poor college students drooling over, rather than contemplating buying, your products.
    • 139 were actually looking for pornography and left your site immediately.
    • 38 were webdesigners stealing your HTML code.
    • 133 were here to compare your prices with the competitors. Of those, 29 decided to buy your product.
    • 84 were in your target demographic, but were so stoned at the time that they didn't read your sales pitch.
    • 12 people actually bought something online.
    • 18 people liked your product and went out and bought some offline.
    • Of those 30 people who bought something, 28 sent the URL to a total of 56 friends to show off what they had just bought. Of those friends, 3 subsequently bought something.
    Ok, so where's the software which can get that data out of your server logs?
    --
    Any technology which is distinguishable from magic is not sufficiently advanced.
    1. Re:Web server statistics are NOT for marketing! by under_score · · Score: 1

      Hmm. Actually, the scary part is that there are companies working on stuff like this. Using weird demographic and tracking stuff. I happen to have worked for one of them. It's not actually as hard as it might seem to get a lot of really good info just from web logs. And add a few other technological pieces and suddenly ZeroKnowledge looks like a really nice place to hide.

    2. Re:Web server statistics are NOT for marketing! by FooBarson · · Score: 1

      12 sales out of 1308? Not bloody likely. The accepted statistic is 1 sale out of 1000 visitors and that statistic doesn't even take into account the 30% chargeback margin due to rampant credit card fraud.

      Sorry, but it appears that you are spouting all these statistics based on your experience with porn sites.

      Having worked for a company providing online stores for legitimate businesses (i.e., non-porn), I can say that it is very normal for there to be a >1% conversion rate. In addition, credit card fraud was an issue that seldom arose.

      Matter of fact, the folks in charge of promoting the store always felt like 1% was a terrible number, and wanted to beat it.

      I'd bet you can make a lot of errors by applying porn-site statistics to more general areas. (This just in!: Studies show 100% of web sites attempt to disable the back button via JavaScript.)

    3. Re:Web server statistics are NOT for marketing! by leo.p · · Score: 1

      Actually, I got my statistics from several Wired articles that were not referring to porn sites. Also, the chargeback ratio is for *information* sold. If your experience includes sites that actually ship something, your stats will be better. The 0.1% conversion ratio is an accepted industry figure. It is an average, of course.

  56. i use webalizer by rdnzl · · Score: 1

    i have a cron job set up to run webalizer on my access_log every day. it gives me referrers.

    1. Re:i use webalizer by British · · Score: 2

      My admin has that too. In fact, we just (read: lazy) turned off any LOCAL referrers (which come from people surfing around different pages in 1 session). It's amazing what you'll find sometimes linked to your page.

      For some bizarre reason, there were 15 counts of a referrer from osdn regarding the Slashdot cruiser. I checked the Slashdot cruiser web page only to find nothing linked to my site. Strange (but then again a page doesn't have to be linked. It could be 15 people were at that page first, and then went to something on my site).

      Other than that, I found the usual google returns, and plenty from articles I commented on from here.

      And when I hosted a real wacky e-zine (the boulder news frenzy) which had tons of vulgar language, every pervert with keyword searches of "toilet sex", "rape", etc went to the zines on my pages, only to be disappointed to find an ASCII rag.

      So take a look at those referrers. You'll be amazed what you find. Often you'll see someone on a webboard post a link to one of your pages with a positive/negative comment.
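
      For anyone who wants to do the same thing without a stats package, here is a minimal sketch that tallies external referrers and drops the local ones. It assumes Apache's combined log format (referrer and user agent as the last two quoted fields); external_referrers and own_hosts are made-up names, and webalizer's own filtering may well work differently.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Request, status, size, then the quoted referrer and user-agent fields
# that close a combined-format log line.
LINE_RE = re.compile(
    r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"\s*$'
)


def external_referrers(lines, own_hosts):
    """Count referrer URLs whose host is not one of our own hostnames."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a combined-format line, skip it
        referrer = match.group("referrer")
        if referrer in ("", "-"):
            continue  # typed-in URL, bookmark, or referrer withheld
        host = urlsplit(referrer).hostname
        if host is None or host in own_hosts:
            continue  # unparseable, or a LOCAL referrer
        counts[referrer] += 1
    return counts
```

      Calling external_referrers(open("access_log"), {"www.example.org"}).most_common(20) would list the top twenty outside pages, with the file name and hostname swapped for your own.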

    2. Re:i use webalizer by morgus+morphus · · Score: 1

      I have seen similar things with hits showing up from just an ad banner; my theory on this is that it is a result of the new fashion for iframe banner ads (that is the banner is on a separate html page in an iframe).
      It seems like some browsers will just use the last html page visited on a page as the referrer.

  57. Re:Questions on making your own stats by gorilla · · Score: 2

    This is one reason why a genuine 'audience' is going to be lower than the raw logs. Local traffic and robots aren't real traffic. I could increase the raw hits on a site to almost any level, simply by throwing a few htdig processes at it. Wouldn't mean anything though.
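
    To make the point concrete, a small sketch of filtering robots and local traffic before counting. The user-agent markers and the local network ranges below are assumptions and far from complete; a real robot list is much longer, and a robot that lies about its user agent gets through.

```python
import ipaddress

# Assumed, incomplete markers for robot user agents.
ROBOT_MARKERS = ("bot", "crawler", "spider", "htdig")

# Assumed ranges for "our own" traffic; adjust to the actual network.
LOCAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def is_audience_hit(ip, user_agent):
    """True if the hit looks like an outside human rather than a robot or us."""
    agent = user_agent.lower()
    if any(marker in agent for marker in ROBOT_MARKERS):
        return False
    address = ipaddress.ip_address(ip)
    return not any(address in net for net in LOCAL_NETS)


def audience_count(hits):
    """Count (ip, user_agent) pairs that pass the filter."""
    return sum(1 for ip, agent in hits if is_audience_hit(ip, agent))
```

    Comparing audience_count(hits) with the raw number of hits shows how much of the log is robots and in-house clicking.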

  58. They're not interested in "Hits" by Anonymous Coward · · Score: 1

    Our investors, partners, advertisers, etc. aren't interested in hits -- at least not since maybe 1995 or 1996.

    We get asked for detailed reports on "impressions" (an old print advertising concept) and "page views" and "unique visits" and "return visits", and "length of visit", and "pages per visit" ... and all kinds of other goodies.

    Who the hell gets paid for hits?

  59. Re:Why bother? by interiot · · Score: 2

    The point isn't to prove it to yourself, it's to prove it to the advertisers who might want to put an ad on your site. You dolt.
    --

  60. From the advertising point of view... by oliverk · · Score: 3

    I work for a major ad agency that produces the full spectrum of work, online banners and applications, broadcast and print spots, etc., so really from our perspective it's about comparable measurability. We deal in a world where the media mix can contain any number of mediums, and right now the online space is the most difficult to measure and justify to our clients. This isn't so much about what

    I come from a good (read: more than five years :) ) background in the interactive territory, and I've gotten pretty used to the issues of measurability on the internet. The reality is that, for those of us creating work online, we've gotten overly accustomed to the nuances of online and forget too often to explain it all over again. There's also no major player that will admit that measurability across sites and users is nothing more than a statistical crap-shoot. I don't know why none of them will admit this -- certainly the polling that's done by Nielsen and the like is nothing more than statistical projections, and really it's a lot better to have something imperfect rather than nothing at all.

    In reality, our clients still really don't understand why these numbers are so different and then question our recommendations based on what they read. It challenges our reputation and affects the trust the clients typically feel in our creative or media teams. Broadcast and print, as well as the other "offline" mediums, really then have one big advantage: those mediums have been in use long enough that our clients no longer ask the questions of "how can we justify those reach numbers" or "sure I see what you're saying, but my other consultant says that you're only reaching half that audience with that commercial."

    So, maybe the challenge really lies with each of these "measurement" firms not admitting that they could be wrong. Maybe it's that the sites that are polled are financially incented to inflate their numbers to justify acquisition or second-round financing. Maybe it's that the technology exists to perfectly track a user's path anywhere, anytime but one of the first "features" in the browser was anonymity. Maybe it's the convergence of all of these different pieces at the same time (which is most likely the case).

    Sad. The interactive space has such opportunity to get around lofty advertising and blink-tag style direct marketing. But unless we can justify the funds, apportioned largely based on reach to the market, we won't end up with the type of experience marketing that actually adds value to those of us online.

    --
    ---- Please be nice in case my Slashdot karma ~= my real life karma.
  61. If you're worried about "inflated" stats... by ShaunC · · Score: 2
    If you want to buy banner ads on a particular site and you're worried that their demographic info is inflated, here are some things to try.
    • Have them pull the banner from your server, not theirs - never ever let them put your ad banner on their server. Do a test ad run with them, then analyze your own server logs. You'll be able to see if your banner was really pulled, say, 10K times or if they quit showing it after far fewer impressions. I've caught several places shorting me. You can expect some discrepancies due to caching and other issues, but if you're supposed to get 10K impressions and the image only gets served 2K times, consider it a lesson learned and advertise somewhere else.

    • If you want proof of their traffic claims, ask them to embed a 1x1 GIF from your server (or one of those little FastCounters set to 1x1 size) on their page. Check your own logs, or view the FastCounter in full size, to see if they're really getting the traffic they say they are. Most one-man websites will be happy to do this when faced with the chance to gain you as an advertising customer; but don't expect Excite et al to bend over for you like this.

    • Whenever possible, purchase ads by click-through, not CPM. Click-throughs will cost you more, but I'd rather get 1K guaranteed clicks than 10K ignored impressions.
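
    A rough sketch of the check in the first two bullets: count how often the banner (or the 1x1 GIF) was actually pulled from your own server and compare that with what was promised. The function names and the 20% tolerance for caching are assumptions, not an industry rule.

```python
def count_banner_requests(log_lines, banner_path):
    """Count GET requests for banner_path in our own access log lines."""
    needle = f'"GET {banner_path} '
    return sum(1 for line in log_lines if needle in line)


def delivery_report(promised, served, tolerance=0.2):
    """Compare promised impressions with requests seen in our own logs.

    tolerance is the fraction we are willing to write off to caching
    and similar discrepancies (an assumed value).
    """
    if promised <= 0:
        raise ValueError("promised must be positive")
    ratio = served / promised
    return {
        "ratio": ratio,
        "shortfall": max(promised - served, 0),
        "acceptable": ratio >= 1 - tolerance,
    }
```

    With the numbers from the first bullet, delivery_report(10_000, 2_000) flags the run as not acceptable, with a shortfall of 8,000 impressions.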
    Shaun
    --
    Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
  62. Re:Dealing With It Now by TheSync · · Score: 1

    We've been with several banner ad networks, and well, if you're getting more than $1.50 CPM (cost per thousand impressions), you're incredibly lucky.

    I was recently speaking with a company who advertised on our site through a network. They were spending $25 CPM for the ads, and we saw about $1 CPM by the time it made it our way (due to advertising agency and network costs, which seem to be much larger than stated).

    My suggestion to content sites: learn how to sell your own ads. Even if you sell just a handful of your impressions, you'll probably make more than any network could bring you. Keep the sales in house.

  63. Prevent log file forgery and gain credibility... by xaniamud · · Score: 2
    Some Solutions:

    • make web application code open and available for audit in order to prevent invalid/illegal logging.
    • cryptographically sign the logs at periodic intervals and/or when the applications are stopped and started. This will help prevent tampering. Even encrypting the logs so that only particular individuals can access them might be suitable.
    • use W3C standard log file formats.
    • hire a reputable, independent auditor to validate your metrics at regular intervals.
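
    One way the signing bullet could look in practice, as a sketch only: a keyed hash chain (HMAC-SHA256 from Python's standard library) in which every tag covers the log line plus the previous tag, so editing, dropping, or reordering a line breaks every later tag. This is a swapped-in simplification, not a public-key signature; it only helps if the key is held by the auditor rather than by the site being audited. The names sign_chain and verify_chain are made up.

```python
import hashlib
import hmac


def sign_chain(lines, key):
    """Return one hex tag per line; each tag covers the line and the previous tag."""
    tags = []
    previous = b""
    for line in lines:
        message = previous + line.encode("utf-8")
        tag = hmac.new(key, message, hashlib.sha256).hexdigest()
        tags.append(tag)
        previous = tag.encode("ascii")
    return tags


def verify_chain(lines, tags, key):
    """True only if every line still matches the tag recorded for it."""
    expected = sign_chain(lines, key)
    if len(expected) != len(tags):
        return False
    return all(hmac.compare_digest(a, b) for a, b in zip(expected, tags))
```

    The tags would be stored somewhere the site operator cannot rewrite, and the auditor reruns verify_chain over the delivered log.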
    What's all the fuss about?

    Rob.

  64. A little late, a little old... by SnakeStu · · Score: 1

    See what happens when your morning is full of meetings? Sheesh...

    Anyway, although now it's looking old and stale, I still consider the following paper of mine, which was published a few years ago, to be relevant to this topic (IOW, things haven't changed enough since then to make it irrelevant):

    Examining the Validity of World-Wide Web Usage Statistics

    Enjoy...


  65. Re:Dealing With It Now by happystink · · Score: 2
    ContentZone calculates their page views slightly differently than other advertisers. It's a bit tough to explain, but if you don't set up enough unique page codes or whatever they call them, you won't make as much money, it's true (and they say so I think on their page). Engage, on the other hand, may credit you for more ads, but DO NOT USE THEM. Flycast used to make me 30 bucks a day, then Engage came in and I now make 10 bucks a day on TWICE the traffic. They can't sell even 25% of my ad inventory and their pay is pitiful on what they do pay. Incidentally, I get about the same amount of hits as you, so really, Engage will probably equal 10 bucks a day for you. They blow hard.

    sig:

    --

    sig:
    See the "..for smart people" banners Wired runs here? Look elsewhere guys.

  66. Server logs and their usefulness. by Restil · · Score: 2

    Server logs can tell you a variety of things, but I don't necessarily think they're useful for marketing purposes except for the owner of the server, and not so much for advertising. I run a small site that gets about 100 unique visitors a day and about 25 regulars. Using the logs and parsing out the data, I can determine that almost all of the people who visit my site stick around for a little while, but don't come back later. At least, that's what the logs say. I can also see the referring site, which tells me where any advertising should be focused, as well as whether someone actually clicked on a link or entered the URL directly (or from bookmarks), which would indicate that the user has visited before.

    Of course, any user on a dialup connection will probably have a different host/IP the next time they visit, and the logs will still show them as a separate user. AOL's proxy is especially bad, as the host will change EVERY TIME the user makes another hit on the site, which makes it very difficult to track. Cookies and user accounts would be much more useful to determine exactly how many visitors you have and how many of them visit frequently. However, I still believe that this information really is only useful to the server operator and not to someone looking to advertise on the site.

    Marketing as it stands should probably be a trial and error operation. Spend some money and see what happens. When I ran a business several years ago I tried advertising in a variety of different places. Ads for computer sales got practically no response from a computer magazine but got a LOT of response from a simple 4 line classified ad in the newspaper. Sometimes you just have to throw some money around and see what you get back. Yes, there is some risk, and yes, you will probably lose some money before finding a medium that works well for you, but that's the name of the game.

    -Restil

    --
    Play with my webcams and lights here
  67. and she is sort of old+ugly by Alejo · · Score: 1

    Maybe they did it on purpose to get publicity afterwards taking her "crown" off...

  68. third party logging? by shutdown+-h+now · · Score: 2

    What if a company who is a third party, independent of either the advertiser or the web hoster, were to set up a box through which all internet traffic to the server was transparently passed? The third party logs the traffic to determine if his logs match what the hoster is claiming. The advertiser can trust the third party because he hires one he trusts to provide this service for him.

    The hoster's guys can't access the box because it is literally black-boxed (locked up, no physical access, and no knowledge of the logins/passwords).

    The third-party logger can remotely access his box, download logs or whatever and provide that info to the advertiser. The advertiser can then check the logs of the hoster and compare them to the third party's (aka the verifier's). If the verifier's logs match the hoster's you know the data is somewhat accurate (at least as accurate as these things can be).
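
    The comparison step could be as simple as the sketch below, which takes per-page counts from both sides and lists the pages where the hoster's claim strays too far from what the verifier saw. The dict-of-counts input, the name compare_counts, and the 5% tolerance are all assumptions for illustration.

```python
def compare_counts(hoster, verifier, tolerance=0.05):
    """Return {page: (claimed, observed)} for pages that disagree.

    hoster and verifier map page -> request count. A page is reported
    when the difference exceeds tolerance, measured relative to the
    verifier's count (pages the verifier never saw use a baseline of 1).
    """
    mismatches = {}
    for page in sorted(set(hoster) | set(verifier)):
        claimed = hoster.get(page, 0)
        observed = verifier.get(page, 0)
        baseline = max(observed, 1)
        if abs(claimed - observed) / baseline > tolerance:
            mismatches[page] = (claimed, observed)
    return mismatches
```

    An empty result means the two sets of logs agree within the tolerance; anything else is a list of pages to ask the hoster about.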

    I mean, Nielsen does this with those boxes they give to their test families, so why can't some enterprising third-party verification company (hmmmmm?) do the same with web hosts?

    This looks like a nice little niche market for exploitation and mucho money to be made off of. I mean you write a few scripts to keep control over your logs and to send the logs back to a central server that formats this stuff into nice pretty print outs for the suits to drool over at their next board meeting.

    Just a thought...

  69. Re:I read the article by pug23 · · Score: 1

    The reason that you can't see your posts is that you are posting anonymously and therefore your posts start with a score of zero (0) and you most likely are browsing with a threshold of 1. For more info, try the link for the FAQ.

  70. Web site usage has to be done client side by Masem · · Score: 2
    There's no way to monitor the traffic effectively for a web site from just server logs. As others stated, problems with proxies, robots, and whatnot make it impossible to tell if you have a live human at the other side or if it's just 1000 AOL users behind a cache or if it's just Google on its monthly visit.

    The only effective measurement of web traffic is by having volunteers that use a special proxy that reports what sites the user visits back to a server, and to generate it from there. Exactly how the Nielsen boxes do it for television, which unfortunately means the same problems will crop up (Nielsen families tend to be favored around east/west coasts, thus making shows that appeal to midwest or plains state viewers less popular by appearance). Additionally getting volunteers might be a problem, as you'll most likely create a biased set by whom you select. And probably most importantly, privacy issues are more apparent for net ratings.
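
    For a feel of what such a panel can and cannot tell you, here is a sketch of the projection step. It assumes the panel is a simple random sample of the online population, which is exactly the assumption the bias problem above breaks, and it uses the usual normal approximation for a proportion. The function name and parameters are made up.

```python
import math


def estimate_audience(panel_size, panel_visitors, population, z=1.96):
    """Project a site's audience from a panel, with a rough 95% interval.

    Returns (estimate, low, high) as numbers of people. Valid only to the
    extent that the panel is a random sample of the population.
    """
    share = panel_visitors / panel_size
    std_error = math.sqrt(share * (1 - share) / panel_size)
    low = max(share - z * std_error, 0.0)
    high = min(share + z * std_error, 1.0)
    return share * population, low * population, high * population
```

    The interval narrows only with the square root of the panel size, and for a small site that few panelists ever visit it can be as wide as the estimate itself, which is one reason sampled numbers and server logs disagree most for niche sites.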

    --
    "Pinky, you've left the lens cap of your mind on again." - P&TB
    "I can see my house from here!" - ST:
  71. Problems with measuring traffic. by jd · · Score: 3
    Measuring web traffic accurately is a complex science, and not for the faint-of-heart. Why? Let's start with:

    • European users, especially, use web caches, rather than direct-through connections. So there isn't a 1:1 correspondence between server accesses and users.
    • Connection freezes, time-outs, etc, will end up showing more connections than actual users. (The user has to reconnect, which is a fresh "access".)
    • Framing, deep-linking, etc, will "smudge" the access count between any number of arbitrary sites in an unpredictable manner.
    • Browser caches don't refresh on every access.
    • Dynamic IP allocation means that there is an n:n correspondence between addresses and users.
    • Network Flooding != Popular Site
    • Search Engines != Users
    • You don't control how the content is used. For all you know, Joe Bloggs, down the road, has linked his web browser to Internet Conference (a whiteboard from VocalTec that lets you send stuff via OLE to other machines)

    In the end, the only way to gauge how many people have read your site is to place unique or unusual information on it, and then find out who knows it.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  72. Re:Lies, damned lies, and proxies by forgey · · Score: 1

    What your logs are even more useful for is determining what pages and services people are really using on your site. Perhaps there is a service that is costing you money to offer that not one person has used yet.

    That way you can determine what parts of your site to focus on and perhaps what services to start advertising ;)

    forge

  73. Dealing With It Now by waldoj · · Score: 3

    I've got a problem with that right now. A site that I operate, nancies.org, serves up about 600,000 pageviews each month. But we're regularly credited by 24/7 Media (aka ContentZone) for just over 400,000. But they don't give two shakes for our logs, and say that we just have to trust them. That's like the U.S. government saying, regarding carnivore, "trust us."

    BS. So I applied to Engage (formerly Flycast) last night to get our ads through them. Are they any better? I have no idea. But I do know that ContentZone is screwing us over, and that's incentive enough for me.

    -Waldo

    1. Re:Dealing With It Now by cymen · · Score: 1

      and then you found the "about us" page perhaps? took me a while too, hehe..

    2. Re:Dealing With It Now by waldoj · · Score: 1

      :) Sorry about that -- generally, by the time that people get to the site, they know just why they're there. Still, in case people stumble across it, I guess it wouldn't kill us to write "Dave Matthews Band" on there somewhere. Consider the suggestion taken.

      -Waldo

    3. Re:Dealing With It Now by AstroJetson · · Score: 1

      --Warning for the OT sensitive--
      If you are sensitive to Off Topic posts, do not read any further. Hit the Back button on your browser now.

      Coooool, I love nancies. The Last Stop for DMB news. Keep it up. Now that the tour is over, though, I probably won't be contributing as much to the 600,000 hits as I have been. :p

      --
      Admit nothing, deny everything and make counter-accusations.
  74. Re:Lies, damned lies, and proxies by Webmonger · · Score: 2

    I disagree. When 300 UNIQUE visitors view my page using the same proxy, they look like one visitor. The only thing web logs can tell you is how many requests your web server received. I call those "hits". Your definition sounds different.

  75. Re:Statistics by Tairan · · Score: 1
    Either that, or "Lies, damn lies, and benchmarks." Both of them have their times to be used. Are they that different?

    --
    /. is a commercial entity. goto slashdot.com
  76. three types by jafac · · Score: 3

    Gee, I guess somebody finally figured out what the third kind of lie is!

    I think that if you're investing in a web company, you should IGNORE the statistics. Go to the site. If it's lame, don't give them your money. If it rocks, go for it? How hard could that be?

    --

    These are my friends, See how they glisten. See this one shine, how he smiles in the light.
  77. Re:Mach Five versus The Slashdot Cruiser by Tairan · · Score: 1
    "Mach Five has a trunk where a boy and a chimpanzee can stow away Slashdot Cruiser has a drunk geek-wannabe writer in the backseat "

    You mean JonKatz comes with the Slashdot Cruiser? Wow! I should go register now!

    --
    /. is a commercial entity. goto slashdot.com
  78. imagination 101 by scotch · · Score: 1
    "... but I can't imagine many things easier to fake than a server log. Anyone have a good idea about how to approach this?"

    An Orgasm?

    HTH

    --
    XML causes global warming.
  79. Why bother? by Mike+Schiraldi · · Score: 4

    But when you advertise on the web, you can look at your web logs to gauge the audience - you don't need to trust their logs, or Media Metrix', or anyone else's.

    In fact, by looking at your own logs, you can say, "Well, Yahoo sends 10,000 people a day to my site, but only 10 of those people buy anything.. Meanwhile, Slashdot sends 1,000 people, but 500 of them end up buying stuff."
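
    A rough sketch of that per-referrer comparison, using the figures quoted above. The dictionary layout and the function name are made up for illustration; a real version would pull the counts from your own referrer and order logs.

```python
# Hypothetical per-referrer tallies, using the figures quoted in the post.
referrals = {
    "Yahoo": {"visitors": 10_000, "buyers": 10},
    "Slashdot": {"visitors": 1_000, "buyers": 500},
}


def conversion_rate(visitors, buyers):
    """Fraction of referred visitors who went on to buy something."""
    return buyers / visitors if visitors else 0.0


rates = {site: conversion_rate(**counts) for site, counts in referrals.items()}

# List referrers from best to worst conversion rate.
for site, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{site}: {rate:.1%} of referred visitors bought something")
```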

    So why are such ratings needed?
    --

    1. Re:Why bother? by Coward,+Anonymous · · Score: 2

      you can only look at your own logs to see that Yahoo sends you 10,000 people a day after paying them substantial money for what they said should average 20,000 people a day

      In the example you provided of 10K clicks when 20K were expected, this can be chalked up to crappy banners by your graphic artists. However, there are brokers out there who buy ad space from many websites and sell it at a reduced rate to companies. Some of these brokers buy space from the companies that pay their users to click on banners, in which case you'll get a high clickthrough, but nobody who clicks through is interested in your website, and they will click the back button immediately. If you are contacted by a broker, ask them what websites your banner will be on. If they do not mention any of the websites where people are compensated for banner clicks, and it turns out that the majority of the banners are going to these companies, contact the broker and tell them to pull your ad, then contact your credit card company and dispute the charge. This has happened to my company several times; the brokers have never attempted to get their money after being told the charge was disputed.

    2. Re:Why bother? by Erasmus+Darwin · · Score: 2
      I hate to break it to you, but you seem to be harkening back to an idealistic time that never was. Deceitful business practices are nothing new. If they were, we wouldn't already have such things as:
      • Laws against fraud
      • Underwriters Laboratories
      • the Better Business Bureau
      • Truth-in-advertising laws
      • Consumer Reports
      • Ralph Nader

      Sure the Internet is providing a new avenue for many past practices, and the information-centric focus does create greater opportunity for "fudging", but this isn't anything that hasn't happened before.

    3. Re:Why bother? by ongdesign · · Score: 1

      Also, not everything's being sold on a CPM basis, or even on clickthroughs. A lot of sites are selling sponsorship of a whole section of the site, or simply a graphical "premium" placement in a directory. In these cases, total impressions are required to gauge CPM and conversion rates (for comparison to CPM-priced alternatives).

    4. Re:Why bother? by Erasmus+Darwin · · Score: 3
      So why are such ratings needed?

      They're needed because they have to have numbers to show to their advertisers. An ad that's being viewed by 4 million people has significantly more value, and thus has a higher cost, than one that's only being viewed by 2 million people. If someone is willing to take the hosting site's word at face value with regard to eyeball real-estate, then I've got some banner ads (and a bridge) to sell them.

  80. Carnivore by Anonymous Coward · · Score: 4
    Doh!

    Carnivore is the answer. Let the feds provide accurate and unbiased information!

    1. Re:Carnivore by Anonymous Coward · · Score: 1
      Now there are three words on that last line that should not appear in the same sentence...

      Feds, accurate and unbiased.

  81. webmetrics by DarkbladePDX · · Score: 1

    Well, whenever _I_ visit a site, I leave a post-it thumbtacked to the upper right corner of the page just so they'll know it was me.... *G*

  82. We are working on it :) by TheOpus · · Score: 1

    Well I just thought I'd mention that we are working on exactly that at Counted.com (or counted.de)

    One of our servers can count 10 million impressions per day so some big sites could use it too. It is multiserver capable so the sky is the limit really.

    It has also proved to be some of the most reliable when looking at other services.

    The nice thing is that you can choose a guest password to show the stats you want your investor/advertiser/partner to see.

    Counted! is free at the moment but you will need to display a button. We have category specific buttons too, which is working nicely on BeOS related sites where we have the biggest of them in the site list. They are all ranked.

    We have a linux category too, so if anyone wants to do a cool Linux button I could put it in there :) Slashdot can of course use it too, but would probably win with ease :)

    The servers are running FreeBSD though, the counter servers that is. Sorry guys ;)

    We are also working on something else that will be even better for what this is all about. Prove your hits to different people. Different people looking at which site is the biggest. Very cool stuff.

    Check it out at http://counted.com and give me your feedback at oliver@stats.net if you want. Now off to reading more of the comments :) TheOpus

  83. 'bots can be a quarter of all traffic! by danny · · Score: 1
    With my book review collection, around 20% of all traffic is from search engine spiders or other automated fetches. (For the month to date, analog reports 105 000 page accesses, excluding 28 000 "unwanted logfile entries" which are mostly excluded because the Agent string matches a spider.)
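
    To make the arithmetic and the filtering concrete, here is a minimal sketch. The substring list is an assumption for illustration, not analog's actual exclusion rules, and the sample entries are invented; only the 105 000 and 28 000 figures come from the numbers above.

```python
import re

# Assumed, non-exhaustive substrings that mark automated fetchers.
SPIDER_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def split_by_agent(entries):
    """Separate (path, user_agent) entries into human and spider lists."""
    humans, spiders = [], []
    for path, agent in entries:
        target = spiders if SPIDER_PATTERN.search(agent) else humans
        target.append((path, agent))
    return humans, spiders


# Invented sample entries.
sample = [
    ("/reviews/1.html", "Mozilla/4.7 [en] (X11; Linux)"),
    ("/reviews/2.html", "Googlebot/2.1"),
    ("/index.html", "Mozilla/4.0 (compatible; MSIE 5.0)"),
    ("/robots.txt", "Slurp/2.0"),
]
humans, spiders = split_by_agent(sample)
spider_share = len(spiders) / len(sample)

# Share of all fetches that were excluded, from the month-to-date figures.
excluded_share = 28_000 / (105_000 + 28_000)
print(f"excluded share of all fetches: {excluded_share:.1%}")
```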

    This may explain part of the discrepancy between monitoring measures and logfile analysis - did the Britannica people exclude automated fetches from their stats?

    Danny.

    --
    I have written over 900 book reviews
  84. Re:Lies, damned lies, and proxies by driptray · · Score: 1

    In short, spiders, proxies and caches make it impossible to be accurate in measuring traffic. But everyone else is affected the same way. So your relative stats are relevant-- they just aren't hit-for-hit accurate.

    Actually everybody is not affected the same way. If your site specifically targets, say, AOL users, your traffic will be relatively lower due to AOL's aggressive caching. Similarly, different countries have different caching practices, and so audiences that come disproportionately from a particular country may be under (or over) counted.

    If you think hard you can probably come up with a few more instances where caching practices vary for particular demographic categories, thus rendering web stats unreliable even for comparison purposes with other sites/pages.

  85. Careful what you measure by Tommi+Morre · · Score: 2
    Sampling is for media where it wasn't feasible to produce accurate counts (I say "wasn't" because media like TV can do much better than methods like the Nielsens these days -- they've got a severe case of "but-that's-the-way-we've-always-done-it"-itus). Now, measuring website traffic can be done with some accuracy, as long as you're careful what you're measuring -- some sites still count each page hit as a separate visitor! And provisions should be made for filtering out (to the extent possible) results from "visits" from the usual 'bots and trolls.

    As for the ease of faking server logs, not a problem (inserting standard I-am-not-a-lawyer disclaimer here): if you're using them as proof of traffic to your advertisers, write that into the contract -- then faking the server log becomes fraud, with the appropriate legal remedies available. This is not my favorite solution (especially not with anything to do with the Internet), but displaying advertisements for money is a business relationship, and can be managed as such.

  86. Can't really fake logs. by Anonymous Coward · · Score: 1

    At my last job our numbers were audited. What we had to do was install a plugin to our Netscape and Apache servers. The logs would write a little MD5 checksum into each line that is logged. They would then write an MD5 on the whole log file. You then transferred the file to the auditors, who then crunched the numbers. They figured out all the proxying and questionable entries and came back with audited numbers.
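
    I don't have the plugin's code, so the following is only a sketch of the general idea (a checksum per line plus one over the whole file), with made-up function names. Note that a plain, unkeyed MD5 like this only catches edits where the checksum wasn't recomputed; whatever the real plugin did to prevent that isn't described here.

```python
import hashlib


def sign_lines(lines):
    """Append an MD5 hex digest to every log line."""
    return [
        f"{line} {hashlib.md5(line.encode('utf-8')).hexdigest()}"
        for line in lines
    ]


def file_digest(signed_lines):
    """MD5 over the whole signed log, as an auditor could recompute it."""
    return hashlib.md5("\n".join(signed_lines).encode("utf-8")).hexdigest()


def tampered_lines(signed_lines):
    """Return indexes of lines whose trailing digest no longer matches."""
    bad = []
    for index, signed in enumerate(signed_lines):
        line, _, digest = signed.rpartition(" ")
        if hashlib.md5(line.encode("utf-8")).hexdigest() != digest:
            bad.append(index)
    return bad


# Invented log lines for illustration.
log = ["192.0.2.1 GET /index.html 200", "192.0.2.2 GET /about.html 200"]
signed = sign_lines(log)
print(tampered_lines(signed), file_digest(signed))
```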

  87. When the 'Authorities' Vary Wildly by Hnice · · Score: 1

    OK, I'm now doing this for a living, and I can honestly say that it's an effing nightmare wading through the various 'unbiased' sources and trying to figure out what their traffic numbers have to do with mine.

    As an illustration of the issue, last month, my co's site showed 4 million unique visitors on @Plan, and somewhere around 60% of that number according to Media Metrix. I look at Doubleclick, at @Plan, and we've got an ETL and analysis process that we've built in place, and none of them box very well.

    I've had good results matching our internal #s with those we get back from IPro (IPro basically performs an ETL and canned reporting process on your weblogs), which is encouraging.

    At the end of the day, however, the message that I get from management is that no one outside the co. cares how accurate my numbers are or anyone else's aren't -- Media Metrix is all anyone trusts, and when it comes to selling ads or getting investments, I can go through the weblogs by hand and verify my numbers 50 times, and it simply won't matter. Which sucks, but it's the way it goes.

    Basically, the question I'm suggesting you ask yourself, and with which I've had to come to terms over the past couple months, is whether you're coming up with numbers for the purpose of actual usage analysis, or whether you're doing PR for your company. Now, both are important and have their uses, but you can't really make an informed decision about how you should be measuring your traffic until you figure out who it is that you're trying to impress.

    --

    god is just pretend.

  88. Media Metrix's Methods by Consul · · Score: 1
    Media Metrix's primary method of getting web stats involves volunteer web surfers installing software on their machines that monitors their web activity.

    Isn't it a well-known fact that people behave differently when they know they're being monitored? I mean, there are a few sites I normally go to that I probably wouldn't if I knew it was getting reported back to someone...

    Not only that, but what happens if someone gets this list of who has the monitoring software right now? "Hey, Mister Jones, we'll give you a twelve-pack of beer if you go surf around our site for a little while..."

    The mind boggles.

    --

    -----

    "You spilled my egg... I needed that egg."

  89. Re:Lies, damned lies, and proxies by msouth · · Score: 1
    Okay, I did it. Unfortunately, I was reading your post at the same moment my boss was entering the cube, and I've been fired. Under the terms of the 'technos' AUP (As amended September 12, 2000), and UCITA, you are hereby notified that you owe me $28,941,285.42.

    Yeah, you think you'll be sitting pretty...until Courtney Love shows up for her share! Then all the geeks will laugh at you, and you'll be forced to be amused by that until media attention wanes and you ignore her anyway.
    --
    Liberty uber alles.
  90. Alexa: How trustable is it? by Felipe+Hoffa · · Score: 1

    Just wondering how trustable is Alexa.

    Or what kind of people are using their software? Anyone at slashdot? Maybe if we all started using it this site would become number one in their records.

    Well, maybe we won't be able to run it at all, if they only have a windows version.

    Fh

  91. Assuming you were honest.... by dtolton · · Score: 1

    Starting with the above assumption, and assuming you intended to screen out all robots and internal traffic, I still don't know how you could come up with a really solid figure. You can't simply count each hit, as there are typically repeat users. So you could try to look only for unique IP addresses. However, this has its own problem: there are a lot of companies that use proxy servers when they go out to the web. So you could have 1000 hits per month from some organization, but because of their use of a proxy server, if you counted only unique IP addresses you'd only end up with a couple (depending on how many IPs they use for their proxy server). So it seems like on the one hand you're inflating your numbers, but on the other hand you're slitting your own throat by under-reporting your numbers.

    It would be interesting to see some type of mathematical equation that could in some way account for the use of proxies.
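
    I don't have that equation, but a toy example at least shows the gap between the two counts. Everything here is invented, and the second field (the real person) is exactly what a real log never tells you; it is included only to show where the true number sits in this sample.

```python
# Hypothetical hits as (client_ip, real_person) pairs.
hits = [
    ("192.0.2.10", "alice"),
    ("192.0.2.10", "alice"),    # a repeat visitor inflates raw hits
    ("198.51.100.1", "bob"),    # three people behind one proxy address
    ("198.51.100.1", "carol"),
    ("198.51.100.1", "dave"),
    ("203.0.113.7", "erin"),
]

raw_hits = len(hits)
unique_ips = len({ip for ip, _ in hits})
actual_people = len({person for _, person in hits})

# In this sample the true figure falls between the two log-derived counts.
print(f"hits={raw_hits} unique_ips={unique_ips} people={actual_people}")
```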

    --

    Doug Tolton

    "The destruction of a value which is, will not bring value to that which isn't." -John Galt
  92. Nope by geekoid · · Score: 1

    Sorry, there is no trusted way to report web site traffic.
    I guess all those big companies are going to have to stop using the internet.
    Don't let the door hit you on the ass on your way out.
    Welcome to the internet, where you get to pay so we can advertise to you.
    Don't even get me started with cabevil.

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  93. good idea by twitter · · Score: 1

    This will give them a reason to tax the internet as well as slow it down and eliminate privacy. Three points!

    --

    Friends don't help friends install M$ junk.

  94. web bugs! by Lord+Omlette · · Score: 1

    Miniature 1x1 gifs inserted on a page. If the people doing the logging are polite enough only to use cookies to see how many times one person reloads the page (unique visitors) then web bugs will work fine. Of course, if it turns into marketing information, it's not fine. But the idea behind most secure stuff is a trusted third party, right? This way you can validate against server logs.
    --
    Peace,
    Lord Omlette
    ICQ# 77863057

    --
    [o]_O
  95. Re:Serious question about this... by jafac · · Score: 2

    the web is a popularity contest because in the "new economy", it's all about marketshare. That's it. Nothing else matters. Revenue doesn't matter. Profitability doesn't matter. A business plan doesn't matter.

    The premise behind this "marketshare is everything" is that, since the internet is a "new thing", the guy who takes over the most marketshare first is going to be the dominant player - people think this way because they saw what happened when Microsoft entered a new market, and got the most marketshare. They dominate. They're damn near owning the whole freakin world. If they had played it more laid back, and done more honest hard work up front, they probably would have avoided this whole DOJ mess, and ten years from now, *would* 0wn the whole world. But no, the execs got lazy and greedy, and when it became apparent early on that Microsoft was only interested in putting out "good enough" products and killing off competition (instead of allowing competition to exist, albeit in a weakened state), the threat was so obvious, they had to be stopped. Act like a bunch of gangsters, get treated like gangsters.

    Anyway, the investment and business community is expecting SOMEONE to take over, and they want a piece of the action, of course, so that's why people are willing to risk a few investment bucks on who they perceive will be the Genghis Khan of the Internet.

    That's the "new economy" in a nutshell. And frankly, AOL/TW is "it".

    --

    These are my friends, See how they glisten. See this one shine, how he smiles in the light.
  96. Re:Exact Counts? by NaughtyEddie · · Score: 1
    You're right - exact counts are not available at all.

    Myself, I'd just take the highest counts to the investors ;)

    But it's getting so that most Slashdot stories contain elementary errors of this type. It used to be news when you'd spot an error in the main section text; now it's news when you don't.

    I might quit Slashdot. Do you think I could sell my 45 karma points on eBay? ;)

    --
    It's a .88 magnum -- it goes through schools.
    -- Danny Vermin
  97. Sampling? by djweis · · Score: 1

    I don't imagine that sampling is very accurate. I would be more comfortable with using cookie based session counting (not tracking/monitoring activity). If it wasn't for the unethical tendencies of monitoring companies, a web bug would be a good way to keep track.

  98. Exact Counts? by Mr+Windows · · Score: 2
    It seems counterproductive to purposely use an innacurate statistical measure when exact counts are readily available
    How exact are these `exact counts'? ISTM that all sorts of things (caches, reloads, etc) can cause hits to be mis-counted.
  99. Questions on making your own stats by 198348726583297634 · · Score: 4
    I've been put in charge of producing the stats for my company's websites. I'm using Wusage, which is plenty configurable, scriptable, very well-priced for its functionality, etc., and I've set up a number of exclusion-filters.

    What I'm blocking out so far is:

    our company's internal IP traffic

    images

    funky robots like Keynote-Perspective that the old webmaster had let loose on our sites.

    This gives us some numbers I have confidence in (even though they're 10x less than the numbers the old guy was producing through Webtrends), but I'd like to find out what others are doing for making their own web stats.
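
    For anyone without Wusage, the same three exclusions can be sketched in a few lines. This is not Wusage's configuration syntax; the internal address range, the image suffixes and the robot substrings are all placeholder assumptions.

```python
import ipaddress

INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")   # assumed internal range
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")  # assumed image types
ROBOT_MARKERS = ("keynote", "bot", "spider")        # assumed agent substrings


def is_reportable(ip, path, agent):
    """True if a request survives all three exclusion filters."""
    if ipaddress.ip_address(ip) in INTERNAL_NET:
        return False  # our own company's traffic
    if path.lower().endswith(IMAGE_SUFFIXES):
        return False  # image fetches are not pageviews
    if any(marker in agent.lower() for marker in ROBOT_MARKERS):
        return False  # monitoring robots and spiders
    return True


# Invented requests as (ip, path, user_agent).
requests = [
    ("10.1.2.3", "/index.html", "Mozilla/4.7"),
    ("203.0.113.5", "/logo.gif", "Mozilla/4.7"),
    ("203.0.113.5", "/index.html", "KEYNOTE-PERSPECTIVE 1.0"),
    ("203.0.113.5", "/index.html", "Mozilla/4.7"),
]
pageviews = [r for r in requests if is_reportable(*r)]
print(len(pageviews), "reportable of", len(requests))
```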

    Thanks,
    Steve

    1. Re:Questions on making your own stats by guran · · Score: 2
      I recently had to analyse some company's web statistics as well. Some nightmare! They had changed the structure of their web site, changed hosting company and lost some logs. Not to mention how no one even was sure when those changes had taken place...

      Fortunately, I was only interested in changes over time, so I concentrated on inventing a measurement that gave a fair comparison.

      My work was done the old-fashioned way... Look at some server log, filter out the obvious (gifs, anything with a sessionID in it, internal or developer hits etc), throw SPSS (a statistical analysis tool) at it and start scratching your head...

      I started my presentation by telling everyone to please ignore the absolute figures and focus on trends and variations.

      What really bothered me was the thought of how good a site analysis tool I could have hacked together in those hours I spent decrypting archived data. The interesting part was to see how some people really care about being anonymous on the web. Makes a Slashdot addict glad to see stuff like referer="none of your business" and cookie: Note="like most people I prefer my browsing habits to be anonymous"

      --

      All opinions are my own - until criticized

    2. Re:Questions on making your own stats by Koos · · Score: 1

      but I'd like to find out what others are doing for making their own web stats.

      I use Webalizer. It's GPL, it works, it's fast and very configurable. It logs search-strings (very configurable) so you know what people were looking for. It can detect a 'visit' reasonably (with a configurable timeout).
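
      The timeout idea is simple enough to sketch. This is not Webalizer's code, just the general rule that a long enough gap between requests from the same visitor starts a new visit; the 30-minute value and the function name are assumptions for illustration.

```python
def count_visits(timestamps, timeout=1800):
    """Count visits for one visitor.

    `timestamps` are request times in seconds. A gap longer than
    `timeout` seconds between consecutive requests starts a new visit.
    """
    visits = 0
    last = None
    for ts in sorted(timestamps):
        if last is None or ts - last > timeout:
            visits += 1
        last = ts
    return visits


# Three quick requests, a break of about an hour, then two more.
print(count_visits([0, 60, 120, 4000, 4100]))
```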

  100. Serious question about this... by v4mpyr · · Score: 2

    Why must the web be a popularity contest? At most the website itself should only be concerned about how many people visit their website so they can keep their servers up to speed. They can get this from their own logs.

    Seriously, who really cares if NewsTrolls is visited more than Slashdot (just an example). The important thing is that they're getting visitors and the owners are enjoying their job.

    --

  101. Lies, damned lies, and proxies by Webmonger · · Score: 4

    Excuse me while I go "Grumpy old man". This is an old, old problem. It goes back to the days when I first started using the web. See "Why web statistics are (worse than) meaningless." It's an old article. That's the point.

    In short, spiders, proxies and caches make it impossible to be accurate in measuring traffic. But everyone else is affected the same way. So your relative stats are relevant-- they just aren't hit-for-hit accurate.

    What your server logs are really for is resource planning. They'll help you find out how much traffic your server is serving, which should help you plan bandwidth and hardware upgrades as needed.

    1. Re:Lies, damned lies, and proxies by Caspuh · · Score: 1

      Cookies and User Agents make 300 unique visitors unique.

    2. Re:Lies, damned lies, and proxies by Rob+Kaper · · Score: 2
      When 300 UNIQUE visitors view my page using the same proxy, they look like one visitor.

      I still hope to get some time to work on BBStats again, my webstats package - in terrible shape as it is - but I was hoping to solve this by using optional session cookies or the ability to import other cookies from the site.

      For example, /. could log my IP to see whether I am unique, but they could also fetch my cookie (and still fall back on IP if it doesn't exist).
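
      Roughly what I have in mind, as a sketch only (this is not BBStats code, and the names are made up): key each request on the cookie when there is one, and fall back on the IP when there isn't.

```python
def visitor_key(ip, cookie_id=None):
    """Prefer the cookie as the visitor's identity; fall back on the IP."""
    if cookie_id:
        return ("cookie", cookie_id)
    return ("ip", ip)


def count_unique_visitors(requests):
    """Count distinct visitor keys in a list of (ip, cookie_id) requests."""
    return len({visitor_key(ip, cookie) for ip, cookie in requests})


# Invented requests: two cookied users and one cookieless user share a proxy.
requests = [
    ("198.51.100.1", "a"),
    ("198.51.100.1", "b"),
    ("198.51.100.1", None),
    ("198.51.100.1", None),
    ("192.0.2.4", "a"),  # same cookie from another address: same visitor
]
print(count_unique_visitors(requests))
```

      Cookieless users behind the same proxy still collapse into one visitor, so this narrows the proxy problem rather than solving it.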

  102. Media Metrix experience... by Anonymous Coward · · Score: 1

    My previous co had more than 300K unique visitors a month. I got this number from making use of one of the third party 1 pixel image tracking tools. These do a reasonable job of defeating caching and use cookies to track individual users.

    Media Metrix didn't even report our traffic (which I think meant that they said it was 100K)

    PCData reported us in the 200K range.

    When I compared an audit of page impressions tracked by Hitbox (1 pixel image) and by Doubleclick, I got them to correlate to within 1-3% of each other.

    While MM may have a point with their "home/work" user counting, I can not get myself to believe in their methodology.

    I think it is valuable to have companies like MM, PCData, etc tracking this data, but the markets pay way too much attention to them.

    All that said, log based analysis gets a bit tricky. Depending on which software package you use and how you configure the software, you can get significantly different numbers of user sessions from the same log files.

    Cookies aren't perfect, but I'm more comfortable with them than the alternatives.

  103. one thing i can count on... by cetan · · Score: 2


    the one thing I can count on is that my site doesn't (and won't) get any hits :)

    --
    In Soviet Russia...michael would be rotting in Siberia!
  104. Re:Why bother to cloth the Emporer by MousePotato · · Score: 1
    To quote a famous American criminal (hero?):
    "Of course crime pays. If it didn't, there wouldn't be any criminals." - G.Gordon Liddy
    Sadly, the persistence of the term 'caveat emptor' points out that bad business and bad business ethics have been around as long as... well... prostitution. The real problem here is not that you need third party auditing to present the data but the fact that you can't (blindly) trust the third parties either. A few months back we discussed the fiasco of a third party service that signed your apps blindly just because you had paid for the certificate, without verifying the code wasn't malicious etc. So, I guess what all of this boils down to is this: as long as there is money involved there will be the incentive to fudge the data. This trend in practices pervades not only the fledgling computing industry but many fields where the data dictates how money will be spent. We all suffer for this in unforeseen ways like higher retail prices (after all, much marketing capital gets wasted), bad science/academia (publish or perish fields being corrupted) and bad products (tires anyone?).
  105. It's not necessary for everyone by barooz · · Score: 1

    In most cases these days, companies place banner ads because they expect placing these ads will increase their sales. When they purchase the ad they have a figure in mind that they'd like to hit for increased sales. If that number isn't hit, something is wrong, and the strategy needs to be re-evaluated. If you are placing a banner ad for the purpose of marketing, that is a different story. Marketing efforts are very difficult to gauge for quality because the results are abstract in the short run, and hopefully lead to sales in the long run. In this case, I could see the service being useful.

  106. Nielson will not work, and that is good by twitter · · Score: 1
    There are just too many competing web sites to keep up with! What's the value of a cable access advert? Divide that by the millions of websites cropping up and you get a dot com bust.

    That's AOK by me. Who needs these mega sites with their big stupid banner ads? Let's not help these people figure out how to extend the GE, Westinghouse, ABC, DEFG, GodKnows, RIAA, MPAA, BS media monopolies into the web. If you've got any bright ideas, burn the paper work and go have a beer.

    --

    Friends don't help friends install M$ junk.

  107. Reality and Comparable Statistics by CharmQuark · · Score: 2
    The fundamental thing with statistics, and the reason that most people are so easily confused by them, is that statistics are meaningless without a strict context. Not only that, statistics can be harmful when used for purposes outside of that context. We can use these facts to look at the current example of Web Site hit counts.

    First, suppose I am using a number of web sites to promote my online store. In this case, I may be most interested in the amount of sales each site produces from click through users. For this purpose, I can simply assign a sale to a certain site. For the purposes of this discussion, I will assume that all sales can be assigned to a certain web site. At certain intervals, I can find the percent profit attributable to each site, and create a statistic with the ratio of the % profit from a site to the cost of advertising on that site. This statistic will create a valid comparison between sites.

    Second, suppose I am most interested in branding, as Verizon is of late. In this case, I might want to pay an external agency to monitor the sites on which I advertise. Such an agency would presumably use a consistent and statistically sound method to determine the number of eyes that have seen my brand. I can then set up a statistic with the ratio of # of eyes to the cost of advertising for each site. Again, this will create a valid comparison.
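
    As a small illustration of the two ratios just described, here is a sketch with invented figures for two sites. The site names, field names and every number are hypothetical; only the two formulas follow the text.

```python
# Hypothetical figures for two advertising sites.
sites = {
    "site_a": {"profit": 600.0, "ad_cost": 200.0, "eyes": 50_000},
    "site_b": {"profit": 400.0, "ad_cost": 100.0, "eyes": 20_000},
}

total_profit = sum(site["profit"] for site in sites.values())


def profit_share_per_dollar(site):
    """Percent of total profit attributed to the site, per ad dollar spent."""
    return (100.0 * site["profit"] / total_profit) / site["ad_cost"]


def eyes_per_dollar(site):
    """Measured viewers per ad dollar (the branding case)."""
    return site["eyes"] / site["ad_cost"]


for name, site in sites.items():
    print(name,
          round(profit_share_per_dollar(site), 3),
          round(eyes_per_dollar(site), 1))
```

    With these made-up numbers the two statistics rank the sites in opposite orders, which is the point about context: the same site can be the better buy for sales and the worse one for branding.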

    It is notable that in either case the web logs for particular sites are not clearly useful. Even if the information itself was not suspect, web logs would not be comparable between sites. It would be difficult to set up a useful statistic to compare the value of each site with respect to my product. To put it another way, the web logs for a particular site are useful to that site for generating a number of site specific statistics, but few if any of those are going to be of interest to me as a paying advertiser.

  108. it all depends on who does the counting by iritant · · Score: 1

    If you are doing the counting for your own purposes, and you want very accurate statistics, you should consider doing two things:

    1. Use cookies to identify each browser. You want to do this anyway in order to track sessions, and for reliability purposes. Now say what you will about cookies, but their proper use would be a good thing for everyone (yes, and world peace is a great goal, I know). I would solve this by querying the browser and then pleading with the individual to allow cookies through by promising good behavior (and following through on that promise, unlike Amazon). It won't be perfect but it would be closer to enumeration than sampling.

    2. Reconcile that data with unique IP addresses (i.e. if the browser won't allow you to set a cookie, consider counting at least that ip address).

    Again the counting mechanisms won't be perfect, and it would be useful to do this to validate the statistical methods.

  109. Hey a use for CARNIVORE! by 64.28.67.48 · · Score: 2

    The FBI could get into the business of counting hits. I mean, they'd be reading through all the traffic anyway; they might as well do something useful with it...

    --

    -------------
    The truth is out th- oh, wait, here it is...
  110. even radio... by krich · · Score: 1

    ... would like real stats. I just saw a video news story on one of the all-news channels about how some shopping centers are installing detectors in their parking lots that can detect which radio station you are listening to. The shops in the mall/shopping center then use that data to determine their media buys.

    All technically-derived stats are subject to manipulation. The answer is not, however, to rely on faulty sampling.

  111. The difference by Traicovn · · Score: 1

    Well, I guess the difference is in 'what are you measuring?'

    Hits from a single IP (50 people might use a single computer in a library a day), or are you monitoring distinct users? Are you counting only one unique user per month per person, or are you counting me every time I come to your site? Getting honest statistics on the web is not an easy thing, I guess. Nielsen ratings are done by 'test groups'; that is not the best way to do the web, because there are so many different kinds of people out there and so many different kinds of sites. Some people would also turn off Nielsen-type software. (I remember that the Nielsen rating system issued a box that you plugged into your TV. If you forgot to turn the box off, it would keep reading that you were watching shows even if you turned off the TV, therefore not giving honest ratings. Or people would forget to turn on the box, say, if they were watching something they weren't supposed to.)
    I think that having a third party monitor your website's traffic and then comparing it to your own readings and then doing an average would be the best idea....
    Anyway, just my personal ideas....

    --

    [Something witty and intelligent should have appeared here.]
    {Traicovn}