Slashdot Mirror


New Method of Tracking UIP Hits?

smurray writes "iMediaConnection has an interesting article on a new approach to web analysis. The author claims that he is describing 'new, cutting edge methodologies for identifying people, methodologies that -- at this point -- no web analytics product supports.' What's more interesting, the new technology doesn't seem to be privacy intrusive." Many companies seem unhappy with the accepted norms of tracking UIP results. Another approach to solving this problem was also previously covered on Slashdot.

174 comments

  1. Want efficiency? by Anonymous Coward · · Score: 0

    Barcoded humans =)

    1. Re:Want efficiency? by JohnPerkins · · Score: 0, Offtopic

      I had a history teacher back when I was in jr high school. She was barcoded, by the Nazis at a concentration camp.

    2. Re:Want efficiency? by komeedipoeg · · Score: 0, Offtopic

      How was it possible when Barcode was invented in 1948 and WW2 ended in 1945? http://en.wikipedia.org/wiki/Barcode

    3. Re:Want efficiency? by utnow · · Score: 0

      war secrets! lol. it was probably just something very similar in appearance to a barcode that didn't quite qualify as the same thing.

    4. Re:Want efficiency? by komeedipoeg · · Score: 0, Offtopic

      Hmm, how did they read it? Because today's barcode readers sometimes have difficulty reading bracodes from black&white print. Maybe they just wrote numbers, not barcodes? Or did the Germans make it harder for themselves and start using some kind of complicated coding? And one more question: when you were in Jr, maybe it was at the same time they released the Alien movie, where they marked prisoners with barcodes, and that teacher happened to be a fan of that movie :P sry bad english

    5. Re:Want efficiency? by WilliamSChips · · Score: 0
      bracodes
      Is that a code to get into somebody's bra? In that case, give it to me! :P
      --
      Please, for the good of Humanity, vote Obama.
    6. Re:Want efficiency? by 1u3hr · · Score: 1
      How was it possible when Barcode was invented in 1948

      They tattooed ordinary figures, not barcodes. Photo

    7. Re:Want efficiency? by utnow · · Score: 0

      Ooh that's a really simple code. I know on PC it's just "ghb". Not sure about xbox though...

  2. uhm, what? by Prophetic_Truth · · Score: 3, Funny

    new, cutting edge methodologies for identifying people....the new technology doesn't seem to be privacy intrusive

    The Wookie defense in action!

    --
    time is a perception of a being's consciousness
    time is your 6th sense, the wierd ones are 7+
    1. Re:uhm, what? by zxking · · Score: 2

      ...the new technology doesn't seem to be privacy intrusive...

      Give me a break. How can this be possible when the approach suggests using multiple tests rather than one, ranging from analyzing dated cookies, IP addresses and Flash Shared Objects?

      Their approach seems to be common-sense. I believe most sites worth their salt do not use just one metric. Maybe if someone can get hold of the research paper and post it, then we can see if their implementation is really revolutionary. Another problem is that neither the authors nor the researchers have actually tested their methodology on a large-scale site.

      Premature elation? Let us see the paper and decide.

    2. Re:uhm, what? by mwvdlee · · Score: 4, Insightful

      Since their "cutting edge methodology" is basically all the previous methods botched together, how can it ever be LESS privacy intrusive than the methods it's made up of?

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    3. Re:uhm, what? by Shaper_pmp · · Score: 4, Insightful

      "Their approach seems to be common-sense."

      Their suggestion may be common-sense, but their approach borders on messianic:

      "This article is going to ask you to make a paradigm shift... new, cutting edge methodologies... no web analytics product supports... a journey from first generation web analytics to second."

      Followed by a lengthy paragraph on "paradigm shifts". In fact, the article takes three pages to basically say:

      "In a nut-shell: To determine a web metric we should apply multiple tests, not just count one thing."

      Here's a clue, Brandt Dainow - It's a common-sense way of counting visitors, not a new fucking religion.

      The basic approach is to use a selection of criteria to assess visitor numbers - cookies first, then use different IPs/userAgents with close access-times to differentiate again, etc.

      The good news is there are only three problems with this approach. The bad news is that they make it effectively useless, or certainly not much more useful than the normal method of user-counting:

      Problem 1
      There is no information returned to a web server that isn't trivially gameable, and absolutely no way to tie any kind of computer access to a particular human:

      "1. If the same cookie is present on multiple visits, it's the same person."

      Non-techie friends are always wanting to buy things from Amazon as a one-off, so I let them use my account. Boom - that's up to twenty people represented by one cookie, right there.

      "2. We next sort our visits by cookie ID and look at the cookie life spans. Different cookies that overlap in time are different users. In other words, one person can't have two cookies at the same time."

      Except that I habitually leave my GMail account (for example) logged in both at work and at home. Many people I know use two or more "personal" computers, and don't bother logging out of their webmail between uses. That's a minimum of two cookies with overlapping timestamps right there, and only one person.

      "3. This leaves us with sets of cookie IDs that could belong to the same person because they occur at different times, so we now look at IP addresses."

      This isn't actually an operative step, or a test of any kind. It's just a numbered paragraph.

      "4. We know some IP addresses cannot be shared by one person. These are the ones that would require a person to move faster than possible. If we have one IP address in New York, then one in Tokyo 60 minutes later, we know it can't be the same person because you can't get from New York to Tokyo in one hour."

      FFS, has this guy ever touched a computer? For someone writing on technology he's pretty fucking out of touch. As an example, what about people who commonly telnet+lynx, VMWare or PCAnywhere, right across the world, hundreds of times in their workday? Sure, maybe most normal users don't (yet), but for some sites (eg, nerd-heavy sites like /.), it's likely enough to start skewing results.

      "5. This leaves us with those IP addresses that can't be eliminated on the basis of geography. We now switch emphasis. Instead of looking for proof of difference, we now look for combinations which indicate it's the same person. These are IP addresses we know to be owned by the same ISP or company."

      Except that one ISP can serve as many as hundreds of thousands of users. And proxy gateways often report one IP for all the users connected to them. For example, NTL reports one "gateway" IP for all the people in my town on cable-modems - that's thousands, minimum. So, we're looking at a potential error magnitude of 100-100,000. That's no better than the existing system for assessing unique visitors.

      "6. We can refine this test by going back over the IP address/Cookie combination. We can look at all the IP addresses that a cookie had. Do we see one of those addresses used on a new cookie? Do both cookies have the same User Agent? If we get the same pool

      --
      Everything in moderation, including moderation itself
    4. Re:uhm, what? by pAnkRat · · Score: 1
      Uhhm nope:

      1:

      Non-techie friends are always wanting to buy things from Amazon as a one-off, so I let them use my account. Boom - that's up to twenty people represented by one cookie, right there.


      Nope,
      They use a different session (i.e. a different cookie) while using your account. Unless they physically access your computer/browser, they are different users.

      But you are pretty much right on the other points.

      --
      we need an "-1 Plain wrong" moderation option!
    5. Re:uhm, what? by smooth+wombat · · Score: 1
      ranging from analyzing dated cookies, IP addresses and Flash Shared Objects?

      What about those of us who kill our cookies at the end of every session and who don't use Flash? How are they going to find out if it's me or someone else?

      No cookies, no information. To them I'm a unique individual every single time. The only thing they could possibly track down would be information from cookies which already exist on my system from other sites and try to decipher that information.

      --
      We will bankrupt ourselves in the vain search for absolute security. -- Dwight D. Eisenhower
    6. Re:uhm, what? by e4g4 · · Score: 0

      This way Flash can report to the system all the cookies a machine has held

      Yeah, reporting every site that drops a cookie on my machine doesn't intrude on my privacy in the slightest.

      --
      The secret to creativity is knowing how to hide your sources. - Albert Einstein
    7. Re:uhm, what? by Shaper_pmp · · Score: 1

      Uhhm nope.

      They use my machine to do it - I'll let them order one thing on my credit card while I'm watching, but I'm buggered if I'm giving them my login details to order as much stuff as they like <:-)

      --
      Everything in moderation, including moderation itself
    8. Re:uhm, what? by Anonymous Coward · · Score: 0

      I seriously hope you're kidding. Same computer or not, different session cookies or not, it has to count as one person. Or are you trying to say that if I log into my Amazon account from 20 different computers, I count as 20 different people!?!

      That's just ridiculous. There is no reason to assume that many people are using the same account. In fact, just the opposite. Sharing your login is an extremely bad idea. So it should be a safe assumption that the login is used by only one person, even if a different cookie is assigned.

    9. Re:uhm, what? by Anonymous Coward · · Score: 0

      You didn't get it. This is exactly the reason why relying on one of these methods alone is not quite accurate. Like the article said, 30% inaccuracy.

      On the other hand, relying on all of these at the same time might get you to something like 0.1-10% inaccuracy, depending on the target audience of your site.

      It's not a perfect method. It's just better than one of these alone.

    10. Re:uhm, what? by golden123 · · Score: 1

      this is nothing new, we did all they claimed + more 5 years ago, sessionized tracking across 600 websites for a major movie studio.

  3. CPUID by frinkacheese · · Score: 4, Funny


    Sending your PCs unique CPUID along with every HTTP request would be ideal for this. You could also group up websites and use this to track people across websites. It would be great for marketing and for law enforcement.

    Oh, you all disabled your nice Intel CPUID? Why ever would you want to do that?

    1. Re:CPUID by WebCrapper · · Score: 1

      I don't even think there is a MB manufacturer that ships with the CPUID turned on anymore...

    2. Re:CPUID by KillShill · · Score: 3, Interesting

      Treacherous/Insidious Computing to the rescue.

      no need for cpu id's when your entire system and its OS will generate a 128bit id for you. and give them out to "trusted" "partners".

      remote attestation never sounded so good.

      --
      Science : Proprietary , Knowledge : Open Source
    3. Re:CPUID by SolitaryMan · · Score: 1

      Sending your PCs unique CPUID along with every HTTP request would be ideal for this.

      I understand that you are being ironic, but in fact, CPUID won't be a silver bullet either. These researchers are trying to calculate the number of different people visiting the site, not the number of different CPUs.

      --
      May Peace Prevail On Earth
    4. Re:CPUID by frinkacheese · · Score: 1


      Indeed, but generally I would say that 1 person = 1 cpu, apart from shared cpus such as in schools, web cafes and such. But I guess that a combination of IP address and Browser information can pretty much do that already.

      OK, so what is really needed is a RFID implant - take yer CPUID with you, then software can really be licensed to a PERSON rather than a processor. Pay Amazon every time you click(tm) on a link(tm).

    5. Re:CPUID by aussie_a · · Score: 1

      no need for cpu id's when your entire system and its OS will generate a 128bit id for you. and give them out to "trusted" "partners".

      Which Linux distro does this? I'd like to avoid them.

    6. Re:CPUID by aussie_a · · Score: 3, Insightful

      Indeed, but generally I would say that 1 person = 1 cpu

      Not really. I surf the internet at home and at school. I imagine I'm not alone. So I would be registered as two different people.

      Indeed, but generally I would say that 1 person = 1 cpu, apart from shared cpus such as in schools, web cafes and such

      You forgot "pretty much anyone who doesn't live alone and has a computer with internet access at home." Let's not forget that tiny percentage of people (I know, most slashdotters visit slashdot while avoiding work, but there are people out there who have families that have more than one person using a single computer. It's crazy, I know).

    7. Re:CPUID by Ian.Waring · · Score: 1
      Sending your PCs unique CPUID along with every HTTP request would be ideal for this.

      Nah. How long into the future do you reckon you'll have one CPU to one person? Anonymity is a deskside cluster...

      Ian W.

    8. Re:CPUID by cheekyboy · · Score: 1

      You will never know for 100% sure, unless your site DISALLOWS all nonauthenticated user logins, and shows zero content except a signup page.

      So a rough guess of

      UNIQUE USERS = (UNIQUE IPS - REAL UNIQUE LOGINS) /2

      Will be spot on across the average of the whole planet for all 2 billion websites.

      Mmmm gota luv statistics. Averages are your friend.
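
      Taken literally, the rough guess above is easy to try out. A minimal sketch, assuming the formula means exactly what it says; the function name and the sample numbers are made up for illustration and are not measurements from any real site:

```python
def estimate_unique_users(unique_ips: int, unique_logins: int) -> float:
    """Literal reading of the rough guess: (unique IPs - unique logins) / 2."""
    if unique_ips < 0 or unique_logins < 0:
        raise ValueError("counts must be non-negative")
    return (unique_ips - unique_logins) / 2


# Hypothetical sample numbers: 10,000 distinct IPs, 2,000 distinct logins.
print(estimate_unique_users(10_000, 2_000))  # 4000.0
```

      Note that the estimate goes negative as soon as logins outnumber IPs (for example, many logged-in users behind one proxy), so it only makes sense as an average over many sites.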

      --
      Liberty freedom are no1, not dicks in suits.
    9. Re:CPUID by tepples · · Score: 1

      Which Linux distro does this? I'd like to avoid them.

      What happens when the Linux distros that support custom HTTP happen to be the only Linux distros supported by your ISP's DHCP server? Once the major cable and telephone companies begin to require support for "Trusted" Computing before they'll give you an IP address, will you go back to dial-up to escape Trusted Network Connect?

    10. Re:CPUID by Alsee · · Score: 1

      The Linux kernel has had driver support for Trusted Platform Module chips since 2.6.12.

      Gentoo appears to be the distro leading the charge.

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  4. UIP? by XanC · · Score: 4, Funny
    I tried to find out for myself, I really did. I can't figure out if any of these dictionary.com results apply. This is the complete list, and none of them seemed to fit. There's one kind of humorous one...

    International Union of Private Wagons
    Quimper, France - Pluguffan (Airport Code)
    Ultimate Irrigation Potential
    Uncovered Interest Parity
    Undegraded Intake Protein
    United International Pictures
    Universidad Interamericana de Panamá
    Unusual Interstitial Pneumonitis
    Upgrade Improvement Program
    Urinating In Public
    User Interface Program
    USIGS Interoperability Profile
    Usual Interstitial Pneumonia of Liebow
    Utilities Infrastructure Plan

    1. Re:UIP? by key134 · · Score: 1

      Unique IP? I think? Just a guess from context...

    2. Re:UIP? by JohnPerkins · · Score: 1

      International Union of Phlebology
      Paraguayan Industrial Union
      UCAR Intellectual Property
      Unintended Pregnancies
      Union Interparlementaire
      Universal Immunization Program
      University Interaction Program
      Update In Progress
      Urban Indicators Program
      Utility Interface Panel...

      ...and that's enough for now. Bedtime for John.

    3. Re:UIP? by 1u3hr · · Score: 2, Informative
      And strangely enough, this acronym isn't used in TFA at all. In fact, if the submitter did mean "Unique IP" that's not at all what the article is about (after all, that's trivial to record). They're looking for the number of unique individuals, and trying to deduce that from Cookies, IP, and other data.

      Unique Individual? P???

    4. Re:UIP? by Mostly+a+lurker · · Score: 1

      Could be "Unique Individual People" I suppose, but this is a classic example of the rule that all acronyms (other than those in universal use) should be explained on first use.

    5. Re:UIP? by hattig · · Score: 1

      Unique Individual Porn-viewing-habits

      Underwear Ingesting Parrot
      Unified Identity Procedure

      This is a new low for Slashdot. Not only is there an unexplained TLA in the article, no-one can actually work out what it stands for in the context of the story!

    6. Re:UIP? by karmatic · · Score: 1

      User Identification Persistence? Something that allows you to track users (a la a cookie), but is persistent in some way?

    7. Re:UIP? by Anonymous Coward · · Score: 0

      Sorry, but what is a TLA?

    8. Re:UIP? by Barsema · · Score: 2, Funny

      What you meant to say: all TLA not in TFA should be EFU ;-)

    9. Re:UIP? by 1u3hr · · Score: 1

      No one has worked it out yet? I Googled for "UIP hits", only found a few pages in German.

    10. Re:UIP? by Epistax · · Score: 1

      I was learning communications in a class with a professor doing a pretty poor job. He mentioned "CTS" without saying what it meant. I have a lappy in front of me to view the lecture slides so I did a quick search. Finally I knew that when a communication needs to occur, a computer sends out a Cattle Tracking System.

    11. Re:UIP? by needacoolnickname · · Score: 1

      It's not about context. It's about being funny.

    12. Re:UIP? by Sexy+Bern · · Score: 1
      Three-letter abbreviation.

      Laugh, it's funny. Honestly.

  5. Step 4. . . by SpaceAdmiral · · Score: 5, Insightful

    We know some IP addresses cannot be shared by one person. These are the ones that would require a person to move faster than possible. If we have one IP address in New York, then one in Tokyo 60 minutes later, we know it can't be the same person because you can't get from New York to Tokyo in one hour.

    If my company had computers in New York and Tokyo, I could ssh between them in much less than 60 minutes. . .
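
    For reference, the article's step 4 boils down to a speed check between two geolocated IPs. A minimal sketch, assuming each IP has already been resolved to a latitude/longitude (the article does not say how); the function names, the 1000 km/h speed ceiling, and the coordinates are illustrative assumptions:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 1000.0  # assumed ceiling, roughly airliner cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def could_be_same_person(loc_a, loc_b, hours_apart, max_speed_kmh=MAX_SPEED_KMH):
    """True if someone could physically travel between the two locations in time."""
    return haversine_km(*loc_a, *loc_b) <= max_speed_kmh * hours_apart


NEW_YORK = (40.71, -74.01)  # approximate coordinates
TOKYO = (35.68, 139.69)

# The article's example: New York, then Tokyo 60 minutes later.
print(could_be_same_person(NEW_YORK, TOKYO, hours_apart=1.0))  # False
```

    The check only measures where the machines are. An ssh session, VPN, or proxy hop puts the same person behind both IPs within seconds, and this test will still report two different people.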

    1. Re:Step 4. . . by Anonymous Coward · · Score: 0

      I frequently do this from my office machines which are on a different ip block from my home cable modem. It helps when accessing sites like expedia that play games with prices that are offered. TFA's theories are nice, but like the other methodologies listed, they provide probabilities, not certainty. My goal is to be in the unpredictable bunch, if possible. Or did you know I was going to say that?

    2. Re:Step 4. . . by antic · · Score: 1


      What percentage of people do you think do that?

      --
      'Thats they exact same thing a banana wrench monkey.'
    3. Re:Step 4. . . by CoolGopher · · Score: 1
      If my company had computers in New York and Tokyo, I could ssh between them in much less than 60 minutes. . .

      The point is, most people wouldn't do that, and those who do wouldn't be significant enough to skew the metrics too badly.

      However, having said that, it is quite possible to have a network configured for high availability such as that if you lose your local internet link traffic gets routed via your internal network out another internet link in another office. Frequently this office is in another country...

      Would that be enough to stuff up the metrics? I have no idea, but it'd be worth considering before going all "oooooh" about this "paradigm shift" (was I the only one who missed what it was supposed to be? What's the news here really? Oh, wait, this is Slashdot...)

    4. Re:Step 4. . . by lgftsa · · Score: 1

      A corporate WAN with multiple routes to the internet and load-balanced http proxies would do it, too.

    5. Re:Step 4. . . by jrumney · · Score: 1
      However, having said that, it is quite possible to have a network configured for high availability such as that if you lose your local internet link traffic gets routed via your internal network out another internet link in another office. Frequently this office is in another country...

      Even more frequently your "internal" link to the remote office is via VPN over your local link, so this isn't really an option. Redundant local links are more likely.

    6. Re:Step 4. . . by cheekyboy · · Score: 1

      So if only 1% of users are like you, then we will take all hits from your 'case', divide them by 100, and add that to our unique users count. There, you will be counted, but fairly. It's all about having two trees of decision, counting both totals, and using a percentage of each for your final tally. The result is accurate at a low resolution of time (think audio kHz; one sample per hour is ~0.00028 Hz). It's like a photo of Mars showing the whole planet versus a photo of one tiny rock: the wider, low-res view gives us a 'global picture' that is accurate but very, very low in detail. Call it statistical anti-aliasing.

      --
      Liberty freedom are no1, not dicks in suits.
    7. Re:Step 4. . . by jrumney · · Score: 1

      I've encountered a site that played games with its prices before. When trying to book the lowest priced fares - the ones that entice you onto the site, every second mouse click resulted in a 500 error from the server. When trying to book a more expensive fare, everything went smoothly. The only way I could successfully book their low priced fares was to open two browser windows with the same session, and alternate between the steps for booking a high priced fare (aborting at the last minute) and the low priced one. Once I'd successfully booked the low priced fare, I ended up wondering whether I was going to turn up to the airport to discover that that flight did not actually exist.

    8. Re:Step 4. . . by WillyMF1 · · Score: 1
      Once I'd successfully booked the low priced fare, I ended up wondering whether I was going to turn up to the airport to discover that that flight did not actually exist.

      So don't leave us hanging.... Did they book you on the old "kilrathi shuttle", or did you get where you were going?

    9. Re:Step 4. . . by jrumney · · Score: 1

      It was real. I suspect that they raised the prices on that flight a few days later and let people book it.

  6. Field test by enoraM · · Score: 2, Funny

    iMediaConnection starts a huge field test of tracking unique slashdot readers with their cutting edge technologies.

    1. Re:Field test by Anonymous Coward · · Score: 0

      Yes, but what about us ACs? Do we count for nothing?

  7. And uip is ? by core · · Score: 0, Redundant

    Surely not urinating in public, although that would be important to track, too.

    --
    Atlantis, a runaway hit, ball matching game for mac: http://www.funpause.com/

  8. I'm glad it isn't Rocket Science by elronxenu · · Score: 3, Interesting
    He fails to consider the possibility of the same user using different browsers (and hence the same IP address, but different cookies, and a different browser identification string).

    So you can use probabilistic means to identify unique visitors. That's not a paradigm shift, except for those whose paradigms are already very small.

    Somehow I don't think this research is worthy of an NDA.

    1. Re:I'm glad it isn't Rocket Science by Tony.Tang · · Score: 2, Insightful

      Mod this parent up.

      I don't mean to be a poo poo here, but this isn't as huge a deal as the author has made it sound (i.e. it certainly is not a "paradigm shift").

      Instead, what we have here is an evolutionary suggestion in how we can track users more accurately. Kudos.

      As with all solutions in CS, there are problems. As the parent has correctly observed, this doesn't solve the "multiple browsers, same user" problem (which is common -- you probably use a different computer at work than at home). I am not certain, but in fact, it is possible that realistically, this process they use here only solves the "this is the same browser" problem -- many users simply leave their credentials in place (i.e. logged in -- say to /.).

    2. Re:I'm glad it isn't Rocket Science by fsterman · · Score: 1

      While I agree this is hardly a paradigm shift, I think the poster is grasping at straws with his/her example. How many people surf between two browsers? I switch browsers when FF can't handle something. I migrate to a new browser every time something compelling comes along. How many people switch browsers in the same month?

      Computers, that might be a larger percentage. But even then more tests could be done. Message boards you distract yourself with at work that have a login system which sets an everlasting cookie with a unique ID would be trackable across locations. What percentage of all internet users have more than one computer they browse a variety of sites with?

      --
      Is there anything better than clicking through Microsoft ads on Slashdot?
    3. Re:I'm glad it isn't Rocket Science by byssebu · · Score: 0

      The way a user moves the mouse in the browser, together with statistical analysis of the keypresses, can serve as an anonymous fingerprint. If all browsers supported that, identifying users would work across many browsers, platforms and IPs. Of course, I don't know if it's sufficiently unique between users...

    4. Re:I'm glad it isn't Rocket Science by Anonymous Coward · · Score: 0

      So you can use probabilistic means to identify unique visitors.

      No, you can't. Person A visits a website, and finds nothing of interest on the home page, so he closes his browser. Person B does the same.

      If they use the same ISP and are located in roughly the same region, then Person B will simply be downloading the resources that make up the front page from his ISP's shared cache.

      What to do? Disable caching for your HTML pages? Great, you've slowed down your whole website for everyone, increased your bandwidth bill substantially, and added load to your server.

      Put an embedded 1x1 image on the page, which is uncachable? Okay, so now your pages can be cached, so the performance won't be as bad, but now you aren't counting anybody who doesn't load images (e.g. Lynx users, smart dialup users, people running proxies that disable 1x1 images, etc).

      Do the same, but with Javascript? Same problem with the images.

      The bottom line is that the only way you can reasonably rely on a visitor showing up in your logs is to disable caching for your HTML, which has serious performance issues. And that doesn't even begin to address the problem of distinguishing between visitors, which is an even harder problem.
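
      To make the trade-off above concrete, here is a minimal sketch of the response headers for the two options: a cacheable HTML page versus an uncacheable 1x1 tracking image. The function name and the one-hour max-age are illustrative assumptions; the header names are standard HTTP caching headers, but how any particular ISP cache treats them is not guaranteed:

```python
def response_headers(kind: str) -> dict:
    """Return illustrative HTTP response headers for a page or a tracking pixel."""
    if kind == "page":
        # Shared caches may store this, so repeat visitors behind the same
        # ISP cache never reach the origin server's logs.
        return {
            "Content-Type": "text/html; charset=utf-8",
            "Cache-Control": "public, max-age=3600",
        }
    if kind == "pixel":
        # Ask every cache not to store the image, so each view hits the origin.
        return {
            "Content-Type": "image/gif",
            "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
            "Pragma": "no-cache",
            "Expires": "0",
        }
    raise ValueError(f"unknown kind: {kind!r}")


for name, value in response_headers("pixel").items():
    print(f"{name}: {value}")
```

      As noted above, this still misses anyone who never requests the image at all.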

    5. Re:I'm glad it isn't Rocket Science by quanticle · · Score: 1

      The problem isn't necessarily one of >1 computer/person, it's the problem of >1 person/computer. At best the method described would catch and identify unique computers. But since not every household has 1 computer/person, this method would fail to catch other people using the same computer.


      For example, if I went to Site Z while browsing, and someone else within my family goes to Site Z within a reasonably short amount of time, how will the site distinguish among us?


      It wouldn't really be a problem for home users with ~4 people, but if you're trying to capture traffic from libraries, schools, or any other place with public terminals, this could represent a serious source of undercount error, especially considering that most of those sites are behind some kind of NAT, and therefore present a single IP to the outside world.

      --
      We all know what to do, but we don't know how to get re-elected once we have done it
    6. Re:I'm glad it isn't Rocket Science by fsterman · · Score: 1

      With the proliferation of multi-user logins and hand-me-down computers, and given that most people in a single house (age-wise) will be going to different sites, I imagine that this will continue to decline as an obstacle.

      --
      Is there anything better than clicking through Microsoft ads on Slashdot?
    7. Re:I'm glad it isn't Rocket Science by quanticle · · Score: 1

      True, but at my library all users use a guest account, and it is routine for another user to sit down when one user gets up. If the second user goes to the same site as the previous user (a distinct possibility with popular sites like CNN), the second user may not be counted as a unique visitor, as his IP address would be the same, there wouldn't be a significant time difference, and the cookies would all be present from the previous session.


      Also, what happens when multiple people go to a site from a location that has NAT? Do they all get counted as one person?

      --
      We all know what to do, but we don't know how to get re-elected once we have done it
  9. What's the new definition of privacy these days? by Anonymous Coward · · Score: 4, Insightful
    "This way Flash can report to the system all the cookies a machine has held. In addition to identifying users, you can use this information to understand the cookie behavior of your flash users"

    I'm not sure what the Flash is, but to me, scanning all the cookies your computer has had IS privacy intrusive.

  10. What's new about this? by nitelord · · Score: 1

    What's so new about this? How is this news? Very little substance to the article, plus I've been using IPs, Cookies and Logins to track people for a long time.

  11. Paradigm shift ? by l3v1 · · Score: 2, Insightful

    No single test is perfectly reliable, so we have to apply multiple tests.

    No kidding. This guy probably needs a wake up call.

    We know some IP addresses cannot be shared by one person. These are the ones that would require a person to move faster than possible. If we have one IP address in New York, then one in Tokyo 60 minutes later, we know it can't be the same person because you can't get from New York to Tokyo in one hour.

    Ok, so this is what normally is called a really stupid argumentation. I don't say that it can't be accounted for, but stating such a thing is nothing more than plain stupidity. Has this guy ever heard about that Internet thing ?

    Flash can report to the system all the cookies a machine has held.

    Uhmm, not a great argument to make people use it.

    No one wants to know.

    I don't think they don't want to know. They just don't want to see a sudden drop of ~50% of their user count from a day to the other. And it really doesn't matter if it's the truth or not. A drop is a drop.

    --
    I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
    1. Re:Paradigm shift ? by hal9000(jr) · · Score: 1
      I don't think they don't want to know. They just don't want to see a sudden drop of ~50% of their user count from a day to the other. And it really doesn't matter if it's the truth or not. A drop is a drop.

      Replace the word "they" with "companies that are deriving revenue from web traffic." This guy makes his money from selling analytics software so that companies can track the success of their web sites and, based on tracking, make modifications, sell advertising, marketing, and so forth.

      Seeing a 30-50% drop in unique visitors would be bad even if it were true. And would you want to be the only site out there competing for advertising dollars that shows 30-50% less traffic, even if it is more honest? Hell no, you wouldn't. You wouldn't be able to convince the media buyers that you're right and your competition is wrong. For that matter, do you really want to go to your board and explain that the shiny new 6 figure website is really only serving 1/2 of the customers you thought you were serving?

  12. Privacy by JohnGrahamCumming · · Score: 2

    What's more interesting, the new technology doesn't seem to be privacy intrusive

    The only mention of the word "privacy" on the linked web page is the term "Privacy Policy" at the bottom of the page.

    John.

  13. UIP = by Anonymous Coward · · Score: 0

    Unique IP

    1. Re:UIP = by ZeroExistenZ · · Score: 1

      Thanks for that,
      I was already wondering where the pictures were for "Uniquely Inserted Probes", as this article seems to be announced as such a big breakthrough.

      --
      I think we can keep recursing like this until someone returns 1
  14. What the fuck is "UIP hit"? by Anonymous Coward · · Score: 0

    It would be great if submitters would add a sentence or two explaining the key acronym(s) in their article instead of assuming that everybody already knows about their pet interest. RTFA to find out? Why should I waste my time reading an article that might not be of any interest to me?

    1. Re:What the fuck is "UIP hit"? by cheekyboy · · Score: 1

      a if they knew what HTML was,

      they could have used

      http : : / /

      but DUH... get a clue. Get a web page for dummies hand guide man.

      --
      Liberty freedom are no1, not dicks in suits.
  15. crap again. by gunix · · Score: 4, Insightful

    From the article:

    " We know some IP addresses cannot be shared by one person. These are the ones that would require a person to move faster than possible. If we have one IP address in New York, then one in Tokyo 60 minutes later, we know it can't be the same person because you can't get from New York to Tokyo in one hour."

    Ever heard of ssh and similar tools to make that trip?
    And they put this on slashdot. Ignorance, just pure ignorance...

    --
    Evolution of Language Through The Ages: 6000 BC : ungh, grrf, booga 2000 AD : grep, awk, sed
    1. Re:crap again. by October_30th · · Score: 1

      The majority of people using computers will never use ssh in their lives. It's not the perfect solution, but it's not complete crap either.

      --
      The owls are not what they seem
    2. Re:crap again. by farnz · · Score: 1

      Not SSH, but web caches. If corporate insist that you browse via a cache setup that fails over from the New York link to the Tokyo link whenever the internal network conditions make it worthwhile, you'll merrily surf from all sorts of addresses.

    3. Re:crap again. by Anonymous Coward · · Score: 0

      I don't even understand what he means by this? How will he know whether one dialup IP is currently in use by someone in new york and then by someone in tokyo? How often does that situation even come up? The only thing I can think of is proxy servers that serve people in a large geographic area.. but how will he know where they are? I guess they could read the date with javascript and poke it back to the tracking server.. hmm.

  16. Still doesn't help deleted cookies by mattso · · Score: 5, Insightful

    They make some silly assumptions that I don't think work with users using proxy agents, but in the end it still boils down to the existence of cookies. Which would be ok, if the problem they are trying to solve wasn't that users are deleting and not storing cookies at all. They do mention using Flash to store cookies, which I suspect will have to be the next area users will have to start cleaning up. But just because cookies don't overlap in time and the IP address is the same doesn't mean it's the same person. A bunch of users that use the same browser and share an IP address that always delete their cookies with this system will look like one user. Vastly under counting. Which I don't think web sites are interested in. Vast over counting is profitable. Under counting, not so much.

    In the end there is no way they can even mostly recognize repeat web site visitors if the VISITOR DOESN'T WANT THEM TO.

    The big problem is stated at the top of the article:

    "We need to identify unique users on the web. It's fundamental. We need to know how many people visit, what they read, for how long, how often they return, and at what frequency. These are the 'atoms' of our metrics. Without this knowledge we really can't do much."

    If knowing who unique users are is that important they need to create a reason for the user to correctly identify themselves. Some form of incentive that makes it worth giving up an identification for.

    1. Re:Still doesn't help deleted cookies by fsterman · · Score: 1

      AFAIK (someone correct me, as I don't have a test machine right here) these programs don't delete ALL cookies; they delete _ad_based_ cookies. So, say, /. and Amazon will still have their cookies while known ad companies' cookies will be gone.
      The less effort it takes to make an account or log in, the less incentive is required. Please require as few steps as possible, with log-in and account creation on the same page as a reply box. Having that reply box on the same page is nice too. Go and read up on GOMS (http://en.wikipedia.org/wiki/GOMS).

      --
      Is there anything better than clicking through Microsoft ads on Slashdot?
    2. Re:Still doesn't help deleted cookies by mattso · · Score: 1

      It wasn't clear to me on a first reading but the article is all about cookies being used by ad companies. It doesn't say that, but only these spyware identified cookies are seeing deletion rates as high as they quote. This article is really aimed at helping advertising networks that have been labeled as spyware and are being regularly deleted by the various anti-spyware apps. What they talk about doesn't really apply on a site level, where the deletion of cookies isn't so common. Of course the banner ads can't easily do more reliable user authentication. So they are totally out of luck basically. In which case combining a few different ways to try and count users might make sense. But in the long run if people don't like and don't want their advertising, they are going to lose. It doesn't matter how they count, if enough users block them and their sites they will go out of business.

    3. Re:Still doesn't help deleted cookies by Epistax · · Score: 1

      "We need to identify unique users on the web. It's fundamental. We need to know how many people visit, what they read, for how long, how often they return, and at what frequency. These are the 'atoms' of our metrics. Without this knowledge we really can't do much."

      So... radio doesn't exist?

    4. Re:Still doesn't help deleted cookies by jez9999 · · Score: 1

      I agree, this isn't that much of a 'solution' at all. To me the solution is clear - big websites need to pool money together and lobby the US government to make it *illegal to delete cookies*. When that's enshrined in law, all other countries will have to follow suit, and the problem is solved.

    5. Re:Still doesn't help deleted cookies by bob5972 · · Score: 1

      While all the web advertisers are busy whining that they can't use cookies to do their market and ad research anymore, they're forgetting how spoiled they've been so far in being able to do all of this research for free. Every other advertising industry has to pay people or volunteers and conduct surveys. They call it market "research" for a reason.

  17. Tragically flawed by tangledweb · · Score: 5, Insightful

    The article's "Sky is Falling" tone rests on a single factoid. "30 to 55% of users delete cookies" therefore current analytics products are out by "at least 30 percent, maybe more".

    That is of course complete nonsense. Let's say we accept the author's assertion that different studies have given cookie deletion rates across that range. I can accept that a significant number of users might delete cookies at some point, but what percentage of normal, non-geek, non-tinfoil-hat-wearing users are deleting cookies between page requests to a single site in a single session? If it is 30%, then I will eat my hat.

    Most cookie deletion among the general populace is done automatically by anti-spyware software and is not done in real time.

    The author clearly knows that even the most primitive of tools also use other metrics to group page requests into sessions, so even if 30% of users were deleting cookies, it would not result in a 30% inaccuracy.

    Of course "researchers propose more complex heuristic that looks to be slightly more accurate than current practice" does not make as good a story as "paradigm shift" blah blah "blows out of the water" blah blah "We've been off by at least 30 percent, maybe more." blah blah.

    1. Re:Tragically flawed by Andy_R · · Score: 1

      If it's true that "30 to 55% of users delete cookies" and therefore current analytics products are out by "at least 30 percent, maybe more", then all they need to do is add 42.5% to their numbers and they'll be at most 12.5% out. Did I just shift a paradigm?

      --
      A pizza of radius z and thickness a has a volume of pi z z a
    2. Re:Tragically flawed by Stephen · · Score: 2, Insightful
      The article's "Sky is Falling" tone rests on a single factoid. "30 to 55% of users delete cookies" therefore current analytics products are out by "at least 30 percent, maybe more".

      That is of course complete nonsense. [...] I can accept that a significant number of users might delete cookies at some point, but what percentage of [...] users are deleting cookies between page requests to a single site in a single session? If it is 30%, then I will eat my hat.

      The author clearly knows that even the most primitive of tools also use other metrics to group page requests into sessions, so even if 30% of users were deleting cookies, it would not result in a 30% inaccuracy.

      While I don't want to defend the article, you're missing one crucial point here. Grouping requests into sessions with "good enough" accuracy is easy even without cookies. What companies find cookies essential for is to measure latent conversion. For example: which Google ads best convert into sales, even if the sale doesn't happen for several days? For this sort of analysis, cookie deletion is a problem, and becomes a bigger problem the longer there typically is between lead and sale.
      --
      11.00100100001111110110101010001000100001011010001 1000010001101001100010011
    3. Re:Tragically flawed by jdigital · · Score: 1
      1. The Jupiter report stating that 37% of cookies are being deleted has not really been accepted wholly by the web analytics community. See the recent NY Times article that was linked from Slashdot a few weeks back.
      2. The main reason that companies are not willing to try this new paradigm shift UIP technology is that most people in the industry are already doing it.
      3. The paradigm shift is simply using a bundle of already known tricks and throwing them in a big soup. There is nothing amazing here.
      4. The big problem is with 3rd party permanent cookies. There are easier solutions, such as:
        1. Serving all tracking cookies 1st party. (Most good analytics tools allow you to host the 1x1 tracking GIFs to do this.)
        2. Using 1st party session cookies + user agent + IP (or a subset of it). This is fairly accurate.
      5. I have ditched using unique visitor metrics as they are terribly database intensive and really are not important for true measures of ROI. We track large media buys and have no reports based on UVs as they don't lend themselves to any 'actionable' optimization or other techniques for improving our business.
      6. For the most part, user behaviour is variable, and only a small subset of analysis depends on user behaviour over periods longer than 20 days. I have seen no academic research stating that session cookie + IP + user agent is invalid over such timeframes.
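
      A rough sketch of point 4.2 above, in Python. The function name, the /24 truncation standing in for "a subset of" the IP, and the SHA-256 hashing are all assumptions made for illustration; the comment does not say how the three values are combined.

```python
import hashlib
import ipaddress


def visitor_key(session_cookie: str, user_agent: str, ip: str, prefix_len: int = 24) -> str:
    """Build a hypothetical composite visitor key: session cookie + user agent + IP subset."""
    # Keep only the network part of the address (assumed /24 for IPv4),
    # so addresses inside the same block map to the same key.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    raw = "|".join([session_cookie, user_agent, str(network.network_address)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Documentation-range addresses, used purely as sample data.
a = visitor_key("sess-123", "Mozilla/5.0", "192.0.2.17")
b = visitor_key("sess-123", "Mozilla/5.0", "192.0.2.200")
c = visitor_key("sess-123", "Mozilla/5.0", "198.51.100.5")
print(a == b)  # same cookie, same agent, same /24 block
print(a == c)  # same cookie and agent, different network
```

      Truncating the address trades precision for stability: it tolerates an address changing within one block, at the cost of merging distinct users who share a cookie value, agent, and block.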
      --
      :wq ~ ~ ~ ~ ~
    4. Re:Tragically flawed by DeusExLibris · · Score: 1

      Would you like some salt to help that hat go down?

      The Jupiter report referenced here shows that 39% of users (yes, ALL users, not just tin-foil hat wearing users) delete their cookies once per month.

      Note that stating a deletion rate without a time period is useless (for the mathematically challenged it is like reporting speed in miles or km rather than mph or kph).
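
      To show why the time period matters, here is a minimal sketch that stretches a monthly figure over other windows. It assumes deletion happens at a constant, independent rate, which is my simplification and not something either report states; the function name is made up for the example.

```python
def cookie_survival(monthly_deletion_rate: float, days: float,
                    days_per_month: float = 30.0) -> float:
    """Fraction of cookies expected to survive `days`.

    Assumes a constant deletion rate, so the monthly survival fraction
    compounds over longer or shorter windows.
    """
    monthly_survival = 1.0 - monthly_deletion_rate
    return monthly_survival ** (days / days_per_month)


# 0.39 is the monthly figure quoted above; the windows are arbitrary examples.
for d in (7, 14, 30, 60):
    print(d, round(cookie_survival(0.39, d), 3))
```

      Under that assumption the same 39% figure means very different things for a one-week conversion window and a two-month one.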

      Of course, the online advertising industry reacted much as you did to this report.

      Atlas did their own report and found - wait for it - exactly the same thing.

      While you are correct that much of the cookie deletion occurs periodically rather than by being blocked, this is largely irrelevant since most publishers want to know monthly visitor figures. This is what the advertising industry expects and it is frequently the metric used to generate a lifetime value per visitor metric which is key to the business' revenue forecasting.

    5. Re:Tragically flawed by jdigital · · Score: 1
      Most conversion processes occur in under 14 days. While 3rd party permanent cookies are an easy way to track this, there are methods that are as reliable over this time frame.

      For conversions taking longer than 7 days, you are generally looking at products with high 'consideration' (to use marketing speak), such as expensive consumer products or travel. These people do have problems when relying on cookies. Not to downplay their pain, but they hardly make up the majority (in number, not dollars) of online advertising.

      Now to contradict myself... :)

      Travel is a very large segment of online advertising spend. But it is so large that the 60% of people who keep cookies (worst case scenario) provide oodles of valuable business information for determining future media budgets.

      What I am trying to say is that user tracking is generally only a concern for small to mid sites that don't have the volume to infer from large samples that do keep cookies. These smaller sites, in general sell low consideration products - thus don't really have to worry about long lead time purchases.

      In effect, the problem is not as large as people in the press say it is.

      A far larger problem (from my recent conversations at AdTech and the web analytics conference) is that 90% of people are pretty clueless in how to determine what the appropriate ROI metric is for their web site. This worries me far more than cookie issues.

      --
      :wq ~ ~ ~ ~ ~
    6. Re:Tragically flawed by jez9999 · · Score: 1

      Who needs to determine that? Just shove Google Adsense on the site, and they determine it for you!

    7. Re:Tragically flawed by Anonymous Coward · · Score: 0

      Firefox can automatically delete cookies after each session. I always set up Firefox to do that and recommend everyone do it. Most people I deal with have set their Firefox browsers up to delete cookies after each session. Maybe we are not "30%", but we are a start.

  18. Um, nope. Can't happen. by DroopyStonx · · Score: 4, Insightful

    I develop web analytic software for a living.

    There's only so much you can do to track users.

    IP address, user agent, some JavaScript stuff for cookieless tracking... those are the only real "unique" identifiers for any one visitor. It stops there.

    Of course, using exploits in flash doesn't count, but supposedly this new method is "not intrusive."

    I call BS because it simply can't happen.

    If a user doesn't wanna be tracked, they won't be tracked. This story is just press, free advertisement, and hype for this particular company.

    --
    We have secretly replaced these Slashdot mods' sense of humor with a rusty nail. Let's see if they notice!!
    1. Re:Um, nope. Can't happen. by rhizome · · Score: 4, Funny

      If a user doesn't wanna be tracked, they won't be tracked. This story is just press, free advertisement, and hype for this particular company.

      Whoa, whoa...let's not fly off the handle here! We don't know that they didn't pay anything.

      --
      When I was a kid, we only had one Darth.
  19. yeah well ... by dancallaghan · · Score: 0, Offtopic

    Analyse this, bitch!

    *slashdots his server*

    1. Re:yeah well ... by dancallaghan · · Score: 1

      Crap, I knew I'd forget something ...

  20. Sounds like voodoo to me... by Black+Art · · Score: 1

    Why do I have this feeling like this "cutting edge technology" involves the entrails of an animal and some form of divination?

    --
    "Trademarks are the heraldry of the new feudalism."
  21. 77340 Upsidedown St by Anonymous Coward · · Score: 0

    I wonder what they will think when they start getting impossible bit patterns, like 7734 and 6027734 and 5773857734?

    I wonder if they'll notice?

    Hexadecimal would probably put the joke way past them.

    1. Re:77340 Upsidedown St by KillShill · · Score: 1

      that wouldn't be possible because on your crippled system you wouldn't have access to the data or network stream.

      even if you managed to block the transfer, you just wouldn't be able to use that particular resource. and so the race begins to see if we meet our dystopian future or avert it... for a while until the "lobbyists" strike back.

      --
      Science : Proprietary , Knowledge : Open Source
  22. Assumption is valid for 95% cases. by mveloso · · Score: 1

    After some thought, I'd probably agree that step 4 is valid for the vast majority of web users.

    The only way this might break is if a large number of people are sitting behind a proxy/cache. But if it is the case they have fallbacks.

    1. Re:Assumption is valid for 95% cases. by hal9000(jr) · · Score: 1

      The only way this might break is if a large number of people are sitting behind a proxy/cache.

      Three letters--AOL.

    2. Re:Assumption is valid for 95% cases. by CmdrGravy · · Score: 1

      You mean like almost everyone surfing from their work place.

  23. God damn arms race by Anonymous Coward · · Score: 0

    The problem with cookie deletion is not that it happens, but that we've been relying on a single method for identifying people.

    I'm so happy that we have other ways of tracking people. I mean, whenever I clear out my cookies, I'm thinking to myself, "But now how will the Man track my online activities?" Now I can clear out cookies and once again feel safe with the knowledge that somewhere, somebody knows everything I do online.

  24. Re:Whats the new definiation of privacy these days by Krach42 · · Score: 1

    Not to mention a security flaw.

    When you visit my site, you agree to download and run a Flash/ActiveX control that downloads all your cookies to slashdot.org, and then sends them to me, so that I can now present false credentials to slashdot.org to make it think that I have auto-login privileges.

    Awesome design flaw there, but I highly doubt anyone is THAT stupid to put THAT big of a security flaw into a system.

    --

    I am unamerican, and proud of it!
  25. Re:Whats the new definiation of privacy these days by Finitepoint · · Score: 1

    "I'm not sure what the Flash is"

    In this case I think the "Flash" being referred to is Macromedia's Flash plugin. He's not very clear though, is he?

    --
    AM
  26. Paradigm shift ?!? by rduke15 · · Score: 5, Insightful

    When I read "paradigm shift" in the very first paragraph, my bullshit sensor sounded such a loud alarm that it was hard to continue reading...

    1. Re:Paradigm shift ?!? by fbg111 · · Score: 2, Informative

      And the fact that he actually felt the need to explain what a "paradigm shift" is to his audience - undoubtedly consisting of cynical techies - as if we'd never been (over)exposed to the concept before, quadrupled the BS meter. Honestly, was he born yesterday?

      Oblig Dr. Evil Quote: [about his new "laser"] You see, I've turned the moon into what I like to call a... "Death Star".

      --
      Flying is easy, just throw yourself at the ground and miss. -Douglas Adams
    2. Re:Paradigm shift ?!? by Anonymous Coward · · Score: 0
      Yah, I remember back in the day when I had to shift paradigms without a clutch, up hill both ways, unarmed with only two tin cans connected with string.

      Seriously though, there are two things that turn me away:

      1. When someone mentions a paradigm shift

      2. A Moore's Law graph

      I guess there are three things:

      3. People who talk in "quotes", raising their hands into little quote symbols.

  27. Tropical Storm Jose TROLL by Anonymous Coward · · Score: 0

    WTNT31 KNHC 230536
    TCPAT1
    BULLETIN
    TROPICAL STORM JOSE INTERMEDIATE ADVISORY NUMBER 4A
    NWS TPC/NATIONAL HURRICANE CENTER MIAMI FL
    1 AM CDT TUE AUG 23 2005 ...CENTER OF JOSE MAKES LANDFALL ON THE COAST OF MEXICO...WAS
    GETTING BETTER ORGANIZED AT LANDFALL...

    A TROPICAL STORM WARNING REMAINS IN EFFECT FOR THE GULF COAST OF
    MEXICO FROM VERACRUZ NORTHWARD TO CABO ROJO. THIS WARNING WILL
    LIKELY BE DISCONTINUED LATER TODAY.

    FOR STORM INFORMATION SPECIFIC TO YOUR AREA...INCLUDING POSSIBLE
    INLAND WATCHES AND WARNINGS...PLEASE MONITOR PRODUCTS ISSUED
    BY YOUR LOCAL WEATHER OFFICE.

    DATA FROM THE MEXICAN RADAR AT ALVARADO INDICATE THAT THE CENTER OF
    JOSE HAS MADE LANDFALL ON THE EASTERN COAST OF MEXICO.

    AT 1 AM CDT...0600Z...THE CENTER OF TROPICAL STORM JOSE WAS LOCATED
    NEAR LATITUDE 19.8 NORTH... LONGITUDE 96.8 WEST OR ABOUT 60
    MILES... 95 KM... NORTHWEST OF VERACRUZ MEXICO AND ABOUT 90 MILES...
    145 KM...SOUTH-SOUTHEAST OF TUXPAN MEXICO.

    JOSE IS MOVING TOWARD THE WEST NEAR 9 MPH... 14 KM/HR... AND THIS
    GENERAL MOTION IS EXPECTED TO CONTINUE DURING THE NEXT 24 HOURS.
    ON THIS TRACK... THE CENTER OF JOSE SHOULD MOVE FARTHER INLAND INTO
    THE MOUNTAINS OF EASTERN MEXICO TODAY.

    MAXIMUM SUSTAINED WINDS ARE NEAR 50 MPH... 85 KM/HR...WITH HIGHER
    GUSTS. JOSE SHOULD WEAKEN AS THE CENTER MOVES FARTHER INLAND. THE
    ALVARADO RADAR INDICATED THAT JOSE WAS BECOMING BETTER ORGANIZED IN
    THE LAST FEW HOURS BEFORE LANDFALL...AND THE MAXIMUM SUSTAINED
    WINDS AT LANDFALL MAY HAVE BEEN HIGHER THAN 50 MPH.

    TROPICAL STORM FORCE WINDS EXTEND OUTWARD UP TO 45 MILES... 75 KM
    FROM THE CENTER.

    ESTIMATED MINIMUM CENTRAL PRESSURE IS 1001 MB...29.56 INCHES.

    RAINFALL ACCUMULATIONS OF 3 TO 5 INCHES...WITH ISOLATED HIGHER
    AMOUNTS OF UP TO 10 INCHES OVER THE HIGHER ELEVATIONS...CAN BE
    EXPECTED IN ASSOCIATION WITH JOSE. THESE RAINS COULD CAUSE
    LIFE-THREATENING FLASH FLOODS AND MUD SLIDES.

    REPEATING THE 1 AM CDT POSITION...19.8 N... 96.8 W. MOVEMENT
    TOWARD...WEST NEAR 9 MPH. MAXIMUM SUSTAINED
    WINDS... 50 MPH. MINIMUM CENTRAL PRESSURE...1001 MB.

    THE NEXT ADVISORY WILL BE ISSUED BY THE NATIONAL
    HURRICANE CENTER AT 4 AM CDT.

    FORECASTER BEVEN

  28. You forgot JACU... by ArsenneLupin · · Score: 1

    SCNR...

  29. We've been using improved methods for years... by maedls.at · · Score: 1

    and are already tired of explaining to customers why the unique visitor counts differ from their built-in log-file analysis.

    See CheckEffect for details.

  30. UIP by Anonymous Coward · · Score: 0

    I don't WANT to be tracked you insensitive cloud!

    1. Re:UIP by WilliamSChips · · Score: 0
      insensitive cloud
      Don't think I've ever seen these before...
      --
      Please, for the good of Humanity, vote Obama.
  31. Some additional steps. by Saggi · · Score: 2, Interesting

    The article spends a lot of time establishing that this is a paradigm shift, when it's actually not. I do believe their idea is good, but basically it's just applying a lot of "possible" user identifiers and merging them together to form a unified result.

    One identifier they haven't used is linkage on the site. If one page links to another and the pages are requested in sequence, it might be the same user.

    On top of links, "time" might be applied. Some links are expected to be clicked quickly, others only after some reading on the page.

    Some may argue that linkage is what you want to determine in the subsequent analysis, and therefore can't be used to determine the user in advance, but this is not true. The determination of user uniqueness looks at whether it is possible for the user to get from one page to another, while the analysis wants to determine whether they actually did.
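
    A minimal sketch of the linkage-plus-time idea, in Python. The link graph, the time thresholds, and the function names are all hypothetical, and the greedy "compare with the most recent chain only" rule is a simplification I chose for brevity, not part of the article.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical site link graph: page -> pages it links to.
LINKS: Dict[str, Set[str]] = {
    "/": {"/products", "/about"},
    "/products": {"/products/widget", "/"},
    "/products/widget": {"/cart"},
    "/about": {"/"},
}

Request = Tuple[float, str]  # (timestamp in seconds, requested page)


def could_be_same_user(prev: Request, nxt: Request,
                       min_gap: float = 1.0, max_gap: float = 1800.0) -> bool:
    """True if `nxt` could plausibly follow `prev` for one user:
    the first page links to the second and the time gap is plausible."""
    gap = nxt[0] - prev[0]
    linked = nxt[1] in LINKS.get(prev[1], set())
    return linked and min_gap <= gap <= max_gap


def split_by_linkage(requests: List[Request]) -> List[List[Request]]:
    """Greedily split a time-ordered request list into chains in which
    every step follows a link from the previous page."""
    chains: List[List[Request]] = []
    for req in requests:
        if chains and could_be_same_user(chains[-1][-1], req):
            chains[-1].append(req)
        else:
            chains.append([req])
    return chains


log = [(0.0, "/"), (20.0, "/products"), (25.0, "/about"), (60.0, "/")]
chains = split_by_linkage(log)
print(chains)
```

    Note that this only tests whether a transition was possible; it says nothing about whether one user actually made it.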

    --
    -:) Oh no - not again.
    www.rednebula.com
  32. Haven't we learned anything from the Simpsons? by halcyon1234 · · Score: 1
    Excuse me, but "proactive" and "paradigm"? Aren't these just buzzwords that dumb people use to sound important?

    I mean, seriously folks-- there is a reason why these things are mocked.

  33. More than just cookies by wranlon · · Score: 3, Informative

    ROI is mentioned, along with the 'atoms' of their metrics: page hit count, popular URL count, URL dwell time, and returning visitors. When these metrics are used to produce reports, how valuable are these reports in ascertaining how ROI is affected by said metrics? For example, getting a neat funnel report of the path people take through a site and where the traffic drops off offers insight into popular paths and locations where people bail out, but apart from listening for errors, there is no further insight into why a person bailed.
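
    For reference, the kind of funnel report described above can be sketched in a few lines of Python. The session data, step names, and function name are invented for the example; real tools presumably do far more.

```python
from typing import List


def funnel_counts(sessions: List[List[str]], steps: List[str]) -> List[int]:
    """Count how many sessions reached each funnel step, in order."""
    counts = [0] * len(steps)
    for pages in sessions:
        pos = 0  # where to resume searching within this session
        for i, step in enumerate(steps):
            try:
                pos = pages.index(step, pos) + 1
            except ValueError:
                break  # this session bailed before reaching the step
            counts[i] += 1
    return counts


sessions = [
    ["/", "/products", "/cart", "/checkout"],
    ["/", "/products"],
    ["/", "/about"],
    ["/products", "/cart"],
]
steps = ["/", "/products", "/cart", "/checkout"]
counts = funnel_counts(sessions, steps)
print(counts)
```

    The output shows where traffic drops off, which is exactly the limit being pointed out: it gives the "where" but not the "why".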

    What seems to be missing is gathering insightful information into what transpires while someone is on a particular page. I'd like to know the general trends in behavior, not just the server requests. I've found it more useful to be able to see the interactions with the content than reporting where people enter, traverse, and exit a site.

    1. Re:More than just cookies by aggles · · Score: 1

      Knowing why someone bailed can be difficult, but just knowing where they bailed can be quite useful. By looking at those points, testing a few theories by changing things, then looking at the results - you can frequently improve conversion quite a bit. Surveys are also useful - even if you only get a few takers. If you want to see what folks are actually doing - look at Tealeaf or Business Signatures. What really amazes me about this thread is how many /.ers don't have a clue about what cookies do. You only get tracked across the domains that are owned by the same organization. People from LLBean or Google for that matter, don't see what you are doing on Slashdot. From what I've seen, most users of Web analytics tools roll up statistics, and may add segmentation, but rarely get down to the personal level. If they do - they are risking their reputation and better make sure their customers are OK with it. Those that delete their cookies are just making it harder for the web content developers to understand what is not optimal about their site.

  34. Re:Whats the new definiation of privacy these days by Anonymous Coward · · Score: 0

    Macromedia Flash has a local shared object (LSO), which is similar to a cookie, but less known.
    I presume the proposed tactic is to set a cookie with an id and add the same id to an LSO. That lets you see what happens to your cookies over time, as long as the LSO doesn't get deleted.
    Since there is no LSO management tool by default, LSOs have better lifetimes than cookies.
    There is a Firefox extension, however, that lets you view and delete those LSOs.
    I expect this functionality will eventually become more widespread, giving LSOs the same kind of reliability as cookies.
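
    A rough server-side sketch of that presumed tactic, in Python. It assumes each visit is logged as a pair of a long-lived LSO id and the current cookie id; those field names and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Visit = Tuple[str, str]  # (lso_id, cookie_id); hypothetical log fields


def cookies_per_lso(visits: Iterable[Visit]) -> Dict[str, Set[str]]:
    """Collect every cookie id that was ever seen alongside each LSO id."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for lso_id, cookie_id in visits:
        seen[lso_id].add(cookie_id)
    return dict(seen)


def overcount_ratio(visits: Iterable[Visit]) -> float:
    """Distinct cookies divided by distinct LSO ids.

    A value above 1.0 suggests cookie-based counting sees more
    'visitors' than there are machines holding an LSO.
    """
    groups = cookies_per_lso(visits)
    total_cookies = sum(len(ids) for ids in groups.values())
    return total_cookies / len(groups)


log = [
    ("lso-a", "c1"), ("lso-a", "c2"), ("lso-a", "c2"),
    ("lso-b", "c3"),
    ("lso-c", "c4"), ("lso-c", "c5"), ("lso-c", "c6"),
]
print(cookies_per_lso(log))
print(overcount_ratio(log))
```

    This only holds while the LSO itself survives, which is the weak point noted above.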

  35. Cutting edge? ha! by ZeroExistenZ · · Score: 2, Insightful

    "If the same cookie is present on multiple visits, it's the same person. We next sort our visits by cookie ID"

    Only after that do they seem to continue the analysis ("We know some IP addresses cannot be shared by one person. These are the ones that would require a person to move faster than possible", etc.)

    So turning off or regularly removing cookies will render their bleeding cutting edge technology useless? And how are cookies a 'breakthrough'? Their only alternative to this seems to be:
    You can also throw Flash Shared Objects (FSO) into the mix. FSOs can't replace cookies, but if someone does support FSO you can use FSOs to record cookie IDs.

    I don't know what the fuss is about.

    This is just basic logic, which any decent programmer should be able to come up with, even the M$ certified ones.

    --
    I think we can keep recursing like this until someone returns 1
  36. The Meat of the Article by RAMMS+EIN · · Score: 3, Informative

    For those who can't be bothered to read through all the buzzwords, here's the actual method used:

    Each of these steps is applied in order:

          1. If the same cookie is present on multiple visits, it's the same person.

          2. We next sort our visits by cookie ID and look at the cookie life spans. Different cookies that overlap in time are different users. In other words, one person can't have two cookies at the same time.

          3. This leaves us with sets of cookie IDs that could belong to the same person because they occur at different times, so we now look at IP addresses.

          4. We know some IP addresses cannot be shared by one person. These are the ones that would require a person to move faster than possible. If we have one IP address in New York, then one in Tokyo 60 minutes later, we know it can't be the same person because you can't get from New York to Tokyo in one hour.

          5. This leaves us with those IP addresses that can't be eliminated on the basis of geography. We now switch emphasis. Instead of looking for proof of difference, we now look for combinations which indicate it's the same person. These are IP addresses we know to be owned by the same ISP or company.

          6. We can refine this test by going back over the IP address/Cookie combination. We can look at all the IP addresses that a cookie had. Do we see one of those addresses used on a new cookie? Do both cookies have the same User Agent? If we get the same pool of IP addresses showing up on multiple cookies over time, with the same User Agent, this probably indicates the same person.

          7. You can also throw Flash Shared Objects (FSO) into the mix. FSOs can't replace cookies, but if someone does support FSO you can use FSOs to record cookie IDs. This way Flash can report to the system all the cookies a machine has held. In addition to identifying users, you can use this information to understand the cookie behavior of your flash users and extrapolate to the rest of your visitor population.
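
    Steps 1 to 3 above can be sketched roughly as follows, in Python. The article gives no code, so the data layout (cookie id plus timestamp per visit) and all function names here are my own assumptions.

```python
from typing import Dict, List, Tuple

Visit = Tuple[str, float]    # (cookie_id, timestamp); hypothetical log fields
Span = Tuple[float, float]   # (first seen, last seen)


def cookie_lifespans(visits: List[Visit]) -> Dict[str, Span]:
    """Step 1: treat all visits with one cookie as one person and
    record when that cookie was first and last seen."""
    spans: Dict[str, Span] = {}
    for cookie_id, ts in visits:
        first, last = spans.get(cookie_id, (ts, ts))
        spans[cookie_id] = (min(first, ts), max(last, ts))
    return spans


def overlaps(a: Span, b: Span) -> bool:
    """Step 2: cookies whose lifespans overlap are counted as different users."""
    return a[0] <= b[1] and b[0] <= a[1]


def candidate_pairs(spans: Dict[str, Span]) -> List[Tuple[str, str]]:
    """Step 3: cookie pairs that never overlap *could* be the same person
    and are passed on to the IP address tests."""
    ids = sorted(spans)
    return [(x, y) for i, x in enumerate(ids) for y in ids[i + 1:]
            if not overlaps(spans[x], spans[y])]


visits = [("A", 1), ("A", 5), ("B", 3), ("B", 4), ("C", 8), ("C", 9)]
spans = cookie_lifespans(visits)
print(candidate_pairs(spans))
```

    Here cookies A and B overlap, so they are ruled out as one person, while C could be a replacement for either of them.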

    --
    Please correct me if I got my facts wrong.
    1. Re:The Meat of the Article by aCC · · Score: 1

      4. We know some IP addresses cannot be shared by one person. These are the ones that would require a person to move faster than possible. If we have one IP address in New York, then one in Tokyo 60 minutes later, we know it can't be the same person because you can't get from New York to Tokyo in one hour.

      I don't know. Switching proxies from one in New York to one in Tokyo takes me certainly less than 60 minutes...

      But I guess no method can be perfect.
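
      For what it's worth, the test in step 4 presumably boils down to something like this Python sketch. It assumes the IP addresses have already been mapped to coordinates by some geo-IP lookup (not shown); the coordinates are approximate, and the 1000 km/h speed limit and function names are made up for illustration. As noted, a proxy switch defeats it entirely.

```python
import math
from typing import Tuple

Location = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def impossible_travel(loc1: Location, t1_hours: float,
                      loc2: Location, t2_hours: float,
                      max_speed_kmh: float = 1000.0) -> bool:
    """True if getting from loc1 to loc2 in the elapsed time would
    require moving faster than max_speed_kmh (an assumed limit)."""
    hours = abs(t2_hours - t1_hours)
    dist = haversine_km(*loc1, *loc2)
    if hours == 0:
        return dist > 0
    return dist / hours > max_speed_kmh


NEW_YORK = (40.71, -74.01)   # approximate coordinates
TOKYO = (35.68, 139.69)      # approximate coordinates
print(impossible_travel(NEW_YORK, 0.0, TOKYO, 1.0))   # one hour apart
print(impossible_travel(NEW_YORK, 0.0, TOKYO, 24.0))  # one day apart
```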

  37. Typical web analysis junk by Sinner · · Score: 5, Insightful

    About 20% of my time on my last job was spent doing web analysis. It drove me insane.

    The problem is with the word "accurate". To management, "accurate statistics" means knowing exactly how many conscious human beings looked at the site during a given period. However, the computer cannot measure this. What it can measure, accurately, is the number of HTML requests during a given period.

    You can use the latter number to estimate the former number. But because this estimate is affected by a multitude of factors like spiders, proxies, bugs, etc., management will say "these stats are clearly not accurate!". You can try to filter out the various "undesirable" requests, but the results you'll get will vary chaotically with the filters you use. The closer you get to "accurate" stats from the point of view of management, the further you'll be from "accurate" stats from a technical point of view.

    Makers of web analysis software and services address these problems by the simple technique of "lying". In fact, a whole industry has built up based on the shared delusion that we can accurately measure distinct users.

    Which is where this article comes in. The author has discovered the shocking, shocking fact that the standard means of measuring distinct users are total bollocks. He's discovered that another technique produces dramatically different results. He's shocked, shocked, appalled in fact, that the makers of web analysis software are not interested in this new, highly computationally-intensive technique that spits out lower numbers.

    My advice? Instead of doing costly probability analysis on your log files, just multiply your existing user counts by 0.7. The results will be just as meaningful and you can go home earlier.

    --
    fish and pipes
    1. Re:Typical web analysis junk by SkjeggApe · · Score: 1
      I hear ya! I almost got fired for insisting on using the word "visits" as opposed to "users" in all my metrics. Basically, my boss wanted to show the marketing people pretty graphs showing that "This marketing campaign webpage has been seen by x amount of people", which was total bullsh*t, especially since she REFUSED to discard hits from within the company (which I did tally a few times, showing roughly 10%-20%). She also came up with very impressive "cost savings" numbers based on the following formula:


      "cost to receive a request by phone for a catalog/manual/brochure, print it, stuff it in an envelope and mail it to a customer" * number of raw hits to PDFs = "amount of money I've saved the company".

      Utterly insane, but she did save the company "millions".

  38. Bull by RAMMS+EIN · · Score: 0

    ``The author claims that he is describing 'new, cutting edge methodologies for identifying people, methodologies that -- at this point -- no web analytics product supports.''

    And when you read down to how these "new, cutting edge methodologies" actually work, it comes down to: plant cookies, if that doesn't tell you what you need to know, look at the IP address. Then take into account that different cookies and different IP addresses can still be the same user, if they occur at different times.

    It's clever, but it didn't take a genius to think that up. Why is nobody doing it? Well, because it's too much work and still doesn't give you any guarantees that your conclusions are correct.

    It's nice how TFA is presenting this as the best thing after sliced bread and Longhorn, though.

    --
    Please correct me if I got my facts wrong.
  39. We need to know how many people visit, what they r by rednuhter · · Score: 1

    "We need to know how many people visit, what they read, for how long,"

    Even before tabbed browsing I would often have well over 10 browser windows open. If they measure what I have open for the time the window is open, then they will get VERY skewed results.

    Also I have two monitors so windows in the foreground do not always translate to windows I am focusing on. (also high resolution monitors could produce the same effect)

    I wish them luck because they need it.

    --
    ERR 411[Max number of witty sigs reached]
  40. Why? by RAMMS+EIN · · Score: 2, Insightful

    Somebody please explain to me: why would you go to all this trouble to get a close estimate of how many unique visitors your site draws?

    I'm personally always more interested in how many pages get requested, and which ones. The first gives me an impression of how popular the site is*, the second tells me which pages people particularly like, so I can add more like that.

    The only reason I see for really wanting to track people is if your site is actually an app that has state. In those cases, you have to use a more bullet-proof system than the one presented in TFA.

    * Some people object that it counts many people who visit once, then never again; but I consider it a success that they got there in the first place - they were probably referred by someone, followed a link that someone made, or the page ranks high in a search engine.

    --
    Please correct me if I got my facts wrong.
    1. Re:Why? by hal9000(jr) · · Score: 1
      Somebody please explain to me: why would you go to all this trouble to get a close estimate of how many unique visitors your site draws?

      Tracking the success of an advertising campaign. Ad-buyers want to get their message out to eyeballs, and they want that message in front of as many eyeballs as often as possible for their target demographic. Picking sites in their demographic is easy and they know how to do that. Picking a site that drives enough unique traffic is much more difficult.

      I would expect, though I don't know, that large media sites have their results or their analytic methods audited, similar to a financial audit. So if you can prove, using accepted practices, that your site is moving some number of unique visitors, and that number is more than a similar site, you will probably win the business, all other factors being equal.

    2. Re:Why? by Anonymous Coward · · Score: 0

      If you think about it that's pretty funny. The fundamental idea behind advertising is that you increase your profit if you do it. All this 'tracking eyeballs' bullshit is basically them admitting that any actual increase in sales will be unmeasurably small...

    3. Re:Why? by reidspice · · Score: 1

      this is primarily for retailers who want to track how people are getting to their site and how they are converting to purchasers (or arriving and not purchasing, as the case may be). the more accurate this data, the better they are able to allocate your budget to increase sales. did the user come from an affiliate program? from an email (not all email is unsolicited spam, remember)? from a paid search ad on google? if it was a paid google placement, what keyword drove the click and sale? what keywords are costing lots of money and not driving any sales?

      as the article points out, the current method of tracking (cookies) is flawed. i agree with the previous posters that the solution presented is not a paradigm shift but it does seem to be a much better way to track visitors: you just use a more complex set of filters and analysis. and really, even though the article doesn't say anything about privacy, i don't think most people are going to be too bothered with the privacy ramifications of doing some (inherently inaccurate) IP analysis about where they think you live and what ISP you're using.

      so, while i don't think this is a huge shift in how people can/are tracked on the internet, it's a fairly intelligent improvement on "cookies only" and it really does very little to invade your privacy.

      finally, to all of the people who have made arguments about "yeah but what if i SSH from NY to Tokyo smart guy?" wake up: you are .000001% of the internet users out there. they're looking to make general improvements in tracking users, not get to 100% accuracy by keeping track of the 14 people browsing & buying widgets over cross-continental tunnels.. eesh.

  41. Proof by Tune · · Score: 1

    The article points to Magdalena Urbanska and Thomas Urbanski's original research paper, which "reveals" its validity through a "mathematical proof". (8 pages of formulas, so it must be true) Of course, anyone with a post 1990's knowledge of the thing called "internet" would know that mentioning the existence of a research paper has been replaced by the thing called a "hyper link".

    Googling, I found little more than this link to ARF, an unknown organization boldly calling itself "The Research Authority" (with capitals, mind you).

    No paper (at least not online) and no references to the institute or organization that Magdalena Urbanska and Thomas Urbanski might be affiliated with. Their daily lives seem to be spent as "researchers" - whatever that may mean.

    It looks like a hoax. A sad hoax. Because why would anyone want to hoax a story this sad?

    1. Re:Proof by cortana · · Score: 1

      Making idiot PHBs of marketing companies burn their money investigating this new technology seems like a good idea to me.

    2. Re:Proof by ThinkMetrics · · Score: 1

      Here's the URL of the organisation which hosted the conference. http://www.esomar.org/ Here's a news article about the conference: http://www.createmagazine.com/news.cfm?IssueName=national&NewsID=2080 The research paper is not available on the internet because it is under restricted copyright. Not everything of note is available for free on the internet. I would suggest you bother to take a little more time to look into things before you accuse people of hoaxing. Don't trust Google to provide every link to everything on the internet, they don't visit every site every day. The Author

    3. Re:Proof by Koos+Baster · · Score: 1

      That still doesn't provide much info. Of course it's under copyright, anything written (that isn't infringing itself) is under copyright. That doesn't imply being secretive about it.

      I'm tempted to agree with the grand-parent poster. Although it may not be a hoax, it's surely *sad*. This proves that although sources that are untraceable by Google may not be a hoax, they sure as hell have little value.

      I mean: Come on, "The Author"-guy, isn't openness sort of prerequisite to scientific innovation?

    4. Re:Proof by ThinkMetrics · · Score: 1

      The terms of copyright are set down by ESOMAR as a condition of submitting research papers at their conferences. It is their condition that any papers delivered at their conferences become their copyright and they control distribution. If you have a problem with that I suggest you take on the increasing commercialization of academic research caused by the reduction in government funding. It ain't our fault.

    5. Re:Proof by Koos+Baster · · Score: 1

      Hmm. OK. I feel sorry for you. I understand that it may not be as easy for you as it is for others to make a stand against that kind of "intellectual property" policy.

      Personally, I'm glad that copyrights of my scientific work have either remained with me exclusively or have allowed me to disclose full pdfs. (I'm convinced the old-fashioned scientific publishing business model is dying anyway. Except for brand recognition, what do institutions, conferences, publishers, journals add to the distribution of knowledge? How many scientists make a living from their publications?)

      IMHO, you should at least be able to publish an abstract and a list of cited references.

    6. Re:Proof by ThinkMetrics · · Score: 1

      It is obvious to me that you have not read the article this discussion forum is about. If you had you would have seen that I clearly indicated that I was not the author of the research paper, but merely a journalist reporting the research. Under such circumstances your statements about me making a stand against ESOMAR's policies are simply meaningless. It is poor scholarship to make public comments about articles you have not read and you are merely showing your lack of intellectual rigour to the world. Please reserve your pity for yourself - you need it more than I.

    7. Re:Proof by Koos+Baster · · Score: 1

      It is obvious to me that you have not read the article this discussion forum is about. If you had you would have seen that I clearly indicated that I was not the author of the research paper, but merely a journalist reporting the research.

      Now we're getting somewhere ;-)
      Forgive me for finding the word "author" confusing in this context. The story was posted on /. by ScuttleMonkey, the story was submitted by a guy named Serge Murray and the linked article mentions a writer called Brandt Dainow. That's 3 middle men. Then there are the two researchers who broke the news, Magdalena Urbanska and Thomas Urbanski, who could easily qualify as "the author".

      As my focus was on the article's contents, I did not notice your username coincides with that of the company mentioned. iMedia didn't ring a bell, and neither did Think Metric or insite. Forgive me.

      Now on the subject, please explain again why the article mentions that the paper is available, but not how it can be obtained? On the one hand, you suggest that this is important information the world should know about, or at least, may find interesting. On the other hand, it appears that the paper cannot be obtained easily because you decided to refer to a publisher that sits on its copyright.

      To encourage people to spread the news about this revelation in online marketing techniques, you could have:
      - linked to esomar
      - linked to arf
      - published an email/snail mail/phone address
      That would have enabled me to query, order, purchase, whatever the paper. I'm not saying I would have bothered, but at least it would have been inviting. And since I suspect you *HAVE* read the paper, you already know where to obtain it, so what's the trouble?

    8. Re:Proof by ThinkMetrics · · Score: 1

      The final paragraph explains in detail why the document is not on the web, and provides instructions on how to get it. Plenty of people have found this, we get one request for it every 20-30 min. I quote from the article:
      "NOTE: Research Paper Available: Magdalena Urbanska and Thomas Urbanski's original research paper, complete with eight pages of mathematical proof, is available. This paper was delivered at the ARF/Esomar Conference on Research into Worldwide Audience Measurements, June 2005. Because of patent and copyright restrictions we cannot make it freely available for download over the web -- we need to identify the individuals who get it. However, if you would like a copy, please contact me and I'll arrange to have one sent to you."
      - my email address is included via links twice in the article, together with a link to my site.

    9. Re:Proof by Koos+Baster · · Score: 1

      Ah, finally we've narrowed it down to one and the same paragraph. Where you and I disagree is on whether this is an inviting and open way to present scientific results. I believe it is not, while you think it is.

      Judging from the 2/3 email you get, you are probably right and I'm wrong.

      Please accept my apologies, I wish you all the best, you're doing a great job. For what it's worth, I hereby grant access to my contributions to this discussion if ever you feel the need to persuade your sources to make their results more accessible in order to leverage their credibility in the scientific world.

      (I was almost tempted to ask for a copy of the 8 pages of patented mathematical proof on marketing vs. user behaviour. Whoa!. But I guess I'd rather remain clueless :-)

      Sorry for being so childish, I'll stop

  42. Paradigm shift? by Anonymous Coward · · Score: 0

    Wow! THAT's a paradigm shift if I've ever seen one!

  43. Adjusting Macromedia Flash Settings by buro9 · · Score: 4, Informative

    Macromedia have a page that allows you to modify what sites can do on your computer in regards to Flash:
    http://www.macromedia.com/support/documentation/en/flashplayer/help/settings_manager02.html#118539

    1. Re:Adjusting Macromedia Flash Settings by Anonymous Coward · · Score: 0

      Even better, there is a one-liner command to completely manage Flash privacy settings under a Cygwin bash shell...

      $ dd if=/dev/urandom bs=1024 count=808 \
      > of=/cygdrive/c/Program$'\x20'Files/mozilla.org/Mozilla/Plugins/NPSWF32.dll


      Just remember to adjust the path and filename for your installation version and O/S.

  44. One method to reliably track users... by Vo0k · · Score: 1

    One single method that would reliably allow a site to track its users would be that each user needs to log in, and then needs the "session cookie" on each page they visit, and if they delete it, hard luck, log in again. This method is just a step away from another one: Make the pages password-protected and give the password to nobody. Users tracked: 0. Pages visited: 0. Tracking reliability: 100%.

    --
    Anagram("United States of America") == "Dine out, taste a Mac, fries"
    1. Re:One method to reliably track users... by DeusExLibris · · Score: 1

      Actually, while this is generally more accurate than other methods, it is far from accurate. I did an analysis for one of my customers and found that a single user/password was used on over 1200 computers during a single month.

      This can happen as people share usernames (whose NYTimes login do YOU use?), frequent services like BugMeNot, and fail to log out when using a shared computer.

  45. Too much faith in humanity? by Moraelin · · Score: 4, Informative

    "I highly doubt anyone is THAT stupid to put THAT big of a security flaw into a system."

    Read the article, and the guy is proposing to build exactly that kind of a security flaw into the system.

    Flash can use, basically, some local shared storage on your hard drive. This isn't really designed as cookie storage, and doesn't have even the meager safeguards that cookies have. (E.g., being tied only to a domain.) It's really a space that _any_ flash applet can read and write, and currently no one (with half a clue) puts any important data there.

    This guy's idea? Basically, "I know, let's store cookies there, precisely _because_ any other flash applet, e.g., our own again from a different page, can read that back again."

    Caveat: so can everyone else. I could make a simple flash game that grabs everything stored there, just as you described, and sends it back to me. Including, yes, your session id (so, yes, I can take over your session in any site you were logged in, including any e-commerce sites or your bank) and anything else they stored there.

    Since it's used to track your movements through sites, depending on how cluelessly that's programmed, I may (or may not) also be able to gather all sorts of other information about you.

    So in a nutshell his miracle solution is to build _exactly_ that kind of a vulnerability (not to mention privacy leak) into the system.

    So, well, that's the problem with assuming that "no one could be THAT stupid". Invariably when I say that, someone kindly offers himself as living proof that I'm wrong. Someone CAN be that stupid.

    --
    A polar bear is a cartesian bear after a coordinate transform.
    1. Re:Too much faith in humanity? by DeusExLibris · · Score: 1

      Please verify your facts before spreading FUD. Read the second sentence, second paragraph here. It quite clearly states that FSOs are only available to the domain that originally set them. In other words, it has exactly the same restrictions that cookies and JavaScript have wrt cross domain access.

  46. beware of the tracking on that page too by Anonymous Coward · · Score: 0

    right here from our "friends" at Omniture

    so visiting a page where you can adjust your privacy settings will actually compromise your privacy,
    now you can see why the privacy GUI is on Macromedia's site and not built into the player, but that's not surprising

    seems Flash is slowly becoming spyware, shame

  47. This is old hat by dzfoo · · Score: 1

    This is not really "New Technology" or a "Paradigm Shift", or anything extraordinary. This is just another marketeer trying to start a "buzz".

    I know plenty of software out there that performs multiple tests in order to establish uniqueness of visitors. Perhaps the current big-biz log-analyzing apps do not do it, but that doesn't mean nothing else does. There was a time when Real Programmers didn't trust cookies as the exclusive identifier. I even remember some popular Log Analyzer Perl script that used to check for the following:

    - First, Cookies
    - Then IP Address (Whether it is known to be dynamic or static)
    - Then compares the IP Addresses by IP pool (ISP)
    - Then checks the time between requests, so that if requests from different IPs in the same IP pool, with the same User Agent, come in within a pre-determined time, they are considered the same visitor.
    - Also checks the time between requests from the same IP address, so that if a certain pre-determined time has passed between requests, and the IP address is known to be dynamic, and the User Agent changed, then it is probably someone else.

    I do not recall the exact details of the analysis, but it was something along the lines of the above. And there were many scripts like that one.
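A rough sketch of that kind of cascade is below. It is not the Perl script being remembered here; the `Hit` record, its field names, and the 30-minute `window` are all assumptions made for illustration, and the IP pool and dynamic/static flags are taken as already known for each address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    cookie: Optional[str]  # None when the browser sent no cookie
    ip: str
    pool: str              # ISP / address pool the IP belongs to (assumed known)
    dynamic: bool          # whether the IP is known to be dynamically assigned
    agent: str             # User Agent string
    time: float            # request time in seconds


def same_visitor(a: Hit, b: Hit, window: float = 1800.0) -> bool:
    """Guess whether two hits came from the same visitor."""
    # 1. Cookies decide when both hits carry one.
    if a.cookie and b.cookie:
        return a.cookie == b.cookie

    gap = abs(b.time - a.time)

    # 2. Same IP: assume the same visitor unless the address is dynamic,
    #    the window has expired, and the User Agent changed.
    if a.ip == b.ip:
        return not (a.dynamic and gap > window and a.agent != b.agent)

    # 3. Different IPs from the same pool with the same User Agent count
    #    as one visitor if the requests are close together in time.
    if a.pool == b.pool and a.agent == b.agent:
        return gap <= window

    return False
```

Each rule only runs when the stronger one above it cannot decide, which is why the order matters more than the exact threshold.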

    Comparing IP addresses geographically and in a time-sensitive manner (coupled with other potentially identifying criteria, such as Cookies, User Agent, Screen resolution, etc.) has been known and done for years! In particular, these forms of unique visitor analysis were very popular during the days when you couldn't count on Cookies being supported by all browsers, or on savvy users accepting them -- you know, when dinosaurs roamed the data center; way before everybody decided to rely on Cookies as the end-all-be-all of session identification.

    If they all forgot that using Cookies exclusively was never a very reliable solution, then that's their problem.

                -dZ.

    --
    Carol vs. Ghost
    ...Can you save Christmas?
  48. Protect yourself from this new menace by cortana · · Score: 1

    Wow, I knew that the Flash settings UI was badly designed, confusing and annoying to use. I didn't know that it set up tracking with partner sites as well.

    Besides, what steps does Flash take to ensure that any old web site can't just reset these permissions, or exempt itself from the 'no local storage' policy you set?

    Don't bother visiting Macromedia's site at all:

    find / \( -name libflashplayer.so -o -name libflashplayer.xpt \) -exec rm {} \;

    If you really can't live without it, try this instead:

    chmod --recursive 500 ~/.macromedia

  49. firebire v1.12 random user agent plugin by cheekyboy · · Score: 1

    That'll screw em up, random user agents.

    --
    Liberty freedom are no1, not dicks in suits.
  50. make that .75 dude - the real answer by cheekyboy · · Score: 1

    10+5/2

    Seriously, what's important REALLY is not the current statistic total for NOW, or TODAY,

    its.... yes... TRENDS!!!

    that PAGE X is increasing by 6% weekly.

    or that page y is dropping in interest.

    Its just like TV ratings, everyone knows its all CRAP and nonsense, except the DELTAS, the changes

    if TV show X is going up 30% week, you know its HOT.

    Think of it like quantum physics, you don't really know the location of the electron, just its DIRECTION. TIME CANNOT STAND STILL.

    DIRECTION of MOTION is what you want which is WHAT PREDICTS the future.

    What good is YESTERDAY'S news, we need to know TOMORROW'S NEWS.
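The point about deltas can be made concrete with a small sketch. The function name `weekly_change` and the sample numbers are made up for illustration; the underlying assumption is that whatever bias the counter has stays a roughly constant factor from week to week.

```python
def weekly_change(counts):
    """Percentage change between consecutive weekly totals.

    counts: list of weekly totals, oldest first.
    Returns one entry per pair of weeks; None where the earlier
    week was zero and a percentage is undefined.
    """
    changes = []
    for previous, current in zip(counts, counts[1:]):
        if previous == 0:
            changes.append(None)
        else:
            changes.append(100.0 * (current - previous) / previous)
    return changes
```

If the raw counts are 1000 then 1060, the change is 6%. If the "true" numbers were only 70% of that (700 then 742), the change is still 6%, so a constant overcount drops out of the trend even though it ruins the absolute total.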

    --
    Liberty freedom are no1, not dicks in suits.
  51. Comparative by Jesus+IS+the+Devil · · Score: 1

    While web usage stats may indeed be inaccurate, it is so across the board. This means, everything that relies on it has the same amount of inaccuracy... Which in turn makes it, accurate in the market place.

    For instance, considering everything else to be equal, an ad buyer wanting to pay $1 for one thousand unique eyeballs won't care whether it's spent at site A or site B, as long as they are using equivalent methods to measure traffic.

    Another example. Say Google puts out a press release saying they have X% of web traffic. MSN comes out with Y%. Sure they both may be off, but still the ratio of the two will still be about the same.

    It's like playing chess without the Bishop, except it doesn't just apply to one player, but to every player. There's no advantage for anybody and the net result is probably the same.

    --

    eTrade SUCKS
    1. Re:Comparative by markandrew · · Score: 1

      " While web usage stats may indeed be inaccurate, it is so across the board. This means, everything that relies on it has the same amount of inaccuracy... "

      er, no.

      Stats Package A overestimates visitors by around 30% and underestimates visits by around 8%. Stats Package B underestimates visitors and visits by around 5%.

      Everything inaccurate != everything is the same

    2. Re:Comparative by Jesus+IS+the+Devil · · Score: 1

      er, no.

      Most stats software out there agree basically with what is measured and what variables to use. To vary in the amount you are talking about, it'd require huge deviations in measuring variables. Again, this comes down to variables, not what is commonly accepted ways of measuring (which the article was referring to).

      Using different variable values, inaccuracies can arise from any system, even with what the article was proposing as ways to measure traffic more accurately.

      What I'm saying is, considering measuring systems (which is mostly true in the market place now) and measured variable values are equivalent, the end result, while still inaccurate, will be comparatively accurate between two sites.

      --

      eTrade SUCKS
  52. Re:make that .75 dude - the real answer by Anonymous Coward · · Score: 0

    YOW! Are you ZIPPY the PINHEAD?

  53. For Firefox by Kamiza+Ikioi · · Score: 1

    http://flashblock.mozdev.org/

    Get it because it'll make you cool like everyone else (Go Go Gadget Peer Pressure!), keep it because you don't miss the ads and just one click brings up any content you do want, as well as whitelist features.

    --
    I8-D
  54. What about patterns in the requests themselves? by baadger · · Score: 1

    If I request page A, then request page B and then go back to page A and grab it with a conditional request (and the server returns 304 Not Modified) wouldn't this obviously indicate I had been to page A before fairly recently? (assuming you have set cache headers such as to only allow private non-shared ISP proxies to store them)

    What about people following a link with a referer from page A to page C when they haven't (according to your logs) been on page A? Doesn't this likely indicate page A has been cached/saved or is otherwise still open in another tab or window?

    I suspect there is some decent post-processing of HTTP behaviour that could be done on old logs that hasn't been considered or implemented yet.
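A minimal sketch of that kind of post-processing is below. It assumes the log has already been parsed into `(client, path, status, referer)` tuples in time order, with referers normalized to local paths and off-site referers dropped; the function name `cache_hints` and the reason strings are made up for illustration.

```python
def cache_hints(records):
    """Scan parsed log records for signs of cached or still-open pages.

    records: iterable of (client, path, status, referer) in time order.
    Returns a list of (client, path, reason) tuples.
    """
    seen = {}   # client -> set of paths that client has requested so far
    hints = []
    for client, path, status, referer in records:
        pages = seen.setdefault(client, set())
        # 304 Not Modified: the client revalidated a copy it already held.
        if status == 304:
            hints.append((client, path, "revalidated cached copy"))
        # A referer we never served to this client suggests the page was
        # cached, saved, or open in another window before logging began.
        if referer and referer not in pages:
            hints.append((client, referer, "referer never requested"))
        pages.add(path)
    return hints
```

How `client` is defined (IP, cookie, or some combination) is left open here, and that choice is exactly the identification problem the rest of the thread is arguing about.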

    If you really want some useful analysis of your website then use some javascript to measure how long it takes users to actively fill out forms, or how quickly they navigate from page to page (one time use information)...you know to maybe actually improve your navigation and make it better for the user?

    Why do people overestimate the importance of knowing exactly how many people are using your site and identifying them? That's a pretty useless practice. When and if the user wants to assert themselves they can register/login (if such a thing is applicable to your site).

  55. Onion Routers? by Anonymous Coward · · Score: 0

    "We know some IP addresses cannot be shared by one person. These are the ones that would require a person to move faster than possible. If we have one IP address in New York, then one in Tokyo 60 minutes later, we know it can't be the same person because you can't get from New York to Tokyo in one hour." - Surely this is flawed if the user is utilizing an Onion router (e.g. http://tor.eff.org/)?

  56. Geography? by grooveFX · · Score: 1

    The author said that if a cookie with an IP address in New York and one in Tokyo an hour later means it is a different person. I know that I have logged in to remote sites with a VPN connection and continue to browse the web. Tracking geography of an IP is not an ideal way to track individual users.

  57. Screw them. by Jesus+2.0 · · Score: 1
  58. Fighting back: by nonlnear · · Score: 1
    My bullshit detector went off the scale when the interview began with:

    First, a bit of background about the concept of a paradigm shift.

    What Dainow is describing is no more a paradigm shift than the first person who put a spare tire in their car. Redundant usage of preexisting systems to increase reliability does not equal a paradigm shift.

    That said, he seems to be on the right track to achieving his goals. (Or rather, the researchers seem to be on the right track.) They seem to be naively optimistic. User metrics is obviously in its infancy if this is the cutting edge. Foiling schemes like this would require a minimum of effort:

    What we need to do is create a browser plugin that actively manages cookies by sharing them. By actively, I mean not just accepting/denying based on a set of rules, but coupling that method with active uploading/downloading of cookies from an alternate source. (Deletion is not the solution!) Maintain a pool of known user-metrics cookies that users can update from rather than from the intended originator. (I'm guessing that P2P distribution would turn out to be ideal for this.) How would the user-metrics companies deal with millions of computers surfing with identical cookies?... I'm betting they honestly might abandon cookie use.

    I can't see that writing a plugin that does this would be very hard. Any takers? (I would if I weren't so damn busy lately.)

    --
    argumentum ad fallacium: Fallacy of defining a fallacy which allows one to dismiss the argument in question.
  59. You misunderstood the problem COMPLETELY by Moraelin · · Score: 1

    You seem to assume that they want to improve their site. In which case, yes, anonymous trends and anonymous user movement grouped by session id suffices. (E.g., to see if users give up and leave your site half-way through the marketing bullshit pages, before even reaching the product pages.)

    But that's not the problem.

    Whenever you see someone going on about how they _need_ to track and identify each user, and they _need_ accurate numbers and even personal details... that's your clue that it's purely an ad money problem. They couldn't care less about the site design or trends as such, they just need some bullshit in numbers to shaft the ad providers with, or vice versa for the ad providers to shaft you with.

    The internet started as a pretty clean and ad-friendly place, but it quickly went downhill. The initial ad rates were basically for sites with 1 ad on the main page, that a lot of people actually looked at. But from there it went downhill with site operators trying to shaft the ad providers. (E.g., by trying to collect those rates per ad... for pages that had wall-to-wall ads on each page, scripts that clicked on an ad 1000 times a second, etc.) The ad providers in turn went on to not only try to defend themselves from this kind of fraud, but also to try to commit the exact kind of fraud on the companies paying to advertise.

    Welcome to the War Of The Bullshit Metrics. Because that's basically what it is.

    All this ranting and raving about unique trackable ids and such, is just the search of the perfect metric so an ad provider can (A) say to the site operator carrying the ads "nah, we're not paying you that much, because while you had users clicking, it wasn't all UNIQUE users", while at the same time (B) telling some clueless company advertising with you "our marketing campaign was a huge success because X thousand unique users saw it, and Y thousand clicked on it, and <insert other bogus metrics used often just as a Chewbacca defense>, so you owe us a big wad of money."

    The problem with these bullshit metrics is that not only do they misrepresent, but they often lead to counter-productive campaigns aimed at inflating the metric even if they cause _fewer_ sales. E.g., once you define the "click" as a metric of success, as opposed to a "sale", you get bullshit fake-UI ads, punch-the-monkey ads, and outright redirects that simulate a click. That's all stuff that isn't aimed at actually raising awareness/interest in a product, but at gaming a sick irrelevant metric.

    And the problem is that, bogus metrics be damned, the companies paying for it aren't seeing results out of it. They just see some metrics by which the ad campaign should have been a huge success, except the expected blip in sales tends to be missing completely.

    The ad provider's solution? Trying to make that bullshit sound more credible. All this talk about uniquely identifying users and accurately counting everything is mostly trying to make something sound like a science, when it's just bullshit. They want to present it as if they have some solid scientific proof that for your X thousand dollars, you're guaranteed to get Y thousand whatever, within Z% accuracy.

    And it's partially "Chewbacca defense". They deflect the attention from whether their service actually works, to how accurately they measured some bullshit irrelevant metric. And then dazzle you with some equally irrelevant pseudo-science bullshit as to why and how their measurements are so accurate. But again, completely avoiding the question of what it will do for _your_ _product_.

    And the same kind of war of the bullshit metrics sometimes also goes on inside a company. When marketing departments need to justify their budgets, what do you think they reach for? I'll tell you what. The exact same kind of bullshit metrics that show how accurately and scientifically they got you X thousand hits, and Y thousand unique tracked users, and whatever.

    So the next time you see such a "the sky is falling if we

    --
    A polar bear is a cartesian bear after a coordinate transform.
    1. Re:You misunderstood the problem COMPLETELY by PlusFiveTroll · · Score: 1

      I wish I had points to mod this up. It's all about the PHBs

  60. How do we know it's accurate? by Infonaut · · Score: 1
    Magdalena and Thomas have run some preliminary tests on three large sites that indicate the number of unique visitors is really around half what existing metrics tell us. Both they and I are anxious to run more detailed tests to validate this methodology.

    So how do you determine if your methodology is accurate? The fact that preliminary tests give you different answers than traditional methods doesn't really tell us anything. It just informs us that two different methods present two different results.

    --
    Read the EFF's Fair Use FAQ
  61. Re:We need to know how many people visit, what the by CmdrGravy · · Score: 1

    A website can only work out how long you are viewing their page for by tracking the time between requests. If you just have a single page open for 18 hours it will be indistinguishable from having a single page open for 0.3 seconds.
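    A rough sketch of that limitation, assuming the server has nothing but an ordered list of (timestamp, url) requests for one visitor. The function name and the sample log are made up for illustration; the point is that the last page of any visit gets no duration at all.

```python
from datetime import datetime

def dwell_times(requests):
    """Estimate time on each page from the gap to the next request.

    requests: list of (timestamp, url) tuples for one visitor, in order.
    Returns a list of (url, seconds), with None for the final page,
    because no later request exists to measure against.
    """
    result = []
    # Pair each request with the one that follows it.
    for (t0, url), (t1, _next_url) in zip(requests, requests[1:]):
        result.append((url, (t1 - t0).total_seconds()))
    if requests:
        # The last page could have been open for 0.3 seconds or 18 hours.
        result.append((requests[-1][1], None))
    return result

# Hypothetical two-request visit.
log = [
    (datetime(2006, 1, 1, 12, 0, 0), "/index"),
    (datetime(2006, 1, 1, 12, 0, 30), "/products"),
]
times = dwell_times(log)
```

    A single-page visit comes back with only a None entry, which is exactly the 18 hours vs. 0.3 seconds case.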

  62. This "new paradigm" is full of poor assumptions by gerardrj · · Score: 1

    I was debunking the poor logic, inappropriate assumptions and overall lack of fundamental understanding held by these researchers. After debunking the first four points, I changed gears. I'm tired of all these marketing bullshit artists trying to track my every page view and metric on what I do at their site. I'm especially tired of having to manage cookies and delete them on a regular basis. Sure each site only sends 1-10 cookies of a few bytes each, but that starts to add up when you don't stick to your main sites.

    I think we need an open source project that will collaboratively "surf" web sites. The collaboration will mean that site cookies will be tossed into a publicly accessible pool and shared amongst the workers. The pool might contain thousands of cookies for each site. Workers will get a site and a cookie from the main server and start surfing. The same cookie will be given out to many workers at the same time. My worker might hit NYT now and a machine in New Zealand might hit that same site with the same cookie just a few minutes later. The actual time delay is irrelevant; it's the sharing of the cookies and "co-ordinated attack" that's the key here.

    The workers will of course present random, but valid client IDs to the sites. Some logic also needs to detect that a cookie has been used too often and should be discarded. On command a client should access a site with no cookie and instead retrieve a new one. The rate and level of cookie sharing will vary so as to generate noise in any of the standard web metrics algorithms in use.
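    The pool bookkeeping could look something like this. It's an in-memory sketch only, with no networking; the class name, `max_uses` and `fresh_rate` are all invented for illustration, and a return value of None stands for "go to the site with no cookie and bring back a new one".

```python
import random

class CookiePool:
    """Shared per-site cookie pool with a use limit (illustrative only)."""

    def __init__(self, max_uses=3, fresh_rate=0.0, rng=None):
        self.max_uses = max_uses      # discard a cookie after this many checkouts
        self.fresh_rate = fresh_rate  # chance of telling a worker to fetch a new one
        self.rng = rng or random.Random()
        self.uses = {}                # site -> {cookie: times handed out}

    def add(self, site, cookie):
        """Put a newly retrieved cookie into the pool for a site."""
        self.uses.setdefault(site, {})[cookie] = 0

    def checkout(self, site):
        """Hand a cookie to a worker, or None meaning 'retrieve a fresh one'."""
        cookies = self.uses.get(site, {})
        if not cookies or self.rng.random() < self.fresh_rate:
            return None
        cookie = self.rng.choice(sorted(cookies))
        cookies[cookie] += 1
        if cookies[cookie] >= self.max_uses:
            # Used too often: drop it so later workers never see it again.
            del cookies[cookie]
        return cookie

pool = CookiePool(max_uses=2)
pool.add("example.org", "abc")
first = pool.checkout("example.org")
second = pool.checkout("example.org")
third = pool.checkout("example.org")  # pool is empty by now
```

    Varying `fresh_rate` and `max_uses` over time is one way to get the changing rate and level of sharing described above.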

    I don't think the workers will even need to accept the entire page or images from the page. Log-files from web servers only write information about requests, not about completions. What that suggests is that bandwidth usage will be minimal.

    Maybe if we can generate enough noise these morons will stop trying to come up with more useless ways to invade our privacy and track our every on-line move. Once the advertisers start seeing their pay-outs go through the roof they may ask questions about what's going on.

    I'm no privacy nut, but I wouldn't stand for being tracked as I walk around all day; I've no great desire to accept it as I "walk" around the 'net either.

    --
    Article X: The powers not delegated... by the Constitution...are reserved...to the people
  63. Then it was management's fault.. by msimm · · Score: 1

    Otherwise your production app would have used some kind of persistence and your job would have been a lot nicer. That's what I would have explained to them anyway. Stupid as an excuse can only really go so far, and even managers in such technical fields should be able to get that.

    --
    Quack, quack.
  64. SSL Session ID by Anonymous Coward · · Score: 0

    This guy misses one more way to identify users: serve your web site over SSL and use the SSL Session ID for identification. Most web servers (for example Tomcat and Apache) can do this.

    1. Re:SSL Session ID by ebrandsberg · · Score: 1

      Yes, and you will get a new user every 2 minutes with IE, as it doesn't preserve the SSL session ID for long. This is even less reliable than using IPs to track users.

  65. Missing behavior by ebrandsberg · · Score: 1

    Many times what matters isn't the existence of a piece of information, but the lack of one. If a particular object is referenced, that object is flagged to be cacheable at the browser (Cache-Control: private), and the reference wasn't an If-Modified-Since request, then you could consider it a new visit. If, however, a user references the page the object is embedded in, but the object itself isn't referenced, then it is cached, and this could be considered a return visit. This would work about as effectively as anything this report was talking about, and as an individual method it won't require a huge amount of processing to do the math on.
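    A minimal sketch of that rule, assuming you already have one client's requests for a page view as dicts with a path and headers. The paths `/index.html` and `/marker.gif` are placeholders, and treating an If-Modified-Since revalidation as a return visit is my own assumption, since it implies a cached copy already existed.

```python
def classify_visit(requests, page="/index.html", obj="/marker.gif"):
    """Guess new vs. return visit from whether a cacheable object was fetched.

    requests: list of {"path": str, "headers": dict} for one client.
    Returns "new", "return", or "unknown" if the page itself was not requested.
    """
    if not any(r["path"] == page for r in requests):
        return "unknown"
    obj_requests = [r for r in requests if r["path"] == obj]
    if not obj_requests:
        # Page fetched but embedded object was not: it came from the browser cache.
        return "return"
    for r in obj_requests:
        header_names = {name.lower() for name in r["headers"]}
        if "if-modified-since" in header_names:
            # Conditional request: the browser is revalidating a cached copy.
            return "return"
    # Unconditional fetch of the object: nothing was cached yet.
    return "new"

first_time = [
    {"path": "/index.html", "headers": {}},
    {"path": "/marker.gif", "headers": {}},
]
came_back = [{"path": "/index.html", "headers": {}}]
revalidated = first_time[:1] + [
    {"path": "/marker.gif",
     "headers": {"If-Modified-Since": "Sun, 01 Jan 2006 12:00:00 GMT"}},
]
```

    This only needs a pass over the log grouped by client, which fits the point that the math is cheap.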

  66. Golden Question: by AntiCopyrightRadical · · Score: 1

    Golden Question:

    Is this Method Patented?

    I'd bet that it is, or will be soon, and this is just an ad to get someone to license their 'Method', or worse, get someone to implement it without knowing it's patented, so the lawyers can do their magic.

    --
    Abolish Copyright. Restore Freedom.
  67. Crapola. by asackett · · Score: 1

    If you want to measure the success of your web site, look at the net income it generates. If you want to identify problem areas, use the available data intelligently, with full understanding of its limitations, and perform a well reasoned statistical analysis of that data.

    The only thing gained by uniquely identifying users outside of financial transactions is the opportunity to violate their privacy.

    I defy the "new" methodology to uniquely identify me on jrandomwebsite.com -- I block cookies until I know they're essential to support my own selfish desires, I block HTTP_REFERER, and I use onion routing. I choose not to participate in any corporate greed in which I do not have an interest. Count this, sucker.

    --

    Warning: This signature may offend some viewers.

  68. Trusted HTTP and Trusted Network Connect by tepples · · Score: 1

    I wonder what they will think when they start getting impossible bit patterns, like 7734 and 6027734 and 5773857734?

    If a site requires Trusted HTTP Extensions, you will get a 403 error instead of a page. If your ISP requires Trusted Network Connect, you will get a DHCP failure instead of an IP address. Alsee predicts that dystopia will arrive by 2015 unless we drum up a significant backlash among residential Internet users.

  69. Web accelerator by tepples · · Score: 1

    The majority of users behind "web accelerator" proxies do not know how to delete cookies. Or if you're really anal you could impose "free reg. req." on all users behind such proxies.