Slashdot Mirror


Your 'Clickprint' Gives Away Your Identity Online

Krishna Dagli writes to mention an article at the Guardian site about an increasing interest in the possibility of identifying users by their 'clickprint', or online access habits. The article discusses a new paper on online identification written by two American professors. The piece posits that not only is nailing down individual users by their habits useful for advertisers looking to sell products, it may be possible to use this information to flag stolen identities. From the article: "'Our main finding is that even trivial features in an internet session can distinguish users,' Padmanabhan told the Wharton Review. 'People do seem to have individual browsing behaviors.' The duo found that anywhere from three to 16 sessions are needed to identify an individual's clickprint ... In one example, they found that from just seven aggregated sessions they could distinguish between two different surfers with a confidence of 86.7%. Given 51 sessions, the confidence level rose to 99.4%."

7 of 76 comments

  1. Shameless Weka Plug by eldavojohn · · Score: 5, Interesting
    So I can already anticipate people being concerned about their identities being tracked through clicks online.

    You don't have to worry about this, however: it is easy to distinguish two different users but probably difficult to pick you out of a crowd. Furthermore, if they're tracking your clicks, they probably already know your IP address. The number of sessions needed probably rises to a problematic level if you are trying to identify one user out of one thousand. Therefore, this will only be useful for telling apart the behavior of two users -- or, specifically, for identifying when it is highly likely that whoever is logged in differs significantly from the click profile associated with that account (as the article states).

    There's a lot of discussion about this in the paper, which mentions that the priors are set at 50% for 2 users but at 1% for 100 users (obviously), and also that:
    In an experiment involving 42 user profiles, Monrose and Rubin (1997) shows that depending on the classifier used, between 80 to 90 percent of users can be automatically recognized using features such as the latency between keystrokes and the length of time different keys are pressed.
    They go on to say that the method they suggest for detecting a fraudulent user "do not require that users have truly unique profiles."
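
To make the point about priors concrete, here is a minimal sketch (not taken from the paper). It assumes every user is equally likely before any evidence is seen. The `likelihood_ratio` parameter is a made-up stand-in for how much better a session matches the target user than any single other user.

```python
def uniform_prior(num_users: int) -> float:
    """Chance of naming the right user with no evidence, all users equally likely."""
    if num_users < 1:
        raise ValueError("num_users must be at least 1")
    return 1.0 / num_users


def posterior(num_users: int, likelihood_ratio: float) -> float:
    """Posterior probability that a session belongs to the target user.

    likelihood_ratio is a hypothetical value: how many times more likely the
    observed session is under the target user than under any one other user.
    """
    prior = uniform_prior(num_users)
    weighted = likelihood_ratio * prior
    # Every other user contributes likelihood 1, so their total weight is 1 - prior.
    return weighted / (weighted + (1.0 - prior))


if __name__ == "__main__":
    for n in (2, 100, 1000):
        print(n, uniform_prior(n), round(posterior(n, 9.0), 4))
```

With the same evidence, the posterior falls quickly as the crowd grows, which is why telling two users apart is much easier than picking one out of a thousand.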

    I read a bit of the paper and I identified Weka's decision tree method being used to classify the users (if you've ever used the ID3 algorithm or its brethren C4.5 in classification, imagine exploring methods of developing different decision trees).

    Indeed the paper states:
    We chose weka's J4.8 as the classifier since classification trees in general have been shown to be highly accurate classifiers.
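
For anyone who hasn't met ID3 or C4.5: the core step is choosing the feature whose split best separates the classes. Below is a rough sketch of the information-gain criterion that ID3 uses (C4.5 refines it into a gain ratio). This is not Weka's code, and the session features and user names are invented for illustration, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels):
    """Feature with the highest information gain: the root of the tree."""
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))


# Hypothetical sessions: one dict of made-up features per session.
sessions = [
    {"time_of_day": "morning", "session_length": "short"},
    {"time_of_day": "morning", "session_length": "long"},
    {"time_of_day": "evening", "session_length": "short"},
    {"time_of_day": "evening", "session_length": "long"},
]
users = ["alice", "alice", "bob", "bob"]

if __name__ == "__main__":
    print(best_feature(sessions, users))
```

In this toy data the time of day separates the two users perfectly while session length tells you nothing, so the tree would split on time of day first.
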
    I'll take this opportunity to recommend two open source projects. Torpark for those of you concerned about your identity and also Weka -- the easy to use collection of data mining software in Java! Also something to note is that Weka has recently become part of Pentaho, a project of open source business intelligence products. Explore the valuable tools that are out there and enjoy!
    --
    My work here is dung.
    1. Re:Shameless Weka Plug by balsy2001 · · Score: 2, Interesting

      "They go on to say that the method they suggest for detecting a fraudulent user 'do not require that users have truly unique profiles.'" This could be problematic for two individuals who use the same account. For example, my wife and I use the same account for some financials, but we have drastically different habits and patterns while using the computer.

      --
      GENERATION 27: The first time you see this, copy it into your sig on any forum and add 1 to the generation.
    2. Re:Shameless Weka Plug by Lord_Dweomer · · Score: 3, Interesting
      Since you seem to be knowledgeable on this topic...I have a question for you.

      If they're talking about using this for identifying fraudulent users...how much would changing news/services on the internet affect that? I can think of several news items and new services that instantaneously and permanently caused me to alter my browsing and internet using habits. Wouldn't those sorts of behavior altering agents increase false positives?

      Please bear in mind I have absolutely zero background in this kind of stuff ;)

      --
      Buy Steampunk Clothing Online!
  2. How about this? by Conspiracy_Of_Doves · · Score: 2, Interesting

    How about a program that sits in the background and randomly hits sites while you are browsing?

  3. Potentially useless.. by Vellmont · · Score: 2, Interesting

    I haven't read the full paper, but the article makes this sound extremely preliminary as a useful tool. It says they can distinguish between two users with 99% accuracy. That's all well and good when you only need to distinguish between two people, but what about when you need to distinguish between a million people?

    I can distinguish between a person with blond hair and a person with brown hair given only the hair color 100% of the time. But that doesn't mean hair color is a very useful tool for positively identifying people. The key is how different people's "click profiles" are. If there are only 1000 different possibilities (evenly distributed), that's not terribly good for identification. If there are 10^10 possible profiles, evenly distributed among the populace, that would certainly be useful. Also, what's the false positive rate? If you try to use this to identify fraud and you have a 1% false positive rate, you'll end up pissing off 1% of your customers. That's probably not acceptable.
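
The arithmetic behind that argument can be sketched in a few lines. The population of one million and the 1% false positive rate come from the comment above. The fraud rate and detection rate in `alert_precision` are pure assumptions chosen for illustration.

```python
def users_per_profile(population: float, num_profiles: float) -> float:
    """Average number of people sharing one profile, if evenly distributed."""
    return population / num_profiles


def false_alarms(legit_sessions: float, false_positive_rate: float) -> float:
    """Expected number of legitimate sessions wrongly flagged as fraud."""
    return legit_sessions * false_positive_rate


def alert_precision(sessions, fraud_rate, true_positive_rate, false_positive_rate):
    """Fraction of fraud alerts that are actually fraud."""
    fraud = sessions * fraud_rate
    legit = sessions - fraud
    true_alerts = fraud * true_positive_rate
    false_alerts = legit * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)


if __name__ == "__main__":
    population = 1_000_000
    print(users_per_profile(population, 1000))    # crowded profiles
    print(users_per_profile(population, 10**10))  # almost always unique
    print(false_alarms(population, 0.01))
    # Assumed: 0.1% of sessions are fraudulent and 99% of those are caught.
    print(alert_precision(population, 0.001, 0.99, 0.01))
```

Under those assumed rates, most alerts would be false alarms even with a 1% false positive rate, because legitimate sessions vastly outnumber fraudulent ones.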

    --
    AccountKiller
  4. Defense by Led+Nudd · · Score: 2, Interesting

    How about a Firefox extension that, at random time intervals, randomly requests one of the page links? It wouldn't have to even load the page in a tab. That might introduce enough noise to cover a "clickprint." (Implementation is left as an exercise for the reader.)
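
A rough sketch of the scheduling half of that idea, in Python rather than as an actual Firefox extension. It only decides when to request which link and performs no network I/O. The function name, the exponentially distributed gaps, and the example links are all assumptions, not anything from the article.

```python
import random


def decoy_schedule(links, duration_seconds, mean_gap_seconds, rng=None):
    """Plan decoy requests as (seconds_from_start, link) pairs.

    Gaps between requests are drawn from an exponential distribution so the
    decoys do not arrive on a fixed, easily filtered rhythm.
    """
    rng = rng or random.Random()
    if not links:
        return []
    schedule = []
    t = rng.expovariate(1.0 / mean_gap_seconds)
    while t < duration_seconds:
        schedule.append((t, rng.choice(links)))
        t += rng.expovariate(1.0 / mean_gap_seconds)
    return schedule


if __name__ == "__main__":
    # Hypothetical links scraped from the current page.
    page_links = ["/story/1", "/story/2", "/faq", "/about"]
    for when, link in decoy_schedule(page_links, 600, 30, random.Random(42)):
        print(f"{when:7.1f}s  {link}")
```

Whether this noise would actually defeat a classifier is an open question: decoys drawn uniformly at random may themselves form a recognizable pattern.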

  5. Pentaho? by OldManAndTheC++ · · Score: 2, Interesting

    What is that, a five-sided prostitute??

    --
    Soylent Green is peoplicious!