Slashdot Mirror


Your 'Clickprint' Gives Away Your Identity Online

Krishna Dagli writes to mention an article on the Guardian site about growing interest in identifying users by their 'clickprint', or online access habits. The article discusses a new paper on online identification by two American professors. The piece posits that pinning down individual users by their habits is not only useful for advertisers looking to sell products, but may also make it possible to flag stolen identities. From the article: "'Our main finding is that even trivial features in an internet session can distinguish users,' Padmanabhan told the Wharton Review. 'People do seem to have individual browsing behaviors.' The duo found that anywhere from three to 16 sessions are needed to identify an individual's clickprint ... In one example, they found that from just seven aggregated sessions they could distinguish between two different surfers with a confidence of 86.7%. Given 51 sessions, the confidence level rose to 99.4%."

17 of 76 comments

  1. Shameless Weka Plug by eldavojohn · · Score: 5, Interesting
    So I can already anticipate people being concerned about their identities being tracked through clicks online.

    You don't have to worry about this, however: it is easy to distinguish two different users but probably difficult to pick you out of a crowd. Furthermore, if they're tracking your clicks, they probably already know your IP address. The number of sessions needed probably rises to a problematic level if you are trying to identify one user out of a thousand. Therefore, this will only be useful for telling apart the behavior of two users, or specifically for detecting when someone who is logged in is highly likely to behave significantly differently from the click profile associated with that account (as the article states).

    There's a lot of discussion about this in the paper. It mentions that the priors are set at 50% for 2 users but at 1% for 100 users (obviously), and also that:
    In an experiment involving 42 user profiles, Monrose and Rubin (1997) shows that depending on the classifier used, between 80 to 90 percent of users can be automatically recognized using features such as the latency between keystrokes and the length of time different keys are pressed.
    They go on to say that the method they suggest for detecting a fraudulent user "do not require that users have truly unique profiles."
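    To make the point about priors concrete, here is a minimal sketch (not taken from the paper) of how the same evidence fares under a uniform prior of 1/N. It assumes a single likelihood ratio comparing the candidate user against every other user, which is a simplification; the function name `posterior_same_user` is hypothetical.

```python
def posterior_same_user(likelihood_ratio: float, n_users: int) -> float:
    """Posterior probability that a session came from one candidate user.

    Assumes a uniform prior of 1/n_users and one likelihood ratio
    P(session | candidate) / P(session | some other user); a simplification.
    """
    if n_users < 2:
        raise ValueError("need at least two candidate users")
    prior = 1.0 / n_users
    prior_odds = prior / (1.0 - prior)  # 1 for 2 users, 1/99 for 100 users
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# The same 10:1 evidence is convincing with 2 users but weak with 100.
print(round(posterior_same_user(10, 2), 3))    # 0.909
print(round(posterior_same_user(10, 100), 3))  # 0.092
```

    This is why distinguishing two users is much easier than picking one user out of a crowd: the evidence has to overcome a far smaller prior.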

    I read a bit of the paper and saw that Weka's decision tree method is used to classify the users (if you've ever used the ID3 algorithm or its successor C4.5 for classification, imagine exploring different ways of building decision trees).

    Indeed the paper states:
    We chose weka's J4.8 as the classifier since classification trees in general have been shown to be highly accurate classifiers.
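    For anyone who hasn't used ID3, the core step is picking the feature whose split most reduces the entropy of the labels. The sketch below is plain ID3-style information gain on made-up categorical session features (`time_of_day`, `long_session`, both hypothetical); it is not Weka's J4.8, which implements C4.5 and adds gain ratio, numeric splits, and pruning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """ID3-style gain from splitting the rows on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Feature with the highest information gain (the root of an ID3 tree)."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical sessions from two users.
rows = [
    {"time_of_day": "morning", "long_session": True},
    {"time_of_day": "morning", "long_session": False},
    {"time_of_day": "evening", "long_session": True},
    {"time_of_day": "evening", "long_session": False},
]
labels = ["alice", "alice", "bob", "bob"]
print(best_split(rows, labels))  # time_of_day
```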
    I'll take this opportunity to recommend two open source projects: Torpark, for those of you concerned about your identity, and Weka, the easy-to-use collection of data mining software in Java! Also worth noting: Weka has recently become part of Pentaho, a suite of open source business intelligence products. Explore the valuable tools that are out there and enjoy!
    --
    My work here is dung.
    1. Re:Shameless Weka Plug by balsy2001 · · Score: 2, Interesting

      "They go on to say that the method they suggest for detecting a fraudulent user 'do not require that users have truly unique profiles.'" This could be problematic for two people who share the same account. For example, my wife and I share an account for some of our finances, but we have drastically different habits and patterns when using the computer.

      --
      GENERATION 27: The first time you see this, copy it into your sig on any forum and add 1 to the generation.
    2. Re:Shameless Weka Plug by Lord_Dweomer · · Score: 3, Interesting
      Since you seem to be knowledgeable on this topic...I have a question for you.

      If they're talking about using this for identifying fraudulent users...how much would changing news/services on the internet affect that? I can think of several news items and new services that instantaneously and permanently caused me to alter my browsing and internet using habits. Wouldn't those sorts of behavior altering agents increase false positives?

      Please bear in mind I have absolutely zero background in this kind of stuff ;)

      --
      Buy Steampunk Clothing Online!
  2. How about this? by Conspiracy_Of_Doves · · Score: 2, Interesting

    How about a program that sits in the background and randomly hits sites while you are browsing?

    1. Re:How about this? by recordMyRides · · Score: 5, Funny

      Don't worry, I predict that a porn-related application will need to be invented in order for this to enter widespread use.

  3. The only two guys on the internets by Anonymous Coward · · Score: 3, Funny

    Great! Finally we'll be able to distinguish between the two guys who use the Internets... most of the time.

  4. My "clickprint" is easy on slashdot... by sm62704 · · Score: 2, Funny

    I'm the guy who can read; I get the "slow down cowboy" message constantly.

    But I'm used to living among dyslexics, illiterates, and dumbasses. Sigh.

    --
    mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
  5. Oh No You Didn't by eldavojohn · · Score: 4, Funny
    But I'm used to living among dyslexics, illiterates, and dumbasses. Sigh.
    Go kcuf yourslef! I am not living among you! I may be dyslexic, I may be illiterates and I may be a dumbass but I am definitely not a sigh.
  6. Potentially useless.. by Vellmont · · Score: 2, Interesting

    I haven't read the full paper, but the article makes this sound extremely preliminary as a useful tool. It says they can distinguish between two users with 99% accuracy. That's all well and good when you only need to distinguish between two people, but what about when you need to distinguish between a million people?

    I can distinguish between a person with blond hair and a person with brown hair 100% of the time, given only the hair color. But that doesn't mean hair color is a very useful tool for positively identifying people. The key is how different people's "click profiles" are. If there are only 1,000 different possibilities (evenly distributed), that's not terribly good for identification. If there are 10^10 possible profiles, evenly distributed among the populace, that would certainly be useful. Also, what's the false positive rate? If you use this to identify fraud and you have a 1% false positive rate, you'll end up pissing off 1% of your customers. That's probably not acceptable.
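    The false-positive worry gets sharper once you factor in how rare fraud is. A rough sketch with made-up numbers (1,000,000 customers, 0.1% of them fraudulent, 99% of fraud caught, 1% false positive rate; none of these figures come from the paper or the article):

```python
def flagging_outcomes(customers, fraud_rate, true_positive_rate, false_positive_rate):
    """Return (legitimate customers wrongly flagged, share of flags that are real fraud)."""
    fraudsters = customers * fraud_rate
    legitimate = customers - fraudsters
    caught = fraudsters * true_positive_rate
    wrongly_flagged = legitimate * false_positive_rate
    precision = caught / (caught + wrongly_flagged)
    return wrongly_flagged, precision


wrongly_flagged, precision = flagging_outcomes(1_000_000, 0.001, 0.99, 0.01)
print(round(wrongly_flagged))  # 9990
print(round(precision, 3))     # 0.09
```

    Under these assumptions roughly nine out of ten flags would land on legitimate customers, so the false positive rate matters even more than the raw 1% suggests.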

    --
    AccountKiller
  7. Defense by Led Nudd · · Score: 2, Interesting

    How about a Firefox extension that, at random time intervals, randomly requests one of the page links? It wouldn't have to even load the page in a tab. That might introduce enough noise to cover a "clickprint." (Implementation is left as an exercise for the reader.)
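    One way into that exercise: the scheduling logic can be sketched separately from the browser. This Python sketch only plans decoy requests (a random delay plus a random link from the current page) and fetches nothing; the name `plan_decoy_requests` and the 5 to 120 second delay range are assumptions, and a real extension would be written in JavaScript against the browser's extension APIs.

```python
import random


def plan_decoy_requests(links, count, min_delay_s=5.0, max_delay_s=120.0, rng=None):
    """Plan `count` decoy requests as (delay_seconds, url) pairs.

    Each URL is drawn uniformly from `links` (the links on the current page)
    and each delay uniformly from [min_delay_s, max_delay_s]. Nothing is fetched.
    """
    rng = rng or random.Random()
    if not links:
        return []
    return [(rng.uniform(min_delay_s, max_delay_s), rng.choice(links)) for _ in range(count)]


page_links = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
for delay, url in plan_decoy_requests(page_links, 3, rng=random.Random(7)):
    print(f"after {delay:.1f}s fetch {url}")
```

    Note that decoys drawn from the page you are already reading are still correlated with your real browsing, so how much noise they add is an open question.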

  8. Similar to ssh exploit a few weeks back by Yahma · · Score: 2, Informative

    This is similar to the SSH exploit reported here on Slashdot a few weeks back where data could be determined via statistical/timing analysis done on the packets sent during an SSH session.

    It sounds like if these types of timing and statistical analysis attacks become common, a simple solution would be a Firefox extension that randomizes the timing of input from the mouse and the keyboard. I suspect that randomly delaying a keystroke or a mouse click by anywhere between 0 and 100 ms would be enough to defeat this type of analysis, while being short enough not to adversely affect the browsing experience.
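    A minimal sketch of the jitter described above, working on a list of event timestamps rather than real input events (the function name `jitter_timestamps` is hypothetical). Each event is delayed by a uniform 0 to 100 ms and events are never reordered; whether this actually defeats timing analysis is the poster's conjecture, not something the sketch demonstrates.

```python
import random


def jitter_timestamps(timestamps_ms, max_delay_ms=100.0, rng=None):
    """Delay each event by a random 0..max_delay_ms without reordering events.

    `timestamps_ms` must be sorted ascending. Clamping to the previous output
    keeps order, and each delay still stays within [0, max_delay_ms].
    """
    rng = rng or random.Random()
    out = []
    last = float("-inf")
    for t in timestamps_ms:
        delayed = max(t + rng.uniform(0.0, max_delay_ms), last)
        out.append(delayed)
        last = delayed
    return out


keystrokes = [0, 10, 20, 500]  # hypothetical key-press times in ms
print(jitter_timestamps(keystrokes, rng=random.Random(42)))
```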

    Of course, browsing the web through a good anonymous web proxy will probably do a lot more to hide your identity than any randomizing of your input strokes. But then, using both methods as well as encryption would make things all the harder for any attacker.

    Yahma
  9. Answer to Your Question by eldavojohn · · Score: 3, Informative
    If they're talking about using this for identifying fraudulent users...how much would changing news/services on the internet affect that? I can think of several news items and new services that instantaneously and permanently caused me to alter my browsing and internet using habits. Wouldn't those sorts of behavior altering agents increase false positives?
    To the best of my knowledge, the idea is that you wouldn't change drastically. And if you did, it might falsely accuse you of being a fraudulent user, and then you merely need to straighten things out.

    The odds are low, and this is a variable to be tweaked. But the assumption is that you will still visit your old sites and exhibit your usual behaviors on them. If you found, say, one new site a week, it would slowly be incorporated into your routine (if they used regression properly and allowed the model to train on your data, old and new). But if you suddenly stopped going to your old sites and started visiting new ones, you would probably be flagged. That's the trade-off of trying to suppress fraud.
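    The "slowly incorporated" behavior can be illustrated with an exponentially weighted site-frequency profile. This is a hypothetical sketch, not the paper's model: `alpha` controls how fast new habits are absorbed, `anomaly_score` is the total variation distance between the stored profile and one session's visit mix (0 means identical, 1 means no overlap), and the site names are placeholders. Any flagging threshold on that score would be yet another tunable variable.

```python
def normalize(counts):
    """Turn raw visit counts into a frequency distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("session has no visits")
    return {site: n / total for site, n in counts.items()}


def update_profile(profile, session_counts, alpha=0.1):
    """Blend one session into the profile; old behavior decays by (1 - alpha)."""
    session = normalize(session_counts)
    sites = set(profile) | set(session)
    return {s: (1 - alpha) * profile.get(s, 0.0) + alpha * session.get(s, 0.0) for s in sites}


def anomaly_score(profile, session_counts):
    """Total variation distance between the profile and a session's visit mix."""
    session = normalize(session_counts)
    sites = set(profile) | set(session)
    return 0.5 * sum(abs(profile.get(s, 0.0) - session.get(s, 0.0)) for s in sites)


profile = {"site-a.example": 0.7, "site-b.example": 0.3}
usual = {"site-a.example": 7, "site-b.example": 3}
brand_new = {"site-c.example": 5}

print(round(anomaly_score(profile, usual), 3))      # 0.0
print(round(anomaly_score(profile, brand_new), 3))  # 1.0
profile = update_profile(profile, brand_new)
print(round(profile["site-c.example"], 3))          # 0.1
```

    A session made up entirely of new sites scores the maximum, while each session that does include a new site shifts a little weight toward it, which matches the gradual-versus-sudden distinction above.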

    I should point out that there's a lot of play with the variables here, and an actual implementation of this theoretical paper could be done either well or badly.

    Excellent point, though. Sometimes these new technologies turn out to be more cumbersome than helpful and we need to watch out for that!
    1. Re:Answer to Your Question by Lord_Dweomer · · Score: 2, Insightful
      And if you did, it might falsely accuse you of being a fraudulent user, and then you merely need to straighten things out.

      Because we all know that the process of straightening things out when you've been flagged as a fraudster is always a quick and easy process that works 100% of the time.

      Thanks for answering my question though!

  10. Am I the only one by TubeSteak · · Score: 5, Insightful

    Who doesn't like clicking on TinyURLs?

    TinyURLs just don't compute as part of my safe surfing habits.
    Example:
    TinyURL --> my redirect --> paper
    After it hits the front page:
    TinyURL --> my redirect --> 0-day exploit

    There really is no need for them in Slashdot submissions.

    Here's the direct link to the paper
    http://knowledge.wharton.upenn.edu/papers/1323.pdf

    --
    [Fuck Beta]
    o0t!
    1. Re:Am I the only one by gladed · · Score: 2, Informative

      I agree. If you are concerned about this, TinyURL now lets you enable "previews". When enabled, clicking a TinyURL link takes you to a page that shows you the destination, where you can decide whether to click through. See http://tinyurl.com/preview.php.

  11. an even simpler solution... by cyberworm · · Score: 4, Funny

    Follow them to their myspace page.

  12. Pentaho? by OldManAndTheC++ · · Score: 2, Interesting

    What is that, a five-sided prostitute??

    --
    Soylent Green is peoplicious!