Slashdot Mirror


Your 'Clickprint' Gives Away Your Identity Online

Krishna Dagli writes to mention an article on the Guardian's site about growing interest in identifying users by their 'clickprint', or online access habits. The article discusses a new paper on online identification written by two American professors. The piece posits that pinning down individual users by their habits is not only useful for advertisers looking to sell products, but may also make it possible to flag stolen identities. From the article: "'Our main finding is that even trivial features in an internet session can distinguish users,' Padmanabhan told the Wharton Review. 'People do seem to have individual browsing behaviors.' The duo found that anywhere from three to 16 sessions are needed to identify an individual's clickprint ... In one example, they found that from just seven aggregated sessions they could distinguish between two different surfers with a confidence of 86.7%. Given 51 sessions, the confidence level rose to 99.4%."
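The summary does not describe the procedure, but the basic idea of aggregating several sessions before deciding who they belong to can be illustrated with a minimal sketch. It assumes each session is reduced to a few numeric features (the feature names and numbers below are hypothetical, not taken from the paper), averages the sessions into one vector, and attributes that vector to the nearest stored profile. This nearest-profile rule is a stand-in chosen for brevity, not the paper's classifier.

```python
from math import dist
from statistics import fmean

# Hypothetical per-session features: (pages viewed, minutes online, distinct sites).
Session = tuple[float, float, float]


def aggregate(sessions: list[Session]) -> tuple[float, ...]:
    """Average k sessions feature by feature to smooth out one-off noise."""
    return tuple(fmean(values) for values in zip(*sessions))


def closest_user(sessions: list[Session], profiles: dict[str, tuple[float, ...]]) -> str:
    """Attribute an aggregated block of sessions to the nearest stored profile."""
    point = aggregate(sessions)
    return min(profiles, key=lambda user: dist(point, profiles[user]))


# Made-up profiles and observations, for illustration only.
profiles = {"alice": (12.0, 30.0, 4.0), "bob": (40.0, 95.0, 11.0)}
observed = [(10.0, 25.0, 3.0), (15.0, 40.0, 5.0), (11.0, 28.0, 4.0)]
print(closest_user(observed, profiles))  # alice
```

Averaging first means a single unusual session moves the aggregate only a little, which is one intuitive reason more sessions make two users easier to tell apart.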

8 of 76 comments

  1. Shameless Weka Plug by eldavojohn · · Score: 5, Interesting
    So I can already anticipate people being concerned about their identities being tracked through clicks online.

    You don't have to worry about this, however: it is easy to distinguish between two different users but probably difficult to pick you out of a crowd. Furthermore, if they're tracking your clicks, they probably already know your IP address. The number of sessions needed probably rises to a problematic level if you are trying to identify one user out of one thousand. Therefore, this will only be useful for distinguishing the behavior of two users, or specifically for identifying when it is highly likely that someone who is logged in differs significantly from the click profile associated with that account (as the article states).
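    The two-users-versus-a-thousand point is essentially Bayes' rule: the same evidence is far less conclusive when the prior probability of any one user is small. A minimal sketch, assuming the classifier's evidence can be summarized as a single likelihood ratio (the value 9 below is made up for illustration):

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Same hypothetical evidence (likelihood ratio 9), different crowd sizes.
for n_users in (2, 100, 1000):
    print(n_users, round(posterior(1.0 / n_users, 9.0), 4))
```

With identical evidence, the posterior is 0.9 when choosing between two users (prior 50%) but only 9/108, about 0.083, with 100 users (prior 1%) and under 0.01 with 1000, which is why many more sessions are needed to single someone out of a crowd.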

    The paper discusses this at length, noting that the priors are set at 50% for 2 users but at 1% for 100 users (obviously). It also notes that:
    In an experiment involving 42 user profiles, Monrose and Rubin (1997) shows that depending on the classifier used, between 80 to 90 percent of users can be automatically recognized using features such as the latency between keystrokes and the length of time different keys are pressed.
    They go on to say that the methods they suggest for detecting a fraudulent user "do not require that users have truly unique profiles."
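    Fraud detection can work without unique profiles because it only asks whether a new session looks like this account's own history, not whom it belongs to. A minimal sketch of that verification idea, assuming hypothetical numeric session features and a simple per-feature z-score test with an arbitrary threshold of 3 (an illustration only, not the method in the paper):

```python
from statistics import mean, stdev


def is_suspicious(history: list[list[float]], session: list[float], threshold: float = 3.0) -> bool:
    """Flag a session if any feature lies more than `threshold` sample standard
    deviations from this account's historical mean for that feature."""
    for i, value in enumerate(session):
        column = [past[i] for past in history]
        mu, sigma = mean(column), stdev(column)  # stdev needs at least two past sessions
        if sigma == 0:
            # No variation seen so far: any change at all counts as unusual.
            if value != mu:
                return True
            continue
        if abs(value - mu) / sigma > threshold:
            return True
    return False


# Hypothetical features per session: (pages viewed, minutes online).
history = [[12, 30], [10, 28], [14, 35], [11, 31], [13, 29]]
print(is_suspicious(history, [12, 32]))   # False: close to the usual pattern
print(is_suspicious(history, [80, 200]))  # True: far outside it
```

Note that nothing here compares the account with other users, so two people with similar habits would not trip it; only a departure from the account's own baseline does.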

    I read part of the paper and saw that Weka's decision tree method is used to classify the users (if you've ever used the ID3 algorithm or its successor C4.5 for classification, imagine exploring different ways of building decision trees).

    Indeed the paper states:
    We chose weka's J4.8 as the classifier since classification trees in general have been shown to be highly accurate classifiers.
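    For readers who haven't met ID3 or C4.5: both grow a tree by repeatedly splitting on the feature that best separates the classes. ID3 scores candidate splits by information gain; C4.5, which Weka's J48 implements, divides that by the entropy of the split itself (the gain ratio). A minimal sketch of both scores on made-up session data (the feature names and users are hypothetical, and pruning and numeric attributes are left out):

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels: list[str]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows: list[dict[str, str]], labels: list[str], feature: str) -> float:
    """ID3's criterion: entropy reduction from splitting on one categorical feature."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows: list[dict[str, str]], labels: list[str], feature: str) -> float:
    """C4.5's criterion: information gain normalized by the split's own entropy."""
    split_info = entropy([row[feature] for row in rows])
    return information_gain(rows, labels, feature) / split_info if split_info else 0.0


# Toy sessions labeled by user.
rows = [
    {"first_site": "news", "time": "morning"},
    {"first_site": "news", "time": "evening"},
    {"first_site": "mail", "time": "morning"},
    {"first_site": "mail", "time": "evening"},
]
labels = ["alice", "alice", "bob", "bob"]
best = max(["first_site", "time"], key=lambda f: information_gain(rows, labels, f))
print(best)  # first_site
```

Here the first site visited separates the two users perfectly (gain of 1 bit) while time of day tells them apart not at all (gain of 0), so a tree would split on it first.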
    I'll take this opportunity to recommend two open source projects: Torpark, for those of you concerned about your identity, and Weka, the easy-to-use collection of data mining software in Java! Also worth noting: Weka recently became part of Pentaho, a suite of open source business intelligence products. Explore the valuable tools out there and enjoy!
    --
    My work here is dung.
    1. Re:Shameless Weka Plug by Lord_Dweomer · · Score: 3, Interesting
      Since you seem to be knowledgeable on this topic...I have a question for you.

      If they're talking about using this for identifying fraudulent users... how much would changing news/services on the internet affect that? I can think of several news items and new services that instantly and permanently altered my browsing and internet usage habits. Wouldn't those sorts of behavior-altering events increase false positives?

      Please bear in mind I have absolutely zero background in this kind of stuff ;)

      --
      Buy Steampunk Clothing Online!
  2. The only two guys on the internets by Anonymous Coward · · Score: 3, Funny

    Great! Finally we'll be able to distinguish between the two guys who use the Internets... most of the time.

  3. Oh No You Didn't by eldavojohn · · Score: 4, Funny
    But I'm used to living among dyslexics, illiterates, and dumbasses. Sigh.
    Go kcuf yourslef! I am not living among you! I may be dyslexic, I may be illiterates and I may be a dumbass but I am definitely not a sigh.
    --
    My work here is dung.
  4. Re:How about this? by recordMyRides · · Score: 5, Funny

    Don't worry, I predict that a porn-related application will need to be invented in order for this to enter widespread use.

  5. Answer to Your Question by eldavojohn · · Score: 3, Informative
    If they're talking about using this for identifying fraudulent users... how much would changing news/services on the internet affect that? I can think of several news items and new services that instantly and permanently altered my browsing and internet usage habits. Wouldn't those sorts of behavior-altering events increase false positives?
    To the best of my knowledge, the idea is that you wouldn't change drastically. And if you did, it might falsely flag you as a fraudulent user, and then you would merely need to straighten things out.

    The odds of that are low, and this is a variable that can be tuned. But the assumption is that you will still visit your old sites and exhibit your usual behaviors on them. If you found, say, one new site a week, it would slowly be incorporated into your routine (if they used regression properly and allowed the model to train on your data, old and new). But if you suddenly stopped going to your old sites and started visiting new ones, you would probably be flagged. And that's the trade-off of trying to suppress fraud.
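    The "slowly incorporated" behavior described above can be sketched with an exponential moving average over the share of visits each site gets. Everything here is an assumption for illustration: the site names, the smoothing factor alpha = 0.1, and the 5% "familiar site" cutoff are made up, and the paper does not prescribe this update rule.

```python
from collections import Counter


def update_profile(profile: dict[str, float], session_sites: list[str], alpha: float = 0.1) -> dict[str, float]:
    """Blend this session's site-visit shares into the profile (exponential moving average)."""
    counts = Counter(session_sites)
    total = len(session_sites)
    sites = set(profile) | set(counts)
    return {s: (1 - alpha) * profile.get(s, 0.0) + alpha * counts[s] / total for s in sites}


def unfamiliarity(profile: dict[str, float], session_sites: list[str], min_share: float = 0.05) -> float:
    """Fraction of this session's visits that go to sites the profile barely knows."""
    unfamiliar = sum(1 for s in session_sites if profile.get(s, 0.0) < min_share)
    return unfamiliar / len(session_sites)


profile = {"slashdot": 0.6, "news": 0.4}
routine = ["slashdot", "slashdot", "news", "weka"]  # one new site among old favorites
print(unfamiliarity(profile, routine))  # 0.25 before the model has seen "weka"
for _ in range(3):
    profile = update_profile(profile, routine)
print(unfamiliarity(profile, routine))          # 0.0 once "weka" has been absorbed
print(unfamiliarity(profile, ["a", "b", "c"]))  # 1.0: an entirely new routine
```

A gradual change only ever makes a session partly unfamiliar and is soon absorbed, while abandoning every old site scores 1.0 at once; where to set the flagging threshold between those is exactly the false-positive trade-off discussed above.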

    I should point out that there's a lot of play with the variables here, and that an actual implementation of this theoretical paper could be done either well or badly.

    Excellent point, though. Sometimes these new technologies turn out to be more cumbersome than helpful and we need to watch out for that!
    --
    My work here is dung.
  6. Am I the only one by TubeSteak · · Score: 5, Insightful

    Who doesn't like clicking on TinyURLs?

    TinyURLs just don't compute as part of my safe surfing habits.
    Example:
    TinyURL --> my redirect --> paper
    After it hits the front page:
    TinyURL --> my redirect --> 0-day exploit

    There really is no need for them in Slashdot Submissions.

    Here's the direct link to the paper
    http://knowledge.wharton.upenn.edu/papers/1323.pdf

    --
    [Fuck Beta]
    o0t!
  7. an even simpler solution... by cyberworm · · Score: 4, Funny

    Follow them to their myspace page.