Slashdot Mirror


London's Public Bike Data Can Tell Everyone Where You've Been

An anonymous reader writes "I recently posted this article with a few visualizations and a bit of analysis about the risks associated with open data sets. Thought it might be of interest to Slashdot readers: 'This article is about a publicly available dataset of bicycle journey data that contains enough information to track the movements of individual cyclists across London, for a six-month period just over a year ago.'"

9 of 41 comments

  1. Contradiction by jklovanc · · Score: 3, Informative

    From the article:

    and with a little effort, it's possible to find the actual people who have made the journeys.

    because (thankfully) it requires a fair bit of effort to actually identify individuals from the data

    Is it "a little effort" or "a fair bit of effort"? They never go into what would need to be done to get the identity information.

  2. Re:Seems ridiculously easy by jklovanc · · Score: 2

    Only if the stalker can link the map to a person. The article says it is possible but not how to do it. The maps are called up by a user ID, not a name.

  3. Re:Seems ridiculously easy by Anon, Not Coward D · · Score: 2

    From point A (your house) to point B (your work) there are only a handful of individuals (or only one)... so by filtering with some other information (such as trips from point A to C, close to his/her parents) the data can be linked to a single individual quite easily.
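
    A rough sketch of that kind of filtering, assuming a hypothetical list of journey records with made-up field names (customer_id, start_station, end_station); the real dataset's layout may well differ:

```python
from collections import defaultdict

def riders_matching_routes(journeys, routes):
    """Return the customer IDs that made every (start, end) trip in `routes`."""
    seen = defaultdict(set)
    for j in journeys:
        seen[j["customer_id"]].add((j["start_station"], j["end_station"]))
    wanted = set(routes)
    # Keep only riders whose set of station pairs covers all wanted routes.
    return {cid for cid, pairs in seen.items() if wanted <= pairs}

# Invented sample records, for illustration only.
journeys = [
    {"customer_id": 101, "start_station": "A", "end_station": "B"},
    {"customer_id": 101, "start_station": "A", "end_station": "C"},
    {"customer_id": 202, "start_station": "A", "end_station": "B"},
    {"customer_id": 303, "start_station": "D", "end_station": "B"},
]

candidates_ab = riders_matching_routes(journeys, [("A", "B")])
candidates_both = riders_matching_routes(journeys, [("A", "B"), ("A", "C")])
```

    With this toy data, the A-to-B route alone leaves two candidates, and adding the A-to-C route narrows it to one.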

    --
    Sometimes it's better not having signature
  4. Re:Seems ridiculously easy by JeffAtl · · Score: 2

    The author mentioned cross-referencing the time/location data points with social media posts.

    If a person already knew a particular time/location for the biker, that could be used to figure out the customer ID.

  5. It is easy enough to identify someone by Anonymous Coward · · Score: 2, Insightful

    It's easy enough to identify someone if you are determined - all you need to do is be present at the bike station at the set time and follow the user home. You may get a few false positives but once you identify the correct person, you can track their movements forever in the future. So it is not difficult.

    However, providing customer-level data has a lot more benefits - from road/bike route planning to planning where to put my shop that sells bike parts or on-the-go coffees in a special non-spill cup, tailored to bike riders, or even figuring out the traffic situation in a city at different times. Unfortunately it is not possible to strike the right balance easily.

  6. Re:Seems ridiculously easy by Dazza · · Score: 3, Informative

    It's not that simple.

    You can't track from 'your house' to 'your work'. The tracking data is for London's bike hire scheme. These are picked up from specific 'docking points' around the city, and are returned to any docking point.

    So you can only get 'station to station' data.

    --
    -- "I know that this is vitriol, no solution, spleen-venting, but I feel better having screamed, don't you ?"
  7. Incorrect by rjstanford · · Score: 3, Insightful

    This article is about a publicly available dataset of bicycle journey data that contains enough information to track the movements of individual cyclists across London

    From TFA: "What may surprise you is that this record includes unique customer identifiers, as well as the location and date/time for the start and end of each journey."

    The unique ID? Yeah, maybe that's a problem, likely not that big a deal but also easy enough to get rid of (although if we do that, we lose the ability to track joined journeys, identify frequent vs. infrequent users, &c.). But that's not the point here.

    Identifying which bike stations you check a bike out from and return a bike to is very different from identifying your movements across London. Very different indeed. I'd argue that you do have an expectation of privacy when you stop along the way to get a cup of coffee, a bit of nookie, or a gyro. As a public transportation user, though, your checkin and checkout actions are totally different than your route.

    In fact, it'd be damned useful to be able to see and show that you did - or did not - retrieve or return a bike at a particular place and time. It's also useful to be able to tell where that bike went in the future.

    Think about library books. Even in the "olden days," it was frequently possible to see who checked out a book, when they got it, and when they returned it. You couldn't, however, tell whether or not they liked it, if they read it in the bathtub, or if they let their SO read a page or two along the way.

    Same here, just with bikes. Sorry guys, no news.

    --
    You're special forces then? That's great! I just love your olympics!
  8. Re:Seems ridiculously easy by TheCarp · · Score: 2, Interesting

    Except said stalker has a different problem set than the article's author. The author is looking at the data, and picking out an individual. It is a whole different problem to take an individual, that you have some information about, and pick them out.

    So maybe the stalker is looking at an employee of some establishment. He watches when that employee comes in for a few days. Lots of people use the same bike terminal, but how many individuals checked in at 8 am today, 8:03 yesterday, 7:58 the day before?

    Before he may have had to follow his prey home, case them through social engagements... now, collect data in the same place every day for a few days, and he has a literal map of their life; all with no danger of exposing himself.
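
    A minimal sketch of that narrowing-down, assuming hypothetical journey records with made-up field names (customer_id, end_station, end_time) and invented sample data; it intersects the sets of riders who docked near each observed time:

```python
from datetime import datetime, timedelta

def candidates_for_sightings(journeys, sightings, tolerance_minutes=5):
    """Intersect the riders who docked at each (station, time) sighting."""
    tol = timedelta(minutes=tolerance_minutes)
    candidates = None
    for station, seen_at in sightings:
        matched = {
            j["customer_id"]
            for j in journeys
            if j["end_station"] == station and abs(j["end_time"] - seen_at) <= tol
        }
        # The first sighting seeds the set; later ones can only shrink it.
        candidates = matched if candidates is None else candidates & matched
    return candidates or set()

dock = "Office Dock"  # hypothetical station name
journeys = [
    {"customer_id": 7, "end_station": dock, "end_time": datetime(2020, 1, 6, 7, 57)},
    {"customer_id": 7, "end_station": dock, "end_time": datetime(2020, 1, 7, 8, 4)},
    {"customer_id": 7, "end_station": dock, "end_time": datetime(2020, 1, 8, 8, 1)},
    {"customer_id": 9, "end_station": dock, "end_time": datetime(2020, 1, 6, 7, 59)},
    {"customer_id": 9, "end_station": dock, "end_time": datetime(2020, 1, 8, 8, 0)},
    {"customer_id": 12, "end_station": dock, "end_time": datetime(2020, 1, 7, 8, 2)},
]
sightings = [
    (dock, datetime(2020, 1, 6, 7, 58)),
    (dock, datetime(2020, 1, 7, 8, 3)),
    (dock, datetime(2020, 1, 8, 8, 0)),
]
matches = candidates_for_sightings(journeys, sightings)
```

    In this toy data, three sightings are enough to leave a single customer ID, even though other riders used the same dock on some of those mornings.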

    This is far too easy to abuse, and a danger to too many people. It could be used to kidnap children of rich people, it could be used to rob drug dealers, it could be used to track women back to their homes to rape, it could be used to ambush ex-lovers or their new spouses.

    Frankly, it is actually putting people in danger in a way that is especially enormously terrible since it would be so easy to avoid. Why would you EVER publish unique identifiers that map to people like that? I can understand this was probably an oversight, but it really is indefensible as an intentional disclosure.

    --
    "I opened my eyes, and everything went dark again"
  9. Um, what? by Anonymous Coward · · Score: 2, Informative

    The dataset contains a Bike ID, not a customer ID. You can get the dataset yourself and look.

    http://www.tfl.gov.uk/info-for/open-data-users/our-feeds#on-this-page-4

    And one of the suggestions they have is to map the bike journeys.

    There is a similar set for the underground.