Slashdot Mirror


Will London Monetize Wifi Tracking Data From Its Tube Passengers? (gizmodo.co.uk)

New questions are arising about how much privacy you'll have on London's underground trains. "For a month at the end of last year, Wi-fi signals were used to track passenger journeys across the network," writes Gizmodo. "The idea is that as we travel across the Tube network, Wi-fi beacons in stations would detect the unique ID -- the MAC address -- of our phones, tablets and other devices -- even if we're not connected to the Tube's wifi network." The only way to opt out is to turn off your phone's Wi-Fi. An anonymous reader writes: London is struggling with transport network capacity, so the ability to learn commuters' travel patterns is compelling... Now it has emerged that TfL, the operator of London's subway system, is planning to use the system to monetize passengers' data. TfL is also not ruling out sharing the data with third parties in the future.

More information shows that the privacy protection may not be as good as TfL maintains, with reversible hashing and the option of giving data to law enforcement. A privacy engineering expert points out additional issues in the pseudonymisation scheme, as well as inconsistencies in TfL's communication. Final deployment was initially scheduled to start at the end of 2017.

"Once the tools are in place, there will inevitably be a temptation to make use of them," warns Engadget, raising the possibility of the data's use for advertising -- or even the availability to law enforcement of location data for every passenger.

17 of 90 comments

  1. Randomize Wifi MAC ? by dam.capsule.org · · Score: 2

    With 48 bits and the number of people connected at one point to a wifi AP, wouldn't it be possible to randomize the MAC address ? Even with a thousand connected people, which I think could never occur, the rate of collision would be less than 1 in a hundred billion. I think nowadays most chips allow changing the MAC, but I'm not sure about wireless mobile chipsets.
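As a rough check of that arithmetic, here is a minimal Python sketch. It assumes addresses are drawn uniformly at random from the full 48-bit space; the function names are made up for illustration. In practice, randomised addresses fix two bits (locally administered, unicast), so passing `bits=46` is closer to reality.

```python
import math

ADDRESS_BITS = 48  # full MAC width, as assumed in the comment above


def single_device_collision(devices: int, bits: int = ADDRESS_BITS) -> float:
    """Probability that one given device shares its address with any other device."""
    space = 2 ** bits
    # 1 - (1 - 1/space)^(devices - 1), computed stably for tiny probabilities
    return -math.expm1((devices - 1) * math.log1p(-1 / space))


def any_pair_collision(devices: int, bits: int = ADDRESS_BITS) -> float:
    """Birthday-bound approximation: probability that any two devices collide."""
    space = 2 ** bits
    return -math.expm1(-devices * (devices - 1) / (2 * space))


if __name__ == "__main__":
    for label, fn in (("one device", single_device_collision),
                      ("any pair", any_pair_collision)):
        print(f"{label}: {fn(1000):.3e} (48 bits), {fn(1000, bits=46):.3e} (46 bits)")
```

For a thousand devices, the chance that one particular device collides is a few in a trillion, which fits the "less than 1 in a hundred billion" figure. The chance that any two of the thousand collide is larger, around two in a billion, but still negligible.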

    --
    What sig ?
    1. Re:Randomize Wifi MAC ? by jargonburn · · Score: 3, Informative

      Unless things have changed since I last read up on this issue, there are two basic problems with using randomization of MAC addresses to defeat tracking:

      --Software Implementation--
      Lazy method of randomization. Sometimes as simple as incrementing the value of the MAC address by 1, repeatedly over time.
      There are other signatures transmitted besides the MAC address that make it trivial to identify most smartphones, especially given the previous point.

      --Hardware Implementation--
      Smartphone chipsets handle low-level control frames in a manner that is vulnerable to tracking. As in 100% success rate. IIRC, this will happen even if you have the WiFi off in software or are in Airplane Mode.


      Source

    2. Re:Randomize Wifi MAC ? by AmiMoJo · · Score: 4, Interesting

      Why would the chipset handle wifi packets when the wifi receiver is turned off? And even if it did, with the transmitter turned off how would the tracker ever know that it did? There is no energy going to the transmitter, no energy radiated.

      Perhaps you are referring to some Apple devices where the off switch doesn't actually turn the wifi off, but most devices don't have that fault.

      There used to be an issue where devices would broadcast the SSIDs of networks they knew about. That was to handle networks that didn't broadcast an SSID themselves, but it's mostly been deprecated and was one of the reasons that MAC address randomization was introduced.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  2. Overcomplicating matters by andrewbaldwin · · Score: 5, Insightful

    I can sympathise with TfL's stated aims - knowing how many people go from place A to place B via route C at certain times of day is useful and can be socially beneficial if it helps train scheduling.

    But this can be done in a simpler way (albeit not in real time - but is that really necessary?).

    Many years ago I recall using the metro and local trains in Copenhagen when they were doing a survey. When you entered the station they gave you a paper slip with the station name and timeslot written on it; when you reached your end destination there was a bin to drop the paper slip into. That's it from the passenger viewpoint - minimal inconvenience and no linking to you as a person (and you could even opt out by keeping the paper slip if you were so minded).

    I'm guessing that at the end of the day they collected the slips at each station and could work out just how many people went on each journey within hour long blocks.

    I do recall thinking that a bar code or QR block would simplify the counting process.

    But that's not cool enough - it's too simple for today's management to consider (and it cannot be subverted or surveilled).

    Slightly off topic - doesn't everyone turn off the phone wifi & bluetooth when not in use? -- doing so seems [in my experience -YMMV] to extend the time between charges by quite a useful margin.

    1. Re:Overcomplicating matters by oobayly · · Score: 5, Insightful

      The reason they did this was to track people's routes through the system - Oyster will only give the end points, not where they changed stations. The Gizmodo article explains that, if you bothered to read it...

      The Register did an article on this a few weeks ago and mentions that TfL did a good job anonymising the data:

      Fortunately, TfL did it right: they used ICO guidelines to protect users' privacy by grabbing and tracking MAC addresses and then depersonalized them using a salt which was then discarded at the end of each day. That in effect makes it impossible to know what the original MAC address was.
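The scheme described in that quote can be sketched roughly as follows. This is only an illustration under stated assumptions: TfL has not published its hash function or salt format, so SHA-256, the 16-byte random salt, and the names `new_daily_salt` and `pseudonymise` are all hypothetical, and the MAC address is made up.

```python
import hashlib
import secrets


def new_daily_salt() -> bytes:
    """Generate a fresh random salt, meant to be discarded at the end of the day."""
    return secrets.token_bytes(16)  # 16 bytes is an assumed length


def pseudonymise(mac: str, salt: bytes) -> str:
    """Return a salted SHA-256 digest of a normalised MAC address."""
    normalised = mac.lower().replace("-", ":").encode("ascii")
    return hashlib.sha256(salt + normalised).hexdigest()


monday_salt = new_daily_salt()
tuesday_salt = new_daily_salt()
mac = "02:00:5e:10:00:01"  # made-up, locally administered address

# Same device, same day: identical pseudonym, so a journey can be stitched together.
same_day_a = pseudonymise(mac, monday_salt)
same_day_b = pseudonymise("02-00-5E-10-00-01", monday_salt)

# Same device, next day: unrelated pseudonym, provided Monday's salt is really gone.
next_day = pseudonymise(mac, tuesday_salt)

print(same_day_a == same_day_b, same_day_a == next_day)
```

The privacy property rests entirely on the salt being random and actually destroyed: whoever still holds it can hash any candidate MAC address and look it up in the day's data.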

    2. Re:Overcomplicating matters by michelcolman · · Score: 3, Informative

      Slightly off topic - doesn't everyone turn off the phone wifi & bluetooth when not in use?

      We do, but Apple just turns it on again when we travel to a new location or in any case at 5am.

      (unless we go out of our way to disable it in the system settings rather than through the more convenient control center which tricks us into thinking it's the same thing)

    3. Re:Overcomplicating matters by michelcolman · · Score: 2

      And then pretty soon they'll get some more great ideas, like: "are the people traveling on Wednesday the same ones as those traveling on Thursday?". So they'll stop discarding the salt and there goes your anonymity.

    4. Re:Overcomplicating matters by AmiMoJo · · Score: 4, Interesting

      Maybe not... https://blog.lukaszolejnik.com...

      Aside from TfL's apparent confusion of various technical terms, it looks likely that the salts could be recovered. MAC addresses are not random, they are assigned in blocks to manufacturers. Some devices do randomize them, but some don't, and it appears that TfL uses only one salt per day for every MAC address it hashes.

      You can assume that there will be a large number of devices running wifi chipset X and not randomizing. That gives you a way to check a salt for validity: a candidate salt is right if, when combined with known MAC addresses from the ranges allocated to that manufacturer, it produces hashes that appear in the TfL dataset. And you can further narrow this down by taking your own device with a known MAC address onto the tube during the test.
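A toy version of that validity check, for illustration only. It assumes SHA-256 over salt plus MAC and a salt taken from a small guessable set (here, ISO date strings). Nothing is known about TfL's real scheme, and the dates, addresses and function names are invented.

```python
import hashlib
from datetime import date, timedelta


def pseudonymise(mac: str, salt: bytes) -> str:
    """Salted SHA-256 of a MAC address (assumed scheme, not TfL's documented one)."""
    return hashlib.sha256(salt + mac.lower().encode("ascii")).hexdigest()


def date_salts(start: date, days: int):
    """Yield hypothetical date-derived salts such as b'2016-11-21'."""
    for offset in range(days):
        yield (start + timedelta(days=offset)).isoformat().encode("ascii")


def find_salt(known_mac, published_hashes, candidate_salts):
    """Return the first candidate salt that maps the known MAC into the dataset."""
    for salt in candidate_salts:
        if pseudonymise(known_mac, salt) in published_hashes:
            return salt
    return None


# Toy dataset: three devices hashed with a weak, date-based salt.
weak_salt = b"2016-12-05"
my_mac = "02:00:5e:10:00:01"  # the device you carried through the stations
dataset = {
    pseudonymise(m, weak_salt)
    for m in (my_mac, "02:00:5e:10:00:02", "02:00:5e:10:00:03")
}

recovered = find_salt(my_mac, dataset, date_salts(date(2016, 11, 21), 30))
print(recovered)
```

With a properly random salt of reasonable length, the candidate list cannot be enumerated and this search goes nowhere. The point is only that a predictable salt falls to a loop this short.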

      It's probably fine... But their lack of technical clarity and secrecy about the scheme they used (for all we know the salts could have just been the date or something silly) isn't very encouraging. As a branch of government they should set the gold standard for this stuff.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  3. Sigh by ledow · · Score: 3, Insightful

    Paranoia much?

    Pretty much if you're on a train (especially a Tube train) then you bought a ticket from A to B or - in London - you bought an Oyster card which records your every journey as you have to tap-in and tap-out.

    This is quite normal for any train/subway system. What information do you think they are going to glean from Wifi that they can't glean in this manner about travel patterns? Only what you give them, and only of little use (does it REALLY matter that the guy going from Embankment to Mile End did a DNS lookup for slashdot.org, and how on earth would you ever properly correlate that if he only quickly checks a website at stations he never alights at, and then turns Wifi off?).

    This is the "machine learning" rubbish all over again. Masses of data, lots of processing, no more insight into anything useful over and above monitoring ticket sales which you have to do anyway.

    1. Re:Sigh by geekmux · · Score: 2

      Paranoia much?

      Pretty much if you're on a train (especially a Tube train) then you bought a ticket from A to B or - in London - you bought an Oyster card which records your every journey as you have to tap-in and tap-out.

      This is quite normal for any train/subway system. What information do you think they are going to glean from Wifi that they can't glean in this manner about travel patterns?

      Travel patterns are not the gold mine here. Browsing habits are.

      Gathering browsing habits of people who spend XX hours every week in the tube is worth more than you know. Putting ads in front of your eyes for that entire trip is valuable to a lot of companies, since they know you spend 95% of that time staring at a phone screen.

    2. Re:Sigh by ledow · · Score: 2

      Are you seriously suggesting the TfL, the people IN CHARGE OF THE TUBE NETWORK, can't come up with a number for how busy stations are at certain times of the day, but think that Wifi numbers (which by far do not represent actual passenger numbers) will help them do that?

      Really? I mean, I knew they were incompetent, but that would just be staggering.

      The control rooms can see cameras of almost every platform on almost every Tube station. They show it off when they do those documentaries where they cry about how little their drivers earn compared to millionaires and Premier League football players.

      If you NEED to know the exact route every person took, to that level of detail, to know that Willesden Junction gets busy, then you really shouldn't be running a transport network. And, guess what. Those "bad connections"... yeah, that's when the train runs late which shows up on a big electronic sign on every station on the route. Gosh, I wonder how they could obtain that information....

  4. What good is the data? by mveloso · · Score: 2

    Really, once the data is anonymized it becomes useless to advertisers. So the fears here are pretty overblown.

    1. Re:What good is the data? by currently_awake · · Score: 2

      Your comment translates as: If they anonymize the data they can't make money selling it.

  5. Huh? WiFi? by thegarbz · · Score: 2

    The London tube and public transport in general was an early adopter of electronic ticketing. What purpose could they have tracking passengers via MAC address when they can already track them via Oyster card? What are they hoping to achieve via this? Evidence that people are walking down the tunnels?

    It would seem that if you know where a person gets on, where they get off, and where your carriages are, a simple bit of data analytics could get them the same information.

    1. Re:Huh? WiFi? by Richard_at_work · · Score: 2

      The London Underground is a mass of interconnecting lines, and you can literally enter into the system at 7am and exit at 7pm, having travelled the entire network without exiting the system once - the point of capturing this data is not to see where they get on and get off, it's to see what routings they take between those points - that is a wealth of data TfL can use to improve the service.

    2. Re:Huh? WiFi? by shortscruffydave · · Score: 2

      For the same journey on the tube, there are often several different routes. One of my regular journeys could be traversed over three practical routes (plus countless stupidly long ones). My preferred option isn't actually the quickest - it's about 2 or 3 minutes slower than the obvious/optimum route - but I choose it because during the summer the trains on that particular line are cooler.

    3. Re:Huh? WiFi? by Obfuscant · · Score: 2

      I mean if the purpose for this is improving the flow of commuters then you'd focus on the shortest time and scheduled path between any two stations.

      But that might not be what the people are doing. At all.

      For example, during my recent vacation in Munich I would often enter the system at Marienplatz, ride to Karlsplatz (Stachus) or Hauptbahnhof, then ride back out to Isartor. For those who don't know the system, that's getting on in the center of the city, going west, then going back east. I did that almost every day. Now, Munich does not track riders by ticket because you don't need to show anyone or any machine a ticket. At most you stick a paper ticket in a timeclock that stamps the time on it. At best, you carry an IsarCard in your pocket that nobody ever sees except you. If they WERE tracking entry/exit, they'd have a very distorted picture of how I used the system.

      Now, on a practical level, suppose you measure actual riders and see that a large percentage of them ride line 1 from A to D through B and C, then change to line 2 for D to Z via C, B, etc. It's shorter to go AB-Z but they're going ABCDCB-Z. Why? Poor signage? Bad maps? Does the change at B require a long walk or is the escalator always broken? Or does the station at D have the only KFC on the route and people are stopping there?

      If you look only at entry/exit, you will gather none of that data and not know that you need to study a potential problem at B and/or C.
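The A/B/C/D/Z scenario can be sketched in a few lines of Python. The network, the journeys and the function names are all hypothetical (Z is placed directly next to B for brevity). Each journey is the ordered list of stations where one pseudonymous device was seen.

```python
from collections import Counter, deque

# Toy network loosely following the example above; not a real line map.
NETWORK = {
    "A": ["B"],
    "B": ["A", "C", "Z"],
    "C": ["B", "D"],
    "D": ["C"],
    "Z": ["B"],
}


def shortest_hops(graph, start, goal):
    """Breadth-first search: minimum number of hops, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        station, hops = queue.popleft()
        if station == goal:
            return hops
        for neighbour in graph[station]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


def detour_report(journeys, graph=NETWORK):
    """Count each observed route, keyed by (entry, exit, route, extra hops)."""
    report = Counter()
    for stations in journeys:
        entry, exit_ = stations[0], stations[-1]
        extra = (len(stations) - 1) - shortest_hops(graph, entry, exit_)
        report[(entry, exit_, "".join(stations), extra)] += 1
    return report


journeys = [
    ["A", "B", "Z"],
    ["A", "B", "C", "D", "C", "B", "Z"],
    ["A", "B", "C", "D", "C", "B", "Z"],
]
report = detour_report(journeys)
for key, count in sorted(report.items()):
    print(key, count)
```

Entry/exit data alone would show three identical A-to-Z trips. The per-device path shows that two of the three riders took four extra hops through C and D, which is the cue to go and look at the interchange at B.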

      Consumers on average aren't stupid enough to want to spend any more time in the tube than absolutely necessary.

      "On average" is not "peak demand".

      Whether someone has fallen asleep on the line, or is going around in a circle really shouldn't matter for any of their scenarios.

      Of course he does, because he is a physical object consuming a seat on a limited resource. WHY he's consuming it isn't measured, only that he IS, and unless they can identify that HE is the same person they won't know there might be a problem that can be fixed and reduce delays for others.

      Another Munich example. Visitors fly into an airport well outside the city and then take the S1 or S8 in. If they monitor entry/exit they'll get a count of how many people enter and leave where they do. If they monitor the entire trip, they may find that a lot of people take the S1 all the way to Hbf (Hauptbahnhof) and then they take the U2 to Hasenbergl. But they could have changed at Feldmoching and had a much shorter trip. Why didn't they? Were the on-train announcements not clear enough, should everyone who buys an IsarCard at the airport be given an MVV map, or what?

      And suppose that visitor leaves by taking the S1 out to Freising, says WTF?, gets back on the S1 to Neufahrn, and then goes to the airport. If they count entry/exit, it is just another rider from city to airport. If they track the path, however, they find out that maybe the information that the S1 splits at Neufahrn and the front half goes to Freising instead of the airport isn't being presented well enough.

      You can get a lot of data from monitoring entry/exit, yes. You get a lot more by tracking individual pax, even if you don't know their name.