Slashdot Mirror


Google's Sidewalk Labs Plans To Sell Location Data On Millions of Cellphones (theintercept.com)

An anonymous reader quotes a report from The Intercept: Most of the data collected by urban planners is messy, complex, and difficult to represent. It looks nothing like the smooth graphs and clean charts of city life in urban simulator games like "SimCity." A new initiative from Sidewalk Labs, the city-building subsidiary of Google's parent company Alphabet, has set out to change that. The program, known as Replica, offers planning agencies the ability to model an entire city's patterns of movement. Like "SimCity," Replica's "user-friendly" tool deploys statistical simulations to give a comprehensive view of how, when, and where people travel in urban areas. It's an appealing prospect for planners making critical decisions about transportation and land use. In recent months, transportation authorities in Kansas City, Portland, and the Chicago area have signed up to glean its insights. The only catch: They're not completely sure where the data is coming from.

Typical urban planners rely on processes like surveys and trip counters that are often time-consuming, labor-intensive, and outdated. Replica, instead, uses real-time mobile location data. As Nick Bowden of Sidewalk Labs has explained, "Replica provides a full set of baseline travel measures that are very difficult to gather and maintain today, including the total number of people on a highway or local street network, what mode they're using (car, transit, bike, or foot), and their trip purpose (commuting to work, going shopping, heading to school)." To make these measurements, the program gathers and de-identifies the location of cellphone users, which it obtains from unspecified third-party vendors. It then models this anonymized data in simulations -- creating a synthetic population that faithfully replicates a city's real-world patterns but that "obscures the real-world travel habits of individual people," as Bowden told The Intercept. The program comes at a time of growing unease with how tech companies use and share our personal data -- and raises new questions about Google's encroachment on the physical world.

3 of 100 comments

  1. Re:Anonymized by b0s0z0ku · Score: 4, Informative

    Is it anonymized to the point where they can't see who's parking in which driveway or walking into which home? It may be technically "anonymous", but if locations are sufficiently accurate, any POS with a mind to it can "deanonymize" it relatively quickly.

  2. Re:Anonymized by evendiagram · Score: 4, Informative

    TFA: "Any location data that Sidewalk Labs receives is already de-identified (using methods such as aggregation, differential privacy techniques, or outright removal of unique behaviors)"

    Differential privacy is a rigorous mathematical definition of privacy. In the simplest setting, consider an algorithm that analyzes a dataset and computes statistics about it (such as the data's mean, variance, median, mode, etc.). Such an algorithm is said to be differentially private if by looking at the output, one cannot tell whether any individual's data was included in the original dataset or not. In other words, the guarantee of a differentially private algorithm is that its behavior hardly changes when a single individual joins or leaves the dataset -- anything the algorithm might output on a database containing some individual's information is almost as likely to have come from a database without that individual's information. Most notably, this guarantee holds for any individual and any dataset. Therefore, regardless of how eccentric any single individual's details are, and regardless of the details of anyone else in the database, the guarantee of differential privacy still holds. This gives a formal guarantee that individual-level information about participants in the database is not leaked. https://privacytools.seas.harv...
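
    A rough sketch of how that guarantee can be met for a simple counting statistic, assuming the Laplace mechanism (one standard way to get differential privacy for numeric queries). TFA does not say which mechanism Sidewalk Labs or its vendors actually use, so the function names, the `epsilon` value, and the toy trip list below are all illustrative assumptions, not their method.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    A counting query has sensitivity 1: one person joining or leaving the
    dataset changes the true count by at most 1. Adding Laplace noise with
    scale 1/epsilon therefore gives epsilon-differential privacy.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: travel mode of 100 trips on one street.
trips = ["car"] * 60 + ["bike"] * 40
noisy = private_count(trips, lambda mode: mode == "car", epsilon=1.0)
print(f"noisy car count: {noisy:.1f}")
```

    A smaller `epsilon` means more noise and a stronger guarantee; a larger one means a more accurate count and a weaker guarantee. The noisy answer looks about the same whether or not any one person's trip is in the list, which is the "hardly changes" property described above.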

  3. If (false) { . (true but not relevant) by raymorris · Score: 3, Informative

    What you said is true, but not relevant.

    Google is distributing statistics about large populations, not tokenized data about individuals.

    Tokenized data (raw data with names replaced by numbers) can sometimes be de-anonymized. That's not what Google is doing.
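
    To make the distinction concrete, here is a toy sketch. The pings, grid cells, night-time window, and suppression threshold `k` are all made up for illustration; nothing here is taken from Replica or TFA. It shows why tokenized traces can leak a likely home location, while a population-level count per cell carries no per-token trail. Aggregation by itself is not a formal guarantee, it just removes the individual trace.

```python
from collections import Counter

# Hypothetical tokenized pings: (token, hour_of_day, grid_cell).
# Names have been replaced by numbers, but each trail stays linked.
pings = [
    (17, 2, "cell_A"), (17, 3, "cell_A"), (17, 14, "cell_C"), (17, 23, "cell_A"),
    (42, 1, "cell_B"), (42, 4, "cell_B"), (42, 13, "cell_C"), (42, 22, "cell_B"),
]


def likely_home(pings, token, night=(22, 6)):
    """Guess a token's home as its most frequent night-time cell."""
    start, end = night
    cells = Counter(
        cell for t, hour, cell in pings
        if t == token and (hour >= start or hour < end)
    )
    return cells.most_common(1)[0][0] if cells else None


def midday_counts(pings, k=2):
    """Count distinct tokens per cell at midday, dropping cells below k."""
    seen = {}
    for t, hour, cell in pings:
        if 11 <= hour < 15:
            seen.setdefault(cell, set()).add(t)
    return {cell: len(tokens) for cell, tokens in seen.items() if len(tokens) >= k}


print(likely_home(pings, 17))   # per-token trail: reveals a probable home cell
print(midday_counts(pings))     # aggregate: only a head count per cell
```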