Why is this Company Tracking Where You Are on Thanksgiving? (theoutline.com)
Earlier this week, several publications ran a holiday-themed data study about how families that voted for opposite parties spent less time together on Thanksgiving, especially in areas that saw heavy political advertising. The data came from a company called SafeGraph that supplied publications with 17 trillion location markers for 10 million smartphones. A report looks at the bigger picture: The data wasn't just staggering in sheer quantity. It also appears to be extremely granular. Researchers "used this data to identify individuals' home locations, which they defined as the places people were most often located between the hours of 1 and 4 a.m.," wrote The Washington Post. The researchers also looked at where people were between 1 p.m. and 5 p.m. on Thanksgiving Day in order to see if they spent that time at home or traveled, presumably to be with friends or family. "Even better, the cellphone data shows you exactly when those travelers arrived at a Thanksgiving location and when they left," the Post story says. To be clear: This means SafeGraph is looking at an individual device and tracking where its owner goes throughout the day. A common defense from companies that creepily collect massive amounts of data is that the data is only analyzed in aggregate; for example, Google's BigQuery database, which lets organizations upload big data sets and then query them quickly, promises that all its public data sets are "fully anonymized" and "contain no personally-identifying information." In multiple press releases from SafeGraph's partners, the company's location data is referred to as "anonymized," but in this case they seem to be interpreting the concept of anonymity quite liberally given the specificity of the data.
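The home-location rule the Post describes can be sketched in a few lines. This is a minimal illustration, not SafeGraph's or the researchers' code: the ping format `(device_id, timestamp, location_id)`, the `location_id` buckets, and all function names are hypothetical. It shows how little logic it takes to turn per-device pings into a likely home, an at-home check for 1 p.m. to 5 p.m., and an arrival/departure window, which is exactly why device-level data is hard to call "anonymized."

```python
from collections import Counter
from datetime import date, datetime
from typing import List, Optional, Tuple

# Hypothetical ping format: (device_id, timestamp, location_id). location_id
# stands in for some spatial bucket; the article does not describe SafeGraph's schema.
Ping = Tuple[str, datetime, str]


def infer_home(pings: List[Ping], device_id: str) -> Optional[str]:
    """Location where the device is seen most often between 1 and 4 a.m."""
    night = Counter(
        loc for dev, ts, loc in pings
        if dev == device_id and 1 <= ts.hour < 4
    )
    if not night:
        return None
    return night.most_common(1)[0][0]


def afternoon_locations(pings: List[Ping], device_id: str, day: date) -> List[str]:
    """Locations seen for this device between 1 p.m. and 5 p.m. on `day`."""
    return [
        loc for dev, ts, loc in pings
        if dev == device_id and ts.date() == day and 13 <= ts.hour < 17
    ]


def stayed_home(pings: List[Ping], device_id: str, day: date) -> bool:
    """True only if every 1-5 p.m. ping on `day` is at the inferred home."""
    home = infer_home(pings, device_id)
    afternoon = afternoon_locations(pings, device_id, day)
    return home is not None and bool(afternoon) and all(loc == home for loc in afternoon)


def visit_window(pings: List[Ping], device_id: str, location: str,
                 day: date) -> Optional[Tuple[datetime, datetime]]:
    """First and last timestamp the device was seen at `location` on `day`."""
    times = sorted(
        ts for dev, ts, loc in pings
        if dev == device_id and loc == location and ts.date() == day
    )
    if not times:
        return None
    return times[0], times[-1]


# Made-up pings for one device on Thanksgiving 2016 (Nov. 24).
pings = [
    ("dev1", datetime(2016, 11, 24, 2, 0), "cell_A"),
    ("dev1", datetime(2016, 11, 24, 3, 30), "cell_A"),
    ("dev1", datetime(2016, 11, 24, 14, 0), "cell_B"),
    ("dev1", datetime(2016, 11, 24, 16, 45), "cell_B"),
]
thanksgiving = date(2016, 11, 24)
print(infer_home(pings, "dev1"))                 # cell_A
print(stayed_home(pings, "dev1", thanksgiving))  # False
print(visit_window(pings, "dev1", "cell_B", thanksgiving))
```

Nothing here needs a name or phone number: a stable device ID plus timestamps is enough to recover where someone sleeps and when they arrived at and left another address.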
to SafeGraph. If the data is held by, say, Google, and Google allows only certain aggregate queries and never returns anything but the aggregate answer, then SafeGraph won't know what happened in individual houses. This gets very tricky, though: you have to set thresholds on how small an area you can report on.
For example, Canadian credit bureaus sell reports by postal code (a postal code covers one side of a street between intersections) that give the high, low, and median score. If fewer than a certain number of people lived in a postal code, we didn't release the information (this was a decision made by the programmers; legally, the company could have). But what about the case where the high and low scores were almost the same? In that case, revealing the high and the low essentially revealed everyone's score.
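A minimal sketch of the suppression rules described above, assuming a per-postal-code list of scores. `MIN_COUNT` and `MIN_SPREAD` are hypothetical values (the comment does not say what threshold was used), and the spread check is one simple way to handle the high-equals-low problem, added here for illustration rather than something the commenter says the bureau did.

```python
from statistics import median
from typing import Dict, List, Optional

# Hypothetical thresholds; the comment does not say what values were used.
MIN_COUNT = 5     # suppress postal codes with fewer people than this
MIN_SPREAD = 50   # suppress if high and low scores are closer than this


def postal_code_report(scores: List[int]) -> Optional[Dict[str, float]]:
    """High/low/median summary for one postal code, or None if suppressed."""
    if len(scores) < MIN_COUNT:
        return None  # small-cell suppression: too few people to hide behind
    high, low = max(scores), min(scores)
    if high - low < MIN_SPREAD:
        # Every score lies between low and high, so a narrow range
        # effectively discloses each individual's score.
        return None
    return {"count": len(scores), "high": high, "low": low, "median": median(scores)}


print(postal_code_report([700, 720]))                 # None (too few people)
print(postal_code_report([712, 714, 715, 713, 716]))  # None (range too narrow)
print(postal_code_report([600, 650, 700, 750, 800]))
# {'count': 5, 'high': 800, 'low': 600, 'median': 700}
```

A spread threshold is only a heuristic: it catches the case in the comment but not every way aggregates can leak, such as combining reports for overlapping areas.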