Improperly Anonymized Logs Reveal Details of NYC Cab Trips
mpicpp (3454017) writes with news that a dump of fare logs from NYC cabs resulted in trip details being leaked thanks to using an MD5 hash on input data with a very small key space and regular format. From the article:
City officials released the data in response to a public records request and specifically obscured the drivers' hack license numbers and medallion numbers. ... Presumably, officials used the hashes to preserve the privacy of individual drivers since the records provide a detailed view of their locations and work performance over an extended period of time.
It turns out there's a significant flaw in the approach. Because both the medallion and hack numbers are structured in predictable patterns, it was trivial to run all possible iterations through the same MD5 algorithm and then compare the output to the data contained in the 20GB file. Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
It turns out there's a significant flaw in the approach. Because both the medallion and hack numbers are structured in predictable patterns, it was trivial to run all possible iterations through the same MD5 algorithm and then compare the output to the data contained in the 20GB file. Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
Too many governments and corporations continue to fail to understand that it requires having experts who actually know what they are doing be in charge of data security.
This doesn't mean you contract it out to the lowest bidder or hire the cheapest CS degree you can find.
It means you hire knowledge and experience, you hire expert skills, and those cost money.
This is why we can't have nice things.....
Large organizations will consistently fail to hire/staff competent people for data security related issues, and will push back on fines or punitive findings by criminalizing publicizing their incompetence.
Thus sending all such talent straight to criminals who'll be happy to reward them with hard cash.
It's like these guys _want_ a dystopian future.
Make sure everyone's vote counts: Verified Voting
Always assumed anywhere term "anonymized data" is used it is more likely than not to be companies and governments paying lip service to its customers... where data could easily be reversed into an identifiable way by either taking advantage of insufficient entropy or cross referencing datasets.
There is after all no cost for violating privacy or unnecessary risk exposure associated with disclosure.
One of my favorite examples of dangers of insufficient entropy stem from a PCI DSS requirement written by "experts" who should know better.
3.4 Render PAN unreadable anywhere it is stored (including on portable digital media, backup media, and in logs) by using any of the following approaches:
One-way hashes based on strong cryptography, (hash must be of the entire PAN) ...
Search space of typical 16-digit card numbers is no match for a modern CPU once you have taken check digit, card type, issuer and issuer specific numbering into account... "strong cryptography" can't fix stupid.
You are naive. The problem starts to crop up when you start correlating things. Then you can find all sorts of things, like patterns of visiting a mistress, people meeting in secret (which is perfectly legal, but the government fears it), etc.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Actually the movement of a cab is a wealth of information. Not by itself, but it's very good at connecting dots. If you want to follow someone around, these things tend to be invaluable. You can, essentially, follow someone around without following them around, even retroactively. People rarely go from place to place randomly. They have destinations. If someone takes a cab from the airport and doesn't live in the area where he landed, it is likely that his destination is the place that he will stay in. After a flight, especially a long one, people want to get rid of their heavy baggage, take a shower, put on new clothing. So you can easily find out where someone stayed. Which becomes twice as interesting if the destination is not a hotel, because now you got another person to screen.
This information by itself is not much. But as part of a bigger network it is something we'd have killed for back when I was still doing profiling.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
In other news, the credentials for their plug-n-play coffee machine are 'admin' 'admin', and their gym locker combo is 1234. Someone made a half-assed attempt to obfuscate some data that nobody cares about (unless your husband's a cheating cabbie, I guess) and someone cracked it. News?
The government has the info already, they handed it out!
The United States dollar is the currency preferred by drug dealers, whose trade is in fact made more profitable by the failed "War on Drugs".
Write failed: Broken pipe
The War on Drugs is a massively successful enterprise if your definition of success is the ability to extract billions of USD worth of funding from taxpayers, with a disproportionate amount of said funding going to the overt militarization of police forces in the USA at the expense of civil liberties and human rights. However, if your indicators of success are tied to social, medical, or economic improvement for the citizens of the United States of America, the entire affair is indeed a massive failure.
For reference, this is coming from someone who consumes nothing more than nicotine (vaping these days, gave up cigarettes after 20 years) and whiskey, and once wore an actual military uniform for a living.
Write failed: Broken pipe