Improperly Anonymized Logs Reveal Details of NYC Cab Trips
mpicpp (3454017) writes with news that a dump of fare logs from NYC cabs resulted in trip details being leaked thanks to using an MD5 hash on input data with a very small key space and regular format. From the article:
City officials released the data in response to a public records request and specifically obscured the drivers' hack license numbers and medallion numbers. ... Presumably, officials used the hashes to preserve the privacy of individual drivers since the records provide a detailed view of their locations and work performance over an extended period of time.
It turns out there's a significant flaw in the approach. Because both the medallion and hack numbers are structured in predictable patterns, it was trivial to run all possible iterations through the same MD5 algorithm and then compare the output to the data contained in the 20GB file. Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
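The attack described above amounts to building a lookup table. Here is a minimal Python sketch, assuming for illustration a hypothetical medallion format of digit, letter, digit, digit (such as 5X55). The real ID formats and the exact string encoding fed to MD5 are not stated here, so both are assumptions, as are all function names.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def md5_hex(text: str) -> str:
    """Return the lowercase hex MD5 digest of a string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def candidate_medallions():
    """Yield every ID in the assumed digit-letter-digit-digit format."""
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        yield f"{d1}{letter}{d2}{d3}"


def build_lookup_table() -> dict:
    """Map each candidate's MD5 digest back to its plaintext."""
    return {md5_hex(m): m for m in candidate_medallions()}


def deanonymize(hashed_id: str, table: dict):
    """Return the plaintext behind a hashed ID, or None if it is not in the table."""
    return table.get(hashed_id.lower())
```

The assumed format has only 10 x 26 x 10 x 10 = 26,000 candidates, so the whole table takes a fraction of a second to build on ordinary hardware. Once it exists, every hashed ID in the file is a single dictionary lookup.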
Too many governments and corporations still fail to understand that data security requires putting experts who actually know what they are doing in charge.
This doesn't mean you contract it out to the lowest bidder or hire the cheapest CS degree you can find.
It means you hire knowledge and experience, you hire expert skills, and those cost money.
Large organizations will consistently fail to hire/staff competent people for data security related issues, and will push back on fines or punitive findings by criminalizing publicizing their incompetence.
Thus sending all such talent straight to criminals who'll be happy to reward them with hard cash.
It's like these guys _want_ a dystopian future.
Make sure everyone's vote counts: Verified Voting
Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
Having thereby run afoul of the circumvention of copyright protection mechanisms clause of the Digital Millennium Copyright Act, he was then subjected to the NYPD's controversial new program, and subsequently incarcerated.
I have always assumed that wherever the term "anonymized data" is used, it is more likely than not that companies and governments are paying lip service to their customers... the data could easily be reversed into an identifiable form, either by taking advantage of insufficient entropy or by cross-referencing datasets.
There is after all no cost for violating privacy or unnecessary risk exposure associated with disclosure.
One of my favorite examples of the dangers of insufficient entropy stems from a PCI DSS requirement written by "experts" who should know better.
3.4 Render PAN unreadable anywhere it is stored (including on portable digital media, backup media, and in logs) by using any of the following approaches:
One-way hashes based on strong cryptography, (hash must be of the entire PAN) ...
Search space of typical 16-digit card numbers is no match for a modern CPU once you have taken check digit, card type, issuer and issuer specific numbering into account... "strong cryptography" can't fix stupid.
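A rough sketch of that arithmetic in Python, assuming the attacker already knows a 6-digit issuer prefix and that the stored value is a plain SHA-256 of the PAN digits. The hash choice, the prefix length, and all function names are assumptions made for illustration, not anything PCI DSS prescribes.

```python
import hashlib


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a card number missing its last digit."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # double every second digit, starting from the rightmost
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def sha256_hex(text: str) -> str:
    """Return the hex SHA-256 digest of a digit string."""
    return hashlib.sha256(text.encode("ascii")).hexdigest()


def recover_pan(target_hash: str, issuer_prefix: str, accounts: range):
    """Try each 9-digit account number under a known 6-digit issuer prefix."""
    for account in accounts:
        payload = f"{issuer_prefix}{account:09d}"
        pan = payload + str(luhn_check_digit(payload))
        if sha256_hex(pan) == target_hash:
            return pan
    return None


# 16 digits, minus 6 known prefix digits, minus 1 computed check digit.
UNKNOWN_DIGITS = 16 - 6 - 1
SEARCH_SPACE = 10 ** UNKNOWN_DIGITS  # one billion candidates per prefix
```

One billion candidates per issuer prefix is a small job for a fast hash, and knowing the last four digits (often stored alongside the hash) would shrink it much further.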
You are naive. The problem starts to crop up when you start correlating things. Then you can find all sorts of things, like patterns of visiting a mistress, people meeting in secret (which is perfectly legal, but the government fears it), etc.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
True that.
I am in the fortunate situation of having near unlimited funds. I was joking that I need a rubber stamp labeled "for security reasons", because whenever I want something, these three magic words will brush aside nearly all objections (ok, within reason, but anything 5 digits or less is nearly certainly mine if I "rubber stamp" it that way).
The most recent draft of the security procedures I did I peppered liberally with "insanity" as I call it. It's a political thing. You demand stuff that you don't really want but is so terribly obstructive to everyone else that they'll agree with what you actually want just to get the insane levels of "security" (read: obstruction and red tape) out of the way. To my unending horror (and slight amusement) they signed it off without changing a comma. Now find out how to argue why you want your own requirements out of the crap...
The reason isn't that our board suddenly found out how much they love security or how important the confidentiality of the (considerably sensitive, I should add) private data we hold here is. What changed is simply that our government upped the fines and punishment for data breaches considerably, up to and including jail time for board members if negligence can somehow be tacked to them. In a nutshell, unless you can show that you tried to stay on top of security when holding highly sensitive data, you should prepare to take a longer vacation, all expenses paid, in a holiday resort of your government's choice.
I guess when your ass is on the line, you get very willing to spend money.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Actually the movement of a cab is a wealth of information. Not by itself, but it's very good at connecting dots. If you want to follow someone around, these things tend to be invaluable. You can, essentially, follow someone around without following them around, even retroactively. People rarely go from place to place randomly. They have destinations. If someone takes a cab from the airport and doesn't live in the area where he landed, it is likely that his destination is the place that he will stay in. After a flight, especially a long one, people want to get rid of their heavy baggage, take a shower, put on new clothing. So you can easily find out where someone stayed. Which becomes twice as interesting if the destination is not a hotel, because now you got another person to screen.
This information by itself is not much. But as part of a bigger network it is something we'd have killed for back when I was still doing profiling.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
In other news, the credentials for their plug-n-play coffee machine are 'admin' 'admin', and their gym locker combo is 1234. Someone made a half-assed attempt to obfuscate some data that nobody cares about (unless your husband's a cheating cabbie, I guess) and someone cracked it. News?
Security through obscurity, using a custom algorithm, is the only way.
Not necessarily. I imagine the reason the hashed field was included in the published logs was to provide a key to group results by driver. Even if that driver was to remain anonymous. So all the city would have had to do is issue a system generated UID for each medallion/license number combination and populate the published data with that.
Nobody knows who driver 1, 2, 3, .., 736903, ... etc. are. But one can still analyze per-driver data.
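A minimal sketch of that idea in Python. The record layout (medallion, hack license, trip data) and the function name are assumptions made for the example.

```python
def assign_driver_ids(records):
    """Replace each (medallion, hack_license) pair with an opaque integer ID.

    `records` is an iterable of (medallion, hack_license, trip_data) tuples.
    Returns the publishable rows and the private mapping table.
    """
    ids = {}
    published = []
    for medallion, hack_license, trip_data in records:
        key = (medallion, hack_license)
        if key not in ids:
            # IDs carry no information about the pair they stand for.
            ids[key] = len(ids) + 1
        published.append((ids[key], trip_data))
    return published, ids
```

Only the first return value would be released. The mapping has to stay private or be destroyed. IDs handed out in first-seen order can still leak the ordering of the source file, so shuffling the records first may be worth considering.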
Have gnu, will travel.
Very insightful Opportunist . :)
With more nations trying to count passports in and out, a wealth of information about each person entering some countries is now being stored.
From face recognition, gait analysis, 'free' wifi, a new/old phone being set up for cheaper local use, the random risk of a laptop being examined and cloned on entry and exit.
If you want to rent a car you face a complex 'chat down' by the friendly on site rental staff.
So you take the next random taxi.
In the past along a long airport road the interaction of a few tailing vehicles might be detected given the number of turns into a city.
Destinations can be looked at over time, in near real time and as a history.
That first trip can open up a world of new digital 'hops': old friends, a college buddy, a lover, extended family, a previously unknown associate, all of whom now have their lives examined too.
If you go to a hotel you face another 'chat down' attempt by the friendly staff over a long complex CC or cash transaction.
No follow-car pool or beacons needed anymore; just go big, locally and federally, with “collect-it-all”.
Domestic spying is now "Benign Information Gathering"
Taking MD5, which is published, and tweaking a few points (though whoever did this would need to be very competent) would have been sufficient.
No, that would have been stupid. It's unlikely someone would have reverse engineered your hacked md5 algorithm, but it's also possible you could screw it up.
The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier. Throw away the random 256 bit string.
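The recipe above can be sketched in a few lines of Python. SHA-256 is an assumption here (the post does not name a hash), and `make_anonymizer` is a hypothetical helper name.

```python
import hashlib
import secrets


def make_anonymizer():
    """Return a function that maps IDs to stable pseudonyms for one export run."""
    # 256 random bits, held only in memory for the lifetime of the closure.
    secret = secrets.token_bytes(32)

    def anonymize(value: str) -> str:
        # Hash random-string + data, as described above.
        return hashlib.sha256(secret + value.encode("utf-8")).hexdigest()

    return anonymize
```

The same ID always maps to the same pseudonym within one export, so per-driver grouping still works. Once the closure is dropped the secret is gone, and enumerating the small ID space no longer helps an attacker who lacks those 256 bits.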
Some manager probably said any work for additional security wasn't worth the cost. Ooops!
No, some developer didn't know what the hell they were doing. You'd be surprised (but shouldn't be) how little most developers know about security, especially encryption.
AccountKiller
Has Joe Sixpack been seen near any anti war protests? Written to the press at a city, state or federal level? Given charitable contributions to a faith based group now under investigation? Have a security clearance? Have a family member with a new or old security clearance? Does Joe Sixpack travel outside the USA a lot?
It's not just about being "much easier", it's about getting it all, having domestic staff feel OK about storing and sorting domestic details per person, and being able to legally collect more domestically without needing per-person court work.
Domestic spying is now "Benign Information Gathering"
The point is that you can't follow every Joe Random around all the time. But occasionally some Joe Random becomes a Joe Someone and you just wish you had the information that you could have if you just followed him.
Scenario.
You find out that there is someone you deem a nuisance to the powers that are. You finally caught him. But he doesn't talk. Imagine you're an entity that has access to a lot of information, either directly (because you have it) or indirectly (because you can request it). Using the CC information of your subject you find out that he recently spent time in another city (because you get the flight information). Since there is no other reason (like, say, business reasons), and since his travel visa says "vacation", you deem it likely that he met a contact or even an accomplice. You have no hotel bills on CC, so either he paid in cash or, and this is what you hope for, he stayed with his contact.
You know when his plane landed and you can even determine to some degree of certainty when he left the airport (you may even have access to the CCTV to pinpoint the moment). Of course more than one taxi leaves around that time, but most of them go to hotels (that you can then check out for reservations by the name of the person you're looking for). What you're really hoping for is a private address. And unless your subject was very careful, he might even have given the cab driver the real address, which now offers you another address and another contact to use.
Next thing you want to do is find out all cab movements to and from this address. It may be some kind of "hub" for people of that particular kind of nuisance, you may actually find some kind of structure. You can at least find out whether your subject also took cabs to other destinations and when, how often and where he went.
Or how about a more general approach? You could use the information to find out whether some private address gets visited by people from outside of town suspiciously often. What do they do there? Why do they go there? Do they stay there? If not, what could they be doing there?
Cabs offer a wealth of information. Again, by itself that information is fairly useless, but it is great for "connecting dots", because that's what cabs do: They move from point A to point B with their passenger.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Fines in a corporate world are a matter of risk management: How likely is it that it happens, what's the fine if it happens and how much do we save by not giving a damn? If this unholy trinity comes up with the "don't give a damn" on top, you don't give a damn and the fine becomes part of the operation cost. The more I get to play with C-Levels, the more I get the nagging feeling that I'm the only one weighed down by a conscience.
Actually, I think it's more insidious. It's a blame shifting game where everyone can claim he's doing it for the "greater good", because "being bad" is actually "being good".
Take the scenario where some people have to be laid off. The floor manager knows them personally. He knows every single one of them, he knows their personal life, their family situation and it really breaks his heart to let one of them go, but he knows he has to. Either he fires one of them or he might have to fire them all because they won't be profitable anymore with the new requirements, and that could lead to the shutdown of the entire branch. His superior may not know the people anymore, but he has to do it because he himself doesn't make that decision, that's been decided further up. He can't simply ignore an order from C-Level.
The C's don't need to be psychopaths (though it sure helps, it seems...), they can even be compassionate, but they know that the investors will only keep their money in the company if they perform well and if the cash flow is to their liking. A C-level can easily brush aside any troubles with his conscience when he fires a few people now, since if he didn't the quarterly figures wouldn't look nice, the stock would plummet and investors would jump ship, and then he'd have to lay off even more people.
But you can't even blame the investment bankers. They have to pick the best performing stocks because it's not their money, it's money from investors, money those investors put aside for their retirement, and the bankers have a responsibility towards the people who entrust them with their money (ok, recent history shows that most don't give a shit, but let's assume we find an investment banker with a conscience... it's just a thought experiment, remember). The people investing money don't even know WHAT they invest in, they just toss money at their banker with the order to "make more of it". And they're not "evil" either, they just want to prepare for their retirement.
Those people could well be the same ones who get fired now for the sake of more profit. Essentially, they're firing themselves without knowing it.
But I ramble.
What this is supposed to show is that in the corporate world it's easy to play the blame shifting game and use the "but I have to!" excuse. It's sad but it seems the only escape from that game is to actually grab them at the nuts and tell them that they won't be shifting the blame anywhere. And behold, it works.
Of course that also means that I have to watch my back or it's going to be my ass that's going to jail. But fortunately all I have to do is heed the laws. And that's easy enough, surprisingly.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
The government has the info already, they handed it out!
After he discombobulated Agent Smith from the inside, Neo changed his name to incorporate all 3 identities.
Neo Smith Anderson.
> Target's breach cost them 50% of their revenue for a year.
No it did not. Not even close. At worst their profits for the subsequent quarter were down 50% or in terms of revenue, that's less than a 6% drop compared to a year ago.
A new car built by my company leaves somewhere traveling at 60 mph. The rear differential locks up. The car crashes and burns with everyone trapped inside. Now, should we initiate a recall? Take the number of vehicles in the field, A, multiply by the probable rate of failure, B, multiply by the average out-of-court settlement, C. A times B times C equals X. If X is less than the cost of a recall, we don't do one.
The United States dollar is the currency preferred by drug dealers, whose trade is in fact made more profitable by the failed "War on Drugs".
Write failed: Broken pipe
The War on Drugs is a massively successful enterprise if your definition of success is the ability to extract billions of USD worth of funding from taxpayers, with a disproportionate amount of said funding going to the overt militarization of police forces in the USA at the expense of civil liberties and human rights. However, if your indicators of success are tied to social, medical, or economic improvement for the citizens of the United States of America, the entire affair is indeed a massive failure.
For reference, this is coming from someone who consumes nothing more than nicotine (vaping these days, gave up cigarettes after 20 years) and whiskey, and once wore an actual military uniform for a living.
Write failed: Broken pipe
nope, it has to do with the key. given a tag # and license # you can dictionary attack the hash. especially since the source data is known, easy to break.
If they'd used a keyed hash of tag # and license #, it wouldn't have been breakable. Even HMAC-MD5 would have been fine, given sufficient entropy in the key, though I'd have used HMAC-SHA256 just as a matter of good crypto hygiene.
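For concreteness, a keyed hash along those lines might look like this in Python, using the standard `hmac` module. The `|` separator between the two fields and the function names are assumptions made for this sketch.

```python
import hashlib
import hmac
import secrets


def generate_key() -> bytes:
    """Create a 256-bit random key; it must never be published with the data."""
    return secrets.token_bytes(32)


def keyed_id(key: bytes, tag: str, license_no: str) -> str:
    """Return HMAC-SHA256 over the tag and license number as a hex string."""
    # The separator keeps ("ab", "c") and ("a", "bc") from colliding.
    message = f"{tag}|{license_no}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()
```

Without the key, an attacker who enumerates every possible tag and license number still cannot reproduce the published values.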
And a custom algorithm is wrong, wrong, wrong. That's just begging for weakness in the solution. Use the proper standard algorithm for the job.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
This is not a new phenomenon. And not an easy one to solve. From The Grapes of Wrath by John Steinbeck: