Improperly Anonymized Logs Reveal Details of NYC Cab Trips
mpicpp (3454017) writes with news that a dump of fare logs from NYC cabs resulted in trip details being leaked thanks to using an MD5 hash on input data with a very small key space and regular format. From the article:
City officials released the data in response to a public records request and specifically obscured the drivers' hack license numbers and medallion numbers. ... Presumably, officials used the hashes to preserve the privacy of individual drivers since the records provide a detailed view of their locations and work performance over an extended period of time.
It turns out there's a significant flaw in the approach. Because both the medallion and hack numbers are structured in predictable patterns, it was trivial to run all possible iterations through the same MD5 algorithm and then compare the output to the data contained in the 20GB file. Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
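A minimal sketch of the attack described above: enumerate every value the identifier could take, hash each one with MD5, and look the published hashes up in the resulting table. The single pattern used here (digit, letter, digit, digit) is an assumption for illustration only and is not taken from the article. The real dataset may use several formats, but the principle is the same whenever the key space is small.

```python
import hashlib
import itertools
import string


def md5_hex(value):
    """Return the hex MD5 digest of an ASCII string."""
    return hashlib.md5(value.encode("ascii")).hexdigest()


def build_lookup():
    """Map MD5 digest -> plaintext for every candidate in the assumed pattern.

    Assumed (hypothetical) pattern: digit, letter, digit, digit, e.g. "7B42".
    That is only 10 * 26 * 10 * 10 = 26,000 candidates.
    """
    table = {}
    for parts in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        candidate = "".join(parts)
        table[md5_hex(candidate)] = candidate
    return table


lookup = build_lookup()

# A "published" hash, as it might appear in the released file (made-up value).
published_hash = md5_hex("7B42")
recovered = lookup.get(published_hash)
print(recovered)
```

The table is built once and every row in the released file becomes a dictionary lookup, which is why the size of the file matters far less than the size of the key space.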
"Oops"
-New York
I know someone who keeps logs of all phone calls, all e-mails, all movement of everybody.
Too many governments and corporations continue to fail to understand that it requires having experts who actually know what they are doing be in charge of data security.
This doesn't mean you contract it out to the lowest bidder or hire the cheapest CS degree you can find.
It means you hire knowledge and experience, you hire expert skills, and those cost money.
This is why we can't have nice things.....
Cue a CFAA trial and a long stay in a cozy federal PMITA penitentiary.
Ezekiel 23:20
Large organizations will consistently fail to hire/staff competent people for data security related issues, and will push back on fines or punitive findings by criminalizing publicizing their incompetence.
Thus sending all such talent straight to criminals who'll be happy to reward them with hard cash.
It's like these guys _want_ a dystopian future.
Make sure everyone's vote counts: Verified Voting
People will know driver XYZ drove from 122 Main St to 123 Second St?
It's not like they have the info on where the person was actually going when they got out of the cab.
This isn't even an issue. *yawn*
Now you must go to jail Sorry :-(
Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
Having thereby run afoul of the circumvention of copyright protection mechanisms clause of the Digital Millennium Copyright Act, he was then subjected to the NYPD's controversial new program, and subsequently incarcerated.
Wait. It's NY city. We can't do that.
Oops.
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Target's breach cost them 50% of their revenue for a year.
That nearly put them out of business.
The meeting between them and the card carriers went something like
AMEX_Discover_MasterCard_VISA: You are paying to replace cards, paying for the fraud from the compromise (10-15 billion/year), and you are paying enhanced fees to us for several years until you prove you are again worthy of the normal rates we give your competitors.
Target: And if we refuse?
AMEX_Discover_MasterCard_VISA: We will choose not to do business with you. Your customers will have to buy in cash.
Target: Oh...well then. Where do I sign?
As systems become more integrated, Data Security is going to become less about keeping egg off your face and more about corporate and personal survival. Those old movies from the '80s with the hacker causing the elevator to drop 100 stories, or the cellphone battery to explode, or the factory to go out of business for months on end, and so on?
The industry is already moving towards true security.
When data starts being used against people personally, they will begin asking questions, and then it will become very important.
Until then, if you're a hacker and know your shit, enjoy being God.
I've always assumed that wherever the term "anonymized data" is used, it is more likely than not that companies and governments are paying lip service to their customers... and that the data could easily be turned back into an identifiable form, either by taking advantage of insufficient entropy or by cross-referencing datasets.
There is after all no cost for violating privacy or unnecessary risk exposure associated with disclosure.
One of my favorite examples of the dangers of insufficient entropy stems from a PCI DSS requirement written by "experts" who should know better.
3.4 Render PAN unreadable anywhere it is stored (including on portable digital media, backup media, and in logs) by using any of the following approaches:
One-way hashes based on strong cryptography, (hash must be of the entire PAN) ...
Search space of typical 16-digit card numbers is no match for a modern CPU once you have taken check digit, card type, issuer and issuer specific numbering into account... "strong cryptography" can't fix stupid.
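To make the point about the search space concrete, here is a rough sketch. It assumes the attacker knows a leading prefix of the card number (for example the issuer digits) and that the last digit is a Luhn check digit, so only the middle digits need to be enumerated. The function names and the 10-digit known prefix in the demo are assumptions chosen so that it runs in well under a second. With only a 6-digit prefix known, the loop would cover 10^9 candidates instead. SHA-256 is used on purpose to show that a stronger hash does not help.

```python
import hashlib


def luhn_check_digit(payload):
    """Compute the Luhn check digit for a string of digits (without check digit)."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # rightmost payload digit is doubled, then every other one
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def recover_pan(target_hash, known_prefix, length=16):
    """Brute-force an unsalted SHA-256 of a card number with a known prefix."""
    free = length - 1 - len(known_prefix)  # digits that are neither prefix nor check digit
    for n in range(10 ** free):
        middle = str(n).zfill(free) if free else ""
        payload = known_prefix + middle
        pan = payload + luhn_check_digit(payload)
        if hashlib.sha256(pan.encode("ascii")).hexdigest() == target_hash:
            return pan
    return None


# 4111111111111111 is a widely used test card number, not a real account.
demo_pan = "4111111111111111"
demo_hash = hashlib.sha256(demo_pan.encode("ascii")).hexdigest()
found = recover_pan(demo_hash, known_prefix="4111111111")
print(found)
```

The check digit removes a full factor of ten from the search, and every known prefix digit removes another.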
why did NYC attempt to hide the data in the first place?
Surely Vijay Pandurangan will not be arrested for hacking?
For this application, MD5 did not make a difference. SHA512 would have been just as insecure. For some applications, MD5 is perfectly secure if used competently. This example is one, and the original story does not claim any culpability on the part of MD5. As always, there is no substitute for knowing what you are doing.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
True that.
I am in the fortunate situation of having near unlimited funds. I was joking that I need a rubber stamp labeled "for security reasons", because whenever I want something, these three magic words will brush aside nearly all objections (ok, within reason, but anything 5 digits or less is nearly certainly mine if I "rubber stamp" it that way).
The most recent draft of the security procedures I did I peppered liberally with "insanity" as I call it. It's a political thing. You demand stuff that you don't really want but is so terribly obstructive to everyone else that they'll agree with what you actually want just to get the insane levels of "security" (read: obstruction and red tape) out of the way. To my unending horror (and slight amusement) they signed it off without changing a comma. Now find out how to argue why you want your own requirements out of the crap...
The reason isn't that our board suddenly found out how much they love security or how important the confidentiality of the (considerably sensitive, I should add) private data we hold here is. What changed is simply that our government upped the fines and punishment for data breeches considerably, up to and including jail time for board members if negligence can somehow be tacked to them. In a nutshell, unless you can show that you tried to stay on top of security when holding highly sensitive data, you should prepare to take a longer vacation, all expenses paid, in a holiday resort of your government's choice.
I guess when your ass is on the line, you get very willing to spend money.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
I de-anonymized this comment by signing in.
org.slashdot.post.SignatureNotFoundException: ewg
Using any public hash exposes you to dictionary attacks. Especially when you publish which one you've used.
The quality of the encryption is irrelevant.
Security through obscurity, using a custom algorithm, is the only way.
Taking MD5, which is published, and tweaking a few points (though whoever did this would need to be very competent) would have been sufficient.
Some manager probably said any work for additional security wasn't worth the cost. Ooops!
In other news, the credentials for their plug-n-play coffee machine are 'admin' 'admin', and their gym locker combo is 1234. Someone made a half-assed attempt to obfuscate some data that nobody cares about (unless your husband's a cheating cabbie, I guess) and someone cracked it. News?
The 'classic', or alpha version is, as the name implies, reserved for the alpha users. All the rest will have to stick with the beta version.
You've elegantly described why stiff federal penalties are needed.
Interesting that when a direct line to someone's pocketbook is defined everyone gets on board, but when it's just a chance someone's drinking water would be tainted with cancer causing chemicals most can't find the connection.
Corporate malfeasance comes in all forms.
Fines in a corporate world are a matter of risk management: How likely is it that it happens, what's the fine if it happens and how much do we save by not giving a damn? If this unholy trinity comes up with the "don't give a damn" on top, you don't give a damn and the fine becomes part of the operation cost. The more I get to play with C-Levels, the more I get the nagging feeling that I'm the only one weighed down by a conscience.
Actually, I think it's more insidious. It's a blame shifting game where everyone can claim he's doing it for the "greater good", because "being bad" is actually "being good". Take the scenario where some people have to be laid off. The floor manager knows them personally. He knows every single one of them, he knows their personal life, their family situation and it really breaks his heart to let one of them go, but he knows he has to. Either he fires one of them or he might have to fire them all because they won't be profitable anymore with the new requirements, and that could lead to the shutdown of the entire branch. His superior may not know the people anymore, but he has to do it because he himself doesn't make that decision, that's been decided further up. He can't simply ignore an order from C-Level. The C's don't need to be psychopaths (though it sure helps, it seems...), they can even be compassionate, but they know that the investors will only keep their money in the company if it performs well and if the cash flow is to their liking. A C-level can easily brush any troubles with his conscience aside when he fires a few people now, since if he didn't, the quarterly figures wouldn't look nice, the stock would plummet and investors would jump ship, and then he'd have to lay off even more people. But you can't even blame the investment bankers. They have to pick the best performing stocks: it's not their money, it's money from investors, money they put aside for their retirement, and the bankers have a responsibility towards the people who entrust them with their money (ok, recent history shows that most don't give a shit, but let's assume we find an investment banker with a conscience... it's just a thought experiment, remember). The people investing money don't even know WHAT they invest in, they just toss money at their investment banker with the order to "make more of it". And they're not "evil" either, they just want to prepare for their retirement.
Those people could well be the same ones who get fired now for the sake of more profit. Essentially, they're firing themselves without knowing it.
But I ramble.
What this is supposed to show is that in the corporate world it's easy to play the blame shifting game and use the "but I have to!" excuse. It's sad but it seems the only escape from that game is to actually grab them at the nuts and tell them that they won't be shifting the blame anywhere. And behold, it works.
Of course that also means that I have to watch my back or it's going to be my ass that's going to jail. But fortunately all I have to do is heed the laws. And that's easy enough, surprisingly.
data breeches
bring me my computing pants!
You are in rare form. Glad to be here for it.
> Target's breach cost them 50% of their revenue for a year.
No it did not. Not even close. At worst their profits for the subsequent quarter were down 50% or in terms of revenue, that's less than a 6% drop compared to a year ago.
It has never been about protecting you, the customer with the CC, but about giving the bank and the firm protection against a lawsuit or class action in case of a massive breach. Now they can simply say "hey, we were respecting the PCI DSS standard" and be out of the heat. That's why there is no real security, and no requirement to use something stronger like a salted hash.
Indeed--I suspect he's had a bit much to drink. But it's really quite fascinating...
It was a pleasant fantasy and you had to go and spoil it all.
A new car built by my company leaves somewhere traveling at 60 mph. The rear differential locks up. The car crashes and burns with everyone trapped inside. Now, should we initiate a recall? Take the number of vehicles in the field, A, multiply by the probable rate of failure, B, multiply by the average out-of-court settlement, C. A times B times C equals X. If X is less than the cost of a recall, we don't do one.
If they had salted the hash, they might have gotten away with it.
A new car built by my company [...] car crashes and burns with everyone trapped inside. Now, should we initiate a recall?
No, you just need to stop making such shitty cars.
systemd is Roko's Basilisk.
The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier. Throw away the random 256 bit string.
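A short sketch of that suggestion, assuming Python's standard `secrets` and `hashlib` modules and a hypothetical helper name `pseudonymize`. The random 256-bit string lives only inside the function call, so it is gone once the identifiers have been produced. The same input still maps to the same identifier within one release, which keeps each cab's trips linkable to each other without being linkable to the real number.

```python
import hashlib
import secrets


def pseudonymize(values):
    """Replace each value with SHA-256(random_key + value); the key is never stored."""
    key = secrets.token_bytes(32)  # 256 random bits, discarded when the function returns
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = hashlib.sha256(key + v.encode("utf-8")).hexdigest()
    return [mapping[v] for v in values]


# Made-up identifiers for illustration.
released = pseudonymize(["7B42", "3C17", "7B42"])
print(released[0] == released[2], released[0] == released[1])
```

Without the key, enumerating the small key space no longer works, because an attacker would also have to guess 256 random bits.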
If you're going to throw away the salt, why not just assign a unique, shuffled identifier for each data string?
A hash collision could make it look like a single taxi driving in opposite directions simultaneously, or it could cause a pair of day-night shift taxis to appear to be a single taxi that's used 24/7. So if you want to avoid hash collisions, you at least have to verify that none of the values hashed to the same value, and the cost of doing that is roughly the same as the extra overhead of generating a shuffled identifier.
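The shuffled-identifier alternative can be sketched like this. The helper name `assign_shuffled_ids` is hypothetical, and `secrets.SystemRandom` is used so that the shuffle draws from the operating system's random source rather than a seedable generator. Each distinct value gets exactly one integer, so collisions cannot occur by construction.

```python
import secrets


def assign_shuffled_ids(values):
    """Give every distinct value a unique integer ID in random order."""
    unique = sorted(set(values))      # sort so input order does not leak into the IDs
    ids = list(range(len(unique)))
    secrets.SystemRandom().shuffle(ids)
    return dict(zip(unique, ids))


# Made-up identifiers for illustration.
ids = assign_shuffled_ids(["7B42", "3C17", "7B42", "9Z01"])
print(ids)
```

The mapping has to be kept secret or destroyed after the release, just like the random string in the hashing approach.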
Looks like that reference went over your head a little...
Why? As long as people buy them, there is no pressing need provided that the profit outmatches the potential fines. That's corporate logic.
What? Oh, people die, yes. That's where the potential fines come into play.
This is what scares licensed cabbies. Uber gives you a map of your journey. Licensed cabbies can drive you round in circles and you cannot prove it.
Now this data is open, you should:
map all the start and end locations
calculate the optimal route
identify the cabs and medallions that deviate most from the optimal
fine them
ban them.
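A rough sketch of the ranking steps listed above. Computing a true optimal route needs a street-routing engine, so this sketch substitutes the straight-line (haversine) distance as a crude lower bound and ranks cabs by the median ratio of recorded distance to that bound. The field names (`medallion`, `pickup_lat`, `trip_distance`, and so on) and the sample trips are assumptions for illustration, not the dataset's actual schema.

```python
import math
from collections import defaultdict
from statistics import median


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def rank_by_detour(trips, min_straight=0.1):
    """Return (median detour ratio, medallion) pairs, worst offenders first."""
    ratios = defaultdict(list)
    for t in trips:
        straight = haversine_miles(
            t["pickup_lat"], t["pickup_lon"], t["dropoff_lat"], t["dropoff_lon"]
        )
        if straight < min_straight:  # skip round trips and GPS glitches
            continue
        ratios[t["medallion"]].append(t["trip_distance"] / straight)
    return sorted(((median(r), cab) for cab, r in ratios.items()), reverse=True)


# Two made-up trips between the same pair of points.
trips = [
    {"medallion": "A", "pickup_lat": 40.75, "pickup_lon": -73.99,
     "dropoff_lat": 40.76, "dropoff_lon": -73.98, "trip_distance": 1.1},
    {"medallion": "B", "pickup_lat": 40.75, "pickup_lon": -73.99,
     "dropoff_lat": 40.76, "dropoff_lon": -73.98, "trip_distance": 4.0},
]
ranked = rank_by_detour(trips)
print(ranked[0][1])
```

A high ratio is only a flag for review: street grids, one-way streets, traffic and passenger requests all lengthen honest trips, so a real audit would compare against routed distances before fining anyone.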
If you want any form of quality control in any system, you must sample a portion of all work and verify it. Even with experienced and proven honest operators, you must still check 10% of their work. This isn't about trust. It's just best practice. Cabbies are finally going under the spotlight and they don't like it.
And in your scenario, all of the people making all of those decisions are in fact right. Compassion is a fine thing, but at the end of the day what benefits all of us is economic efficiency. It is hard on the people who are fired, and that's a good reason to give them generous severance packages because in the end that's unlikely to do significant damage to the bottom line and the goodwill generated has significant value, but keeping people on just to be nice is a bad idea.
You need garters for those breeches.
This is not a new phenomenon. And not an easy one to solve. From The Grapes of Wrath by John Steinbeck:
No, you just need to stop making such shitty cars.
Seems a lot of people got whooshed by the original post, so:
I have changed your automobile safety design. Pray I do not change it further -- T. Durden
https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
Is that like your bus pants?