Slashdot Mirror


Improperly Anonymized Logs Reveal Details of NYC Cab Trips

mpicpp (3454017) writes with news that a dump of fare logs from NYC cabs resulted in trip details being leaked thanks to using an MD5 hash on input data with a very small key space and regular format. From the article: City officials released the data in response to a public records request and specifically obscured the drivers' hack license numbers and medallion numbers. ... Presumably, officials used the hashes to preserve the privacy of individual drivers since the records provide a detailed view of their locations and work performance over an extended period of time.

It turns out there's a significant flaw in the approach. Because both the medallion and hack numbers are structured in predictable patterns, it was trivial to run all possible iterations through the same MD5 algorithm and then compare the output to the data contained in the 20GB file. Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
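A minimal Python sketch of the attack described above. It assumes, hypothetically, that every medallion number follows a single four-character pattern: digit, uppercase letter, digit, digit. The real formats may differ. The point is only that a keyspace this small can be hashed exhaustively into a reverse lookup table.

```python
import hashlib
import itertools
import string


def md5_hex(value: str) -> str:
    """Unsalted MD5 of an ASCII string, as a lowercase hex digest."""
    return hashlib.md5(value.encode("ascii")).hexdigest()


def candidate_medallions():
    """Yield every value of a HYPOTHETICAL digit-letter-digit-digit format (e.g. "5X55")."""
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


def build_reverse_table() -> dict:
    """Map each candidate's MD5 digest back to its plaintext (26,000 entries here)."""
    return {md5_hex(m): m for m in candidate_medallions()}


def deanonymize(hashes, table):
    """Replace each hash with its plaintext, or None if it lies outside the keyspace."""
    return [table.get(h) for h in hashes]


if __name__ == "__main__":
    table = build_reverse_table()
    leaked = [md5_hex("5X55"), md5_hex("9Q01")]
    print(deanonymize(leaked, table))  # ['5X55', '9Q01']
```

Once the table is built, reversing each row takes a single dictionary lookup.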

192 comments

  1. Oops. by mythosaz · · Score: 1

    "Oops"

    -New York

    1. Re:Oops. by Anonymous Coward · · Score: 1

      You mean the hacks got hacked?

      This may sound like a funny incident, but it does point to the vulnerability I've always pointed out about Bitcoin: the block chain tells you who got what. Sure, the identities are hashed, but aggregate those hashes and compare them to other kinds of records and you can start drawing all kinds of interesting inferences.

    2. Re:Oops. by Anonymous Coward · · Score: 0

      but it does point to the vulnerability I've always pointed out about Bitcoin

      This is like pointing out the vulnerability in a screen whereby it lets air through.

      Bitcoin. Is. Not. Anonymous. Currency.

      Bitcoin. Is. Not. Meant. To. Be. Anonymous. Currency.

    3. Re:Oops. by Anonymous Coward · · Score: 0

      That's not a vulnerability. You're perpetuating the widespread misconception that anonymity was ever a Bitcoin design goal. It wasn't and it isn't. You've apparently spent a bunch of time stating a falsehood.

      (philip.paradis posting as AC because I don't log in on this machine)

    4. Re:Oops. by philip.paradis · · Score: 4, Insightful

      The United States dollar is the currency preferred by drug dealers, whose trade is in fact made more profitable by the failed "War on Drugs".

      --
      Write failed: Broken pipe
    5. Re:Oops. by Anonymous Coward · · Score: 0

      Are you sure it's failed?

    6. Re:Oops. by philip.paradis · · Score: 3, Insightful

      The War on Drugs is a massively successful enterprise if your definition of success is the ability to extract billions of USD worth of funding from taxpayers, with a disproportionate amount of said funding going to the overt militarization of police forces in the USA at the expense of civil liberties and human rights. However, if your indicators of success are tied to social, medical, or economic improvement for the citizens of the United States of America, the entire affair is indeed a massive failure.

      For reference, this is coming from someone who consumes nothing more than nicotine (vaping these days, gave up cigarettes after 20 years) and whiskey, and once wore an actual military uniform for a living.

      --
      Write failed: Broken pipe
    7. Re:Oops. by Anonymous Coward · · Score: 0

      *WHOOSH*

      I think your parent is pointing to the possibility that the lawmakers are in cahoots.

      And, from past experiences, I'm with him/her.

    8. Re:Oops. by Anonymous Coward · · Score: 0

      And of all things, Tide detergent....

      http://www.nytimes.com/roomfordebate/2013/01/14/why-would-drug-dealers-use-tide-as-a-currency

  2. That's nothing by Anonymous Coward · · Score: 1

    I know someone who keeps logs of all phone calls, all e-mails, all movement of everybody.

    1. Re:That's nothing by msauve · · Score: 1

      Are his initials "NSA?"

      --
      "National Security is the chief cause of national insecurity." - Celine's First Law
    2. Re:That's nothing by viperidaenz · · Score: 2

      After he discombobulated Agent Smith from the inside, Neo changed his name to incorporate all 3 identities.

      Neo Smith Anderson.

    3. Re:That's nothing by SpzToid · · Score: 1

      Brilliant. Of course. This just makes so much sense now.

      --
      You can't be ahead of the curve, if you're stuck in a loop.
  3. Data Security Officer by FlyHelicopters · · Score: 4, Insightful

    Too many governments and corporations continue to fail to understand that it requires having experts who actually know what they are doing be in charge of data security.

    This doesn't mean you contract it out to the lowest bidder or hire the cheapest CS degree you can find.

    It means you hire knowledge and experience, you hire expert skills, and those cost money.

    1. Re:Data Security Officer by fuzzyfuzzyfungus · · Score: 2, Insightful

      In this case, it sounds like whoever got handed the job either couldn't think like an attacker, didn't care to, or was overruled.

      There are probably subtler methods of de-anonymizing the data that would take nontrivial skill to think of and counter. But it's a bit surprising to see somebody who knows enough about manipulating data to pull 20GB of records and hash a single field in each one, without hurting himself or munging the result, fail to think "Medallion numbers are written on cabs. Somebody could grab dozens of them while waiting by the curb at the airport and just MD5 them in milliseconds", much less "Medallion numbers are quite short; someone could traverse the whole damn keyspace in a few days at most".

      Either their person thinks that MD5 is magic, or his thought process marched in a nice straight line from request to solution, without ever thinking about attack: "We need all medallion numbers replaced with internally consistent but unrelated UIDs." "Umm, OK. Hey, a hash function is deterministic and non-reversible, it's perfect!"

    2. Re: Data Security Officer by MalleusEBHC · · Score: 1

      Adding a salt is a trivial way of fixing this.

    3. Re: Data Security Officer by WaffleMonster · · Score: 2

      Adding a salt is a trivial way of fixing this.

      No it aint.

    4. Re: Data Security Officer by m.dillon · · Score: 2

      Except you can decode the salt trivially if you took a cab ride that happens to be in the data set and you recorded the license and medallion number. At which point the salt is useless.

      -Matt

    5. Re:Data Security Officer by Anonymous Coward · · Score: 0

      Maybe we would rather have the data than have the government clam up and release nothing (just in case), and change the laws because of the cost burden.

    6. Re:Data Security Officer by Opportunist · · Score: 4, Interesting

      You can contract it out to the lowest bidder without a problem. There only have to be 2 clauses in the contract:

      1) You have a GOOD ITSEC company audit the shit out of it before it goes live.
      2) If the audit reveals that the company taking the contract don't know jack about security, THEY will pay for the audit and THEY will improve the software until they think it's finally good enough.

      1 and 2 are repeated until 1 turns out good.

      I worked for a very long time in government. And I learned one thing: You are not supposed to know shit. You are supposed to buy knowledge.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    7. Re:Data Security Officer by gweihir · · Score: 1

      It is not a surprise when you consider where else they mess up spectacularly. It is like there is no active intelligence to be had in these organizations.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    8. Re: Data Security Officer by msauve · · Score: 1

      Using a one time pad is even easier.

      --
      "National Security is the chief cause of national insecurity." - Celine's First Law
    9. Re: Data Security Officer by fuzzyfuzzyfungus · · Score: 1

      It does make your table 'o handy precomputed hashes unhelpful; but on such a computationally trivial keyspace that barely matters.

      I wonder if the choice of hashing, rather than substituting a UUID, was based on not thinking through the weakness of a hash under the circumstances, or based on the extra difficulty of making sure that the same UUID is substituted for the same hack and medallion number in all instances? It's not a whole lot of additional difficulty; but the tipping point has to live somewhere...
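A sketch of the UUID-substitution alternative in Python. Keeping the same hack or medallion number mapped to the same UUID everywhere takes only a dictionary; the function name `pseudonymize` is illustrative. Unlike a hash, each UUID comes from `uuid.uuid4()` (random), so it cannot be recomputed from the keyspace.

```python
import uuid


def pseudonymize(values):
    """Replace each value with a random UUID, reusing the same UUID for repeats."""
    mapping = {}
    out = []
    for v in values:
        if v not in mapping:
            mapping[v] = str(uuid.uuid4())  # random, not derived from v
        out.append(mapping[v])
    return out, mapping


if __name__ == "__main__":
    ids, mapping = pseudonymize(["5X55", "7A12", "5X55"])
    print(ids[0] == ids[2], ids[0] == ids[1])  # True False
```

The `mapping` dictionary is the only way back to the original numbers, so it would have to be discarded once the release is written.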

    10. Re: Data Security Officer by ColdWetDog · · Score: 1

      For taxi cabs?

      --
      Faster! Faster! Faster would be better!
    11. Re: Data Security Officer by cheater512 · · Score: 1

      What part of the story used ANY precomputed rainbow tables? None.

      salt + "1234": if you know the "1234", then it's a tiny brute force to get the salt.
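A Python sketch of that known-plaintext attack. It assumes, hypothetically, a static salt of three lowercase letters prepended to the value before MD5. A single ride where you noted the number is enough to pin down the salt, and after that the whole keyspace can be rehashed as before.

```python
import hashlib
import itertools
import string


def salted_md5(salt: str, value: str) -> str:
    """MD5 of salt + value (static salt prepended), hex-encoded."""
    return hashlib.md5((salt + value).encode("ascii")).hexdigest()


def recover_salt(known_value: str, known_hash: str,
                 alphabet: str = string.ascii_lowercase, length: int = 3):
    """Brute-force a short salt from one known (plaintext, hash) pair."""
    for chars in itertools.product(alphabet, repeat=length):
        salt = "".join(chars)
        if salted_md5(salt, known_value) == known_hash:
            return salt
    return None


if __name__ == "__main__":
    leaked = salted_md5("qzx", "1234")   # what would appear in the released file
    print(recover_salt("1234", leaked))  # qzx
```

The search costs `len(alphabet) ** length` hashes, so only a long, secret key puts it out of reach.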

    12. Re: Data Security Officer by Anonymous Coward · · Score: 0

      I think he meant using a secret and sufficiently long salt. At this point, this pretty much becomes a HMAC. But anyway, just encrypting the license might be easier and more straightforward.
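A minimal Python sketch of the keyed-hash (HMAC) approach, using the standard library's `hmac` with SHA-256 rather than bare MD5. The 32-byte key size is an assumption. Because the key is random and kept secret (or destroyed after export), enumerating the medallion keyspace no longer helps an attacker, while equal inputs still map to equal outputs.

```python
import hashlib
import hmac
import secrets


def keyed_pseudonym(key: bytes, value: str) -> str:
    """HMAC-SHA256 of the value under a secret key, hex-encoded."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # assumed key size; never publish it
    a = keyed_pseudonym(key, "5X55")
    b = keyed_pseudonym(key, "5X55")
    c = keyed_pseudonym(key, "7A12")
    print(a == b, a == c)  # True False
```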

    13. Re:Data Security Officer by Anonymous Coward · · Score: 0

      Too many governments and corporations continue to fail to understand that it requires having experts who actually know what they are doing be in charge of data security.

      This doesn't mean you contract it out to the lowest bidder or hire the cheapest CS degree you can find.

      It means you hire knowledge and experience, you hire expert skills, and those cost money.

      And always consult the Slashdot crowd first . . .

    14. Re:Data Security Officer by penix1 · · Score: 4, Interesting

      From TFS...

      City officials released the data in response to a public records request and specifically obscured the drivers' hack license numbers and medallion numbers...

      How many of you here have had to deal with a Freedom of Information Act (FOIA) request, which is what a "public records request" is? I have had the pleasure over a dozen times. You have 10 days to respond to that request in my state. Some states it is even less. Failure to do so can result in stiff penalties. 10 days is hardly enough time to contract out to someone and have the job "done right".

      It means you hire knowledge and experience, you hire expert skills, and those cost money.

      And you are happy to have your taxes raised to pay those fees? Riiiight!

      --
      This is a sig. This is only a sig. Had this been an actual sig you would have been informed where to tune for more sigs.
    15. Re: Data Security Officer by msauve · · Score: 3, Informative

      Sure. I'm assuming there's a requirement to have a unique transformation of medallion numbers (otherwise, you wouldn't have to include even a hashed version)...

      Instead of applying some hash to the medallion number, just do something like:
      Change all appearances of the first number in the list to "1". Change all appearances of the next unique medallion number in the list to "2." Etc.

      The result is in essence an OTP. Unless records of the process are kept, it's irreversible (absent external info, such as knowing that medallion number x picked up a fare at location y at time z, when the correlated record is in the released data).

      --
      "National Security is the chief cause of national insecurity." - Celine's First Law
    16. Re:Data Security Officer by chromaexcursion · · Score: 1

      Small problem.
      Taxi Hack numbers are available in a publicly accessible data base.
      A determined individual probably could find license numbers, they may be publicly accessible.
      Failure to understand the vulnerability is the design failure.
      A simple solution would have been to order the hashes numerically and re-number them cardinally. ie. 1,2,3 ...
      Would take less than a minute for someone who knew how.
      Perhaps a few hours if the right person had to be tracked down.
      Never release source data.

    17. Re:Data Security Officer by sexybomber · · Score: 5, Informative

      Your State may be different, but New York's Freedom of Information Law (or FOIL, we like to be different) works like this:

      The agency has to respond within five business days, but that response can read something like:

      Dear Sexybomber:

      We have received your request for public records pursuant to FOIL. Due to the complexity of the records you have requested, it may not be possible to produce them within the standard 20-day statutory period. We anticipate that we will be able to produce the records you have requested within 40 days. If you have questions or concerns, please direct them in writing to the address above.

      If they run into a snag, they have to inform you of this and produce the records within a "reasonable period".

      So it's not like NYC was under a five-day time crunch here. They could easily have responded and said it would take 40 or 60 days, being as there were several million records requested. That's definitely long enough to bring in a consultant (or even one of the more technically-literate staff members) to properly secure the data.

    18. Re: Data Security Officer by Anonymous Coward · · Score: 0, Insightful

      Change all appearances of the first number in the list to "1".

      You have described something most definitely NOT a one time pad. In an OTP scheme, every *instance* of any particular value maps with equal probability to every potential output value. What you described is a basic substitution cipher--trivial to crack by frequency analysis. Every input value has a definite output value to which it maps with 100% probability. Once you find the first correlation between input/output, you can replace all the others. Not so for an OTP. Frequency analysis won't do squat if your OTP was generated in a truly random fashion and applied correctly.
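A short Python check of the frequency point. Any consistent one-to-one substitution leaves the set of occurrence counts unchanged, and those counts are exactly the signal frequency analysis works from. The substitution table used here is an arbitrary example.

```python
from collections import Counter


def count_profile(values):
    """Occurrence counts sorted high to low, ignoring what the values actually are."""
    return sorted(Counter(values).values(), reverse=True)


if __name__ == "__main__":
    plain = ["5X55", "5X55", "5X55", "7A12", "7A12", "3B90"]
    substitution = {"5X55": 1, "7A12": 2, "3B90": 3}  # any consistent substitution
    masked = [substitution[v] for v in plain]
    print(count_profile(plain), count_profile(masked))  # [3, 2, 1] [3, 2, 1]
```

Whether that leak matters in practice depends on what outside information an attacker can line the counts up against.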

      And this, folks, is why you shouldn't trust advice from strangers about crypto or homebrew crypto schemes. Play with them, learn about the principles, but please, for the love of FSM, do not trust them.

    19. Re:Data Security Officer by chriscappuccio · · Score: 3, Insightful

      Sorry but unless you define "GOOD ITSEC company audit the shit out of it" in tangible terms that can actually hold someone liable for failure in a real way, this is just baloney. And if you define it with teeth, the price will increase. Basically, to define it properly, you'd be able to do it yourself. Oops.

    20. Re: Data Security Officer by Anonymous Coward · · Score: 0

      Especially true if the salt is static or easily predictable.

    21. Re:Data Security Officer by Anonymous Coward · · Score: 0

      I worked for a very long time in government. And I learned one thing: You are not supposed to know shit. You are supposed to buy knowledge.

      Bull spittle. How it's currently being done is not how it should be done, nor how it has been done in the past. How it's currently being done is broken and flawed.

    22. Re: Data Security Officer by Anonymous Coward · · Score: 0

      A response doesn't require you to provide all the data within 10 days though - as long as you *respond* to the requester, you can still tell them it will take x days to gather and process the info - as long as the communicated time line isn't unreasonable, it's still OK.

    23. Re:Data Security Officer by SeaFox · · Score: 1

      I worked for a very long time in government. And I learned one thing: You are not supposed to know shit. You are supposed to buy knowledge.

      Isn't that how the entire job market works? That's why we have the education loan bubble we have -- employers don't believe you know anything without a piece of paper showing you spent thousands of dollars to learn it.

    24. Re: Data Security Officer by Buzer · · Score: 1

      What? The only thing you would learn is that one license & medallion number (as in, you know which hash means that combination). You wouldn't know the actual salt (unless the hash algorithm was complete shit/your salt is too short for bruteforce).

    25. Re: Data Security Officer by Anonymous Coward · · Score: 0

      Adding a salt is a trivial way of fixing this.

      No it aint.

      Care to explain why? If each hash has a different individual salt (the only correct way to use salt) and you don't include the salt in the public data (and not use MD5) then there will be no shared pattern amongst all the hashes which makes them far more difficult to reverse when you have to brute the salt as well.

    26. Re: Data Security Officer by Anonymous Coward · · Score: 4, Informative

      A naive use of salt would mean that you might as well omit the data. The aim of including the values in hashed form is to be able to say: This is the same driver as this. So same numbers have to hash to same numbers, which means you can't hash individual lines with different salts or you lose that information. In order to keep that information, you have to hash same numbers with the same salt each time. That basically gives you a random number with which to replace each number. So that works, but it removes the reason for using a hash, which is to have a local operation which creates a global irreversible one-to-one mapping. If you have to create one salt per unique number, you might as well use the salt as irreversible identifier.
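A Python sketch of the trade-off just described. It uses a fresh random salt per row (16 bytes from `os.urandom`, an assumed size). With per-row salts, two trips by the same medallion no longer produce the same hash, so the released data can no longer say "this is the same driver as this".

```python
import hashlib
import os


def per_row_hash(value: str) -> str:
    """Hash value with a fresh random salt on each call; the salt is not kept."""
    salt = os.urandom(16)
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(per_row_hash("5X55") == per_row_hash("5X55"))  # False
```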

    27. Re: Data Security Officer by philip.paradis · · Score: 1, Insightful

      I'm appalled that your post has been modded "informative." Please do us all a favor and abstain from any future posts on cryptography. Instead, I recommend you spend your time with resources like Applied Cryptography. Seriously, please put down the shovel, and if you're doing anything involving crypto for a living, please do the world a favor and resign today.

      --
      Write failed: Broken pipe
    28. Re:Data Security Officer by Anonymous Coward · · Score: 0

      Too many people still think data can be secured. Fixed it for you.

    29. Re:Data Security Officer by philip.paradis · · Score: 0

      You should probably resign tomorrow on grounds of not understanding the legal statutes you're subject to. Perhaps you should have consulted an attorney. I'm sure the taxpayers supporting your salary would be dismayed to learn you don't understand your own job requirements.

      --
      Write failed: Broken pipe
    30. Re:Data Security Officer by philip.paradis · · Score: 1

      Hint: 30 seconds of my time leads me to believe this applies to you: Pennsylvania’s New Right to Know Law. If I'm in error on the state in question, please let me know, and I'll be more than glad to guide you to the appropriate legislation for your jurisdiction.

      --
      Write failed: Broken pipe
    31. Re: Data Security Officer by complete+loony · · Score: 2

      Anonymising the data just requires replacing each key with something unrecognisable. The GP's suggestion passes the smell test, though I would suggest randomising the list instead of assigning id values sequentially.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    32. Re: Data Security Officer by N1AK · · Score: 1

      It is informative. Unless you knew that a particular record in the dataset was for a specific medallion/plate combo then what he's suggesting is sufficient to obscure the driver. If you did know that then you couldn't obfuscate the data without making it impossible to tell which records relate to the same (known) vehicle. If you're happy to do that then you could just not include any reference to either medallion or plates in any format in the data.

      I'm not remotely surprised that someone on the internet can lambast someone else when they clearly haven't understood either the issue or their proposed solution.

    33. Re: Data Security Officer by ultranova · · Score: 1

      I'm appalled that your post has been modded "informative."

      I'm appalled that yours has been modded "Insightful" despite having no content beyond a verbose "you suck".

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    34. Re: Data Security Officer by philip.paradis · · Score: 0

      "Business Analyst at Faccenda Group Ltd" ... it's almost worth reaching out to your employer to explain the more esoteric aspects of your ignorance here. Perhaps I'd spare a few clients the pain of professional malfeasance.

      --
      Write failed: Broken pipe
    35. Re: Data Security Officer by philip.paradis · · Score: 1

      You must have missed the motherfucking literary reference I linked. Read the fucking book (and hopefully a few more), you fucking retard.

      --
      Write failed: Broken pipe
    36. Re: Data Security Officer by philip.paradis · · Score: 1

      Look, seriously, provide an address and I'll ship you a fucking copy of the book. Your choice.

      --
      Write failed: Broken pipe
    37. Re: Data Security Officer by msauve · · Score: 4, Funny

      Do you always dig in so forcefully when you're demonstrably wrong?

      --
      "National Security is the chief cause of national insecurity." - Celine's First Law
    38. Re: Data Security Officer by philip.paradis · · Score: 0

      Please demonstrate how I'm wrong. You just became a personal project of mine.

      --
      Write failed: Broken pipe
    39. Re:Data Security Officer by AmiMoJo · · Score: 1

      It was probably just overconfidence. Someone googled the solution, thought it didn't look hard, and told their boss they could take care of it and save $$$ in the process.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    40. Re: Data Security Officer by philip.paradis · · Score: 1

      To be clear, advice similar to the sort you administered in the post I originally replied to is an apt explanation for why we have the number of massive failures in cryptographic functionality in software these days. You have absolutely no business even beginning to comment on this subject. May I please ship you a few hardcopy references?

      --
      Write failed: Broken pipe
    41. Re: Data Security Officer by msauve · · Score: 2

      philip.paradis is simply being an assholish troll.

      The original medallion and license(?) numbers need to be transformed into unique but consistent identifiers in the output, so one can still follow an individual cab/driver, but not be able to identify them in the real world.

      Assuming the dataset is ordered in some way (such as by date and time, which seems logical), even changing each cab/driver number to a unique, truly random number wouldn't be any more secure than the sequential assignment I gave as an example. That's because one could take the list generated that way, apply my example, and produce exactly the same list as if the sequential assignment were done in the first place. The only information the example I gave reveals is the order in which the numbers originally appeared. As long as you don't first sort the list by cab or driver number, you reveal nothing about the original numbers.
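The equivalence argument above can be checked mechanically. In this Python sketch, each distinct value first gets a distinct random label, and the result is then renumbered by order of first appearance. The output matches direct sequential assignment. The helper names are hypothetical.

```python
import secrets


def relabel_sequential(values):
    """Number each distinct value 1, 2, 3, ... in order of first appearance."""
    ids = {}
    return [ids.setdefault(v, len(ids) + 1) for v in values]


def relabel_random(values):
    """Give each distinct value a distinct random label, reused for repeats."""
    distinct = list(dict.fromkeys(values))  # distinct values, first-appearance order
    labels = secrets.SystemRandom().sample(range(10**9), len(distinct))
    mapping = dict(zip(distinct, labels))
    return [mapping[v] for v in values]


if __name__ == "__main__":
    trips = ["7A12", "5X55", "7A12", "3B90", "5X55"]
    print(relabel_sequential(relabel_random(trips)) == relabel_sequential(trips))  # True
```

This check covers only what the released list itself reveals. What outside knowledge, such as one recorded ride, can add is a separate question.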

      philip.paradis can now break his troll brain figuring out how the original numbers can be discovered without having more external info to correlate with.

      --
      "National Security is the chief cause of national insecurity." - Celine's First Law
    42. Re: Data Security Officer by msauve · · Score: 1
      --
      "National Security is the chief cause of national insecurity." - Celine's First Law
    43. Re: Data Security Officer by Anonymous Coward · · Score: 0

      Maybe you could just give a summary of why he's wrong.
      A link to information theory is useless as well because it's way too broad to know which part makes him wrong.

    44. Re: Data Security Officer by Anonymous Coward · · Score: 0

      Mekelweg 4
      2628 CD Delft
      The Netherlands

      I await your book expectantly.

    45. Re: Data Security Officer by philip.paradis · · Score: 1

      You're still completely wrong. I'm willing to spend my own money to ship you hardcopy references that will help you better yourself, in the hope that you will stop dispensing the sort of horrid advice you're continuing to regurgitate here. Why aren't you willing to take me up on this offer? Are you unable to provide a shipping address of any sort?

      --
      Write failed: Broken pipe
    46. Re: Data Security Officer by philip.paradis · · Score: 0

      Somehow, I don't think you'll take me up on my offer. To do so would be tantamount to an admission of ignorance on your part. I really don't think you have the nuts for it. All the same, I replied to your linked post.

      --
      Write failed: Broken pipe
    47. Re: Data Security Officer by philip.paradis · · Score: 0

      Just start with entropy and work your way out from there.

      --
      Write failed: Broken pipe
    48. Re: Data Security Officer by Anonymous Coward · · Score: 0

      How do you do frequency analysis on a sequence of numbers?
      It works with text because some letters are more likely to appear than others.
      Each number appears exactly once per cab driver.

    49. Re: Data Security Officer by philip.paradis · · Score: 1

      Thank you. I'll dispatch the shipment in a few hours in the care of "ultranova", provided I get a response back under that user account indicating confirmation of the destination address. I'll provide a post tracking reference here once the shipment is confirmed to be in transit.

      --
      Write failed: Broken pipe
    50. Re: Data Security Officer by Anonymous Coward · · Score: 0

      I know. How is this asshole not modded troll?

    51. Re: Data Security Officer by philip.paradis · · Score: 0

      You must want relevant books shipped to you as well. Unfortunately, you're a pile of shit AC, so you won't get anything for your troubles.

      --
      Write failed: Broken pipe
    52. Re: Data Security Officer by Anonymous Coward · · Score: 0

      OK.
      I did, but I still don't see why he's wrong.

    53. Re: Data Security Officer by msauve · · Score: 1

      Repeating an incorrect statement doesn't make it correct. You're really not very good at trolling, or much of anything it seems.

      --
      "National Security is the chief cause of national insecurity." - Celine's First Law
    54. Re: Data Security Officer by philip.paradis · · Score: 1

      Are you being completely serious and saying that you don't recognize how the process described in the original post is anything but a one time pad?

      --
      Write failed: Broken pipe
    55. Re: Data Security Officer by philip.paradis · · Score: 1

      Why won't you accept a shipment of formal reference materials?

      --
      Write failed: Broken pipe
    56. Re:Data Security Officer by Anonymous Coward · · Score: 0

      And I learned one thing: You are not supposed to know shit. You are supposed to buy knowledge.

      This is true everywhere.

      I worked for a very long time in government.

      That's hard to believe when you write stuff like this:

      You can contract it out to the lowest bidder without a problem. There only have to be 2 clauses in the contract: 1) You have a GOOD ITSEC company audit the shit out of it before it goes live. 2) If the audit reveals that the company taking the contract don't know jack about security, THEY will pay for the audit and THEY will improve the software until they think it's finally good enough.

      Have you ever even seen a government contract or any contract that looked like that?

    57. Re: Data Security Officer by philip.paradis · · Score: 1
      --
      Write failed: Broken pipe
    58. Re: Data Security Officer by nabsltd · · Score: 1

      Please demonstrate how I'm wrong. You just became a personal project of mine.

      The problem was that the data released in the FOIA response had personally-identifiable information (the medallion number) replaced with something that could be used to re-generate the PII without any information that isn't public.

      The GP's scheme was to replace the PII with a number that cannot be used to re-generate the PII with just the FOIA response. The PII could be re-generated if you had some kind of extra knowledge (e.g., the mapping used, or knowledge of when a particular cab was at a particular location), but this is still the best you can do with this sort of information release.

    59. Re: Data Security Officer by philip.paradis · · Score: 1
      --
      Write failed: Broken pipe
    60. Re: Data Security Officer by philip.paradis · · Score: 1

      Completely absent additional information, I'll give you another hint on why deterministic assignment is a very bad choice here, representing a practice in total opposition to OTP: curve fitting. Is this starting to make a little more sense now?

      --
      Write failed: Broken pipe
    61. Re:Data Security Officer by Anonymous Coward · · Score: 0

      Too many governments and corporations continue to fail to understand that it requires having experts who actually know what they are doing be in charge of data security.

      How do you know this isn't by design? I have always been skeptical when government or corporations say "It's okay, our tracking data on you is 'anonymized' before it is sent to us." I've always just assumed that means they can easily reverse whatever algorithm is being used to get back the original data.

    62. Re: Data Security Officer by Anonymous Coward · · Score: 0

      Come on, it's kinda flattering that you modded this comment of mine up, but does none of you realize that you can actually use one big enough static salt and achieve what is needed?

    63. Re: Data Security Officer by Anonymous Coward · · Score: 0

      You should be ashamed at how badly you just failed. A substitution of the ids on the list with an arbitrary sequence would be sufficient. Don't lecture people on crypto when you seem to know nothing about it yourself.

    64. Re: Data Security Officer by fulldecent · · Score: 1

      This is correct. MD5(salt + data). Salt is same for EVERY MD5 operation. Create the file and then delete the salt, done. This is called keying.

      --

      -- I was raised on the command line, bitch

    65. Re: Data Security Officer by Anonymous Coward · · Score: 0

      Dude, you are wrong on this one, you need to let this one go.

    66. Re: Data Security Officer by philip.paradis · · Score: 1

      You clearly have no idea whatsoever what a one-time pad is. Reference my other comments in this thread for additional hints as to why msauve's error is particularly egregious in this context. Alternately, stay ignorant. Your choice.

      --
      Write failed: Broken pipe
    67. Re: Data Security Officer by philip.paradis · · Score: 1

      Dude, msauve's proposed methodology is indeed tragically flawed, and you clearly haven't read the balance of the posts in this thread. Why are you so resistant to refutation of bad crypto advice? Are you positioned to benefit from deterministic systems which are advertised as cryptographically sound?

      --
      Write failed: Broken pipe
    68. Re: Data Security Officer by philip.paradis · · Score: 1

      You're clearly a fan of "get one key, get 'em all." Who signs your paychecks these days?

      --
      Write failed: Broken pipe
    69. Re: Data Security Officer by ultranova · · Score: 1

      You must have missed the motherfucking literary reference I linked. Read the fucking book (and hopefully a few more), you fucking retard.

      The book you linked to is about cryptography, not incest pornography as you seem to be implying. Neither of these seems relevant to anonymising - as opposed to encrypting - records.

      Good luck with treating your Tourette's, BTW. Or hangover. Whichever is relevant here.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    70. Re: Data Security Officer by philip.paradis · · Score: 1

      Are you confirming shipment of the book (along with a couple of other volumes) to Delft University of Technology in your care? I found it odd that even an undergraduate at such an institution would not already have access to such material, but perhaps all university copies are already on loan to other students. As an aside, you appear to be lacking the capacity to distinguish emphasis borne of extreme frustration from certain pathological afflictions. You should work on that.

      --
      Write failed: Broken pipe
    71. Re: Data Security Officer by philip.paradis · · Score: 1

      By the way, thanks for the added laughs per your attempt to reframe this discussion as "anonymising" versus "encrypting." You'd get a few charity points for sophomoric debate tactics if the subject matter were a bit less serious in nature, but that particular bit of commentary is indeed nothing more than a juvenile attempt at diverting attention from the matters at hand. Try again.

      --
      Write failed: Broken pipe
    72. Re: Data Security Officer by Anonymous Coward · · Score: 0

      So many posts. So little wisdom. Such rage. Many laughs.

      signed, your #1 anonymous coward. ...oh shit, I used a sequential number there instead of a random number drawn from a proper entropy-pool. Now, by your own logic, I am no longer anonymous! Crap!

      =P

    73. Re: Data Security Officer by Anonymous Coward · · Score: 0

      What is it with you and shipping hardcopies? This is one of the weirder trolls I have seen.

    74. Re:Data Security Officer by TechnoJoe · · Score: 0

      I worked for a very long time in government. And I learned one thing: You are not supposed to know shit. You are supposed to buy knowledge.

      If the government agents doing the buying don't know sh!t, then how do they know if they're purchasing good knowledge?

    75. Re:Data Security Officer by Opportunist · · Score: 1

      I don't know about your government; in mine, there's a process and a prescribed procedure for everything. I'm fairly sure there's even a defined procedure for how to correctly pass gas.

      And hence there is of course a procedure for hiring. You'd actually be surprised how efficient bureaucracy can be at inventing ways to make itself indispensable. If you don't know who to hire, hire a guy to tell you who to hire.

      I am not kidding.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    76. Re: Data Security Officer by msauve · · Score: 1

      Keep digging that hole, and someday it will be large enough to put your head in.

      --
      "National Security is the chief cause of national insecurity." - Celine's First Law
    77. Re: Data Security Officer by Sanians · · Score: 1

      That assumes that the salt is as trivially brute-forced as the license and medallion numbers. The reason this data could be brute-forced was that there are only so many possible license plate numbers, and that 'many' is easy work for a computer. A proper salt would have as many bits as the hash itself, and computing 2^128 hash values requires more CPU time than anyone has.
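      To make the point concrete, a Python sketch of the enumeration, assuming a hypothetical digit-letter-digit-digit ID format (10 × 26 × 10 × 10 = 26,000 possibilities) standing in for the real medallion and license formats, which are not reproduced here. Every candidate is hashed once and the results are kept in a reverse lookup table.

```python
import hashlib
import itertools
import string


def candidate_ids():
    """Yield every ID in a hypothetical digit-letter-digit-digit format."""
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


def reverse_md5_table():
    """Map unsalted MD5(id) back to id for the whole key space."""
    return {hashlib.md5(c.encode()).hexdigest(): c for c in candidate_ids()}


table = reverse_md5_table()
published = hashlib.md5(b"5X55").hexdigest()  # what an "anonymized" row would show
print(len(table), table[published])  # 26000 5X55
```

      With a long secret salt prepended to each ID, the same loop would also have to cover the salt space, which is what pushes the attack out of reach.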

      That said, a hash is an overly complex solution to this problem. Just take all the plate numbers, shuffle them into a random order, then output each one's position in the list. "Plate #415" isn't going to be decoded into "HQD 1853" no matter what you do.
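      The shuffle-and-index approach might look like this in Python; the plate strings are invented, and `random.SystemRandom` is used so the ordering comes from the OS entropy source rather than a seedable PRNG.

```python
import random


def shuffled_index_tokens(plates):
    """Map each distinct plate to its position in a randomly shuffled list."""
    distinct = sorted(set(plates))  # dedupe; sort so input order has no influence
    random.SystemRandom().shuffle(distinct)
    return {plate: position for position, plate in enumerate(distinct)}


records = ["HQD 1853", "ABC 1234", "HQD 1853", "XYZ 0001"]
mapping = shuffled_index_tokens(records)
published = [f"Plate #{mapping[p]}" for p in records]
del mapping  # without the mapping there is nothing to decode
```

      The catch is that the full set of plates has to be known up front, which matters when records arrive as a stream.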

    78. Re: Data Security Officer by Sanians · · Score: 1

      salt + "1234", if you know the "1234" then it's a tiny brute force to get the salt

      Really? I've chosen a salt consisting only of upper- and lower-case letters and digits. I then processed it like this:

      echo -n "salt1234" | md5sum

      The resulting MD5: e6f23ea50a901510fda62e4319e726ba

      So, what's my salt?

      It even uses that puny MD5 hash that everyone keeps saying is broken (not that it isn't), so this should be easy for you.
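      The arithmetic behind the challenge, as a quick Python check: each character drawn from upper- and lower-case letters and digits adds log2(62) ≈ 5.95 bits, so the work needed to guess the salt grows exponentially with its length. This assumes the attacker knows the length; not knowing it adds very little.

```python
import math

ALPHABET = 26 + 26 + 10  # upper-case, lower-case, digits


def salt_bits(length):
    """Bits of brute-force work for an alphanumeric salt of a known length."""
    return length * math.log2(ALPHABET)


for n in (4, 8, 16, 22):
    print(f"{n:2d} chars -> {salt_bits(n):5.1f} bits")
```

      By this measure a 22-character alphanumeric salt already exceeds the 2^128 figure mentioned earlier in the thread.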

    79. Re: Data Security Officer by Zaelath · · Score: 1

      Yeah, no. You're wrong, though entertaining.

      Tiny Key Space: Bob, Alice, Claire
      Anonymised Key List: A, B, C

      Resultant Data:
      A travelled between points X and Y
      B travelled between points P and Q
      C travelled between points Q and Y
      A travelled between points Y and Q
      C travelled between points Y and Q
      A travelled between points Q and P

      I maintained the hash table in memory long enough to know which person is which so that you can determine A travelled from X to Y to Q to P, B from P to Q, and C from Q to Y and return. But there is not enough data to know who A, B, or C are. And no, A != Alice, B != Bob, C !=Claire.

      The original OTP proponent's point is that you can't recreate the algorithm to convert from name to hash, and since this is anonymisation and NOT password management, you don't need to. /hands you back the shovel
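      Zaelath's example can be replayed in a few lines of Python: the tokens carry no names, yet grouping rows by token still recovers each traveller's route. This sketch assumes the rows are already in time order and simply chains each token's trips.

```python
from collections import defaultdict

# (token, from, to) rows in time order, as in the example above
trips = [
    ("A", "X", "Y"),
    ("B", "P", "Q"),
    ("C", "Q", "Y"),
    ("A", "Y", "Q"),
    ("C", "Y", "Q"),
    ("A", "Q", "P"),
]


def routes_by_token(rows):
    """Chain each token's trips into the ordered list of points it visited.

    Does not check that a trip starts where the previous one ended.
    """
    routes = defaultdict(list)
    for token, start, end in rows:
        if not routes[token]:
            routes[token].append(start)  # first trip: record where it began
        routes[token].append(end)
    return dict(routes)


print(routes_by_token(trips))
# -> {'A': ['X', 'Y', 'Q', 'P'], 'B': ['P', 'Q'], 'C': ['Q', 'Y', 'Q']}
```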

    80. Re: Data Security Officer by philip.paradis · · Score: 1

      You still don't seem to understand. Maybe it will help you to recall that the input data is thoroughly non-uniform and deterministic in nature. This point was conveyed in the summary, ffs. The anonymization method asserted by msauve and errantly supported by others (yourself included) spectacularly fails to account for this fact, and bears no resemblance whatsoever to a sound OTP implementation. "You're going the wrong direction, shipmate."

      I'm rather glad we didn't have folks like you leading the charge at Bletchley Park from 1939 onward, as things might have consequently turned out more poorly for the Allied powers. On the other hand, you would have fit right in keying Enigma machines.

      --
      Write failed: Broken pipe
    81. Re: Data Security Officer by philip.paradis · · Score: 1

      Minor correction to the above post: "non-uniform" was intended to be "non-entropic." It's late here.

      --
      Write failed: Broken pipe
    82. Re: Data Security Officer by philip.paradis · · Score: 1

      Throughout this conversation, I've been patiently waiting for someone to realize there's a lot more correlating data available in plain sight than anyone is owning up to. Provided that realization is made in the first place, the ensuing thought experiment should rapidly progress through probability, curve fitting, and a rote process of elimination in a key space drastically smaller than even the raw medallion search space.

      If someone else, anyone else, would bother to think about this for a few moments, they might just arrive at a deeply uncomfortable conclusion: some data sets cannot be properly anonymised at all. Put another way, engineering a cryptographic solution in a vacuum is a lot like gasping for breath in outer space: you can perform actions you are utterly convinced are perfectly valid, but owing to context the end result is going to be highly unpleasant.

      This is why we can't have nice things, specifically things involving sane public policy regarding privacy. Regardless of how the voting populace and their elected representatives might desire to craft policy in one direction or another, fundamental lack of understanding of the underlying environment and its rules of operation implies a necessary disconnect between intent and outcome.

      This is why people need to study formal reference materials and think about things before they make recommendations, and it is why large scale intelligence outfits will continue to trump those under observation. Tunnel vision is a motherfucker.

      --
      Write failed: Broken pipe
    83. Re: Data Security Officer by Zaelath · · Score: 1

      Your point holds if say, the cab driver's home address is listed as one of the data points, since that's personally identifying.

      So if you're saying the point that "you can't convert an OTP back to the original data" is moot, then you're arguing a different position from everyone else in the conversation. The original article was entirely about being able to reverse the hashing algorithm.

    84. Re: Data Security Officer by philip.paradis · · Score: 1

      Thank you for the first reasonable reply I've received throughout this thread. You've caught the gist of part of what I'm hoping to illuminate here (which is probably far more important in the larger scheme of things), but you haven't seen the full picture yet. I have a challenge for you. Using your own line of reasoning as a premise to be challenged, can you analyze it from an adversarial perspective and develop a proposal for how additional inferences might be made regarding unique identification of medallions in the event that each medallion has been replaced with an arbitrary token? In your deliberations, please consider every facet of the reported data. It's quite apparent that those who have replied to my comments in this thread either (1) haven't directly considered the data themselves, or (2) lack the insight required to observe relationships between apparently unrelated constructs.

      In short, under this challenge, I can deliver ~90% of the medallion identifiers using no external information other than full knowledge of the means by which the original medallions are assigned. Given a tiny parcel of additional correlation, I can hit 100%.

      I look forward to your reply. By the way, what do you do for a living at the moment?

      --
      Write failed: Broken pipe
    85. Re: Data Security Officer by Zaelath · · Score: 1

      I don't have a sample of the full dataset, or really the time to get/assess it fully :) If I was going to hazard a guess, your method would be closely related to timestamps on the data?

      I think your assertion is quite possible, but it involves a lot more work and third party data sources to correlate back to otherwise properly anonymised IDs than the fairly pedestrian realisation for a 100% result in the source article.

      Regarding work, short answer is probably; DevOps in a company that spends a lot of time thinking about security ;p

  4. This by Anonymous Coward · · Score: 0, Insightful

    This is why we can't have nice things.....

  5. That was a dumb thing to do. by K.+S.+Kyosuke · · Score: 1

    Cue a CFAA trial and a long stay in a cozy federal PMITA penitentiary.

    --
    Ezekiel 23:20
    1. Re:That was a dumb thing to do. by Opportunist · · Score: 1

      And the crime would be? Exposing government stupidity?

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  6. Prediction: de-anonymization considered "hacking" by rsborg · · Score: 5, Insightful

    Large organizations will consistently fail to hire/staff competent people for data security related issues, and will push back on fines or punitive findings by criminalizing publicizing their incompetence.

    Thus sending all such talent straight to criminals who'll be happy to reward them with hard cash.

    It's like these guys _want_ a dystopian future.

    --
    Make sure everyone's vote counts: Verified Voting
  7. What's the issue here? by Anonymous Coward · · Score: 0

    People will know driver XYZ drove from 122 Main St to 123 Second St?
    It's not like they have the info on where the person was actually going when they got out of the cab.
    This isn't even an issue. *yawn*

    1. Re:What's the issue here? by gweihir · · Score: 4, Insightful

      You are naive. The problem starts to crop up when you start correlating things. Then you can find all sorts of things, like patterns of visiting a mistress, people meeting in secret (which is perfectly legal, but the government fears it), etc.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:What's the issue here? by Opportunist · · Score: 5, Insightful

      Actually the movement of a cab is a wealth of information. Not by itself, but it's very good at connecting dots. If you want to follow someone around, these things tend to be invaluable. You can, essentially, follow someone around without following them around, even retroactively. People rarely go from place to place randomly. They have destinations. If someone takes a cab from the airport and doesn't live in the area where he landed, it is likely that his destination is the place that he will stay in. After a flight, especially a long one, people want to get rid of their heavy baggage, take a shower, put on new clothing. So you can easily find out where someone stayed. Which becomes twice as interesting if the destination is not a hotel, because now you've got another person to screen.

      This information by itself is not much. But as part of a bigger network it is something we'd have killed for back when I was still doing profiling.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    3. Re:What's the issue here? by Anonymous Coward · · Score: 0

      Yeah I could see this, if this information included the name of the passenger. It's only the cab information, origin and destination. Unless I'm reading this wrong it's not like it says Joe Sixpack got into cab #123 at such and such street and got off at this other street at this time. I don't see how you'd get any useful information out of this. And if you could, who would care? It's not like it has the passengers payment information like CC info or drivers license number or whatever... doesn't seem to be an issue. Why even go through the trouble of trying to follow someone with this info when if you really wanted to it'd be much easier other ways I'm sure. Still seems like a non issue to me.

    4. Re:What's the issue here? by AHuxley · · Score: 2

      Very insightful Opportunist .
      With more nations trying to count passports in and out, a wealth of information about each person entering some countries is now being stored.
      From face recognition, gait analysis and 'free' wifi to a new/old phone being set up for cheaper local use and the random risk of a laptop being examined and cloned on entry and exit.
      If you want to rent a car you face a complex 'chat down' by the friendly on site rental staff.
      So you take the next random taxi.
      In the past along a long airport road the interaction of a few tailing vehicles might be detected given the number of turns into a city.
      Destinations can be looked at over time, in near real time and as a history.
      That first trip can open up a world of new digital 'hops' (old friends, a college buddy, a lover, extended family, a previously unknown associate), each of whom then has their life examined too.
      If you go to a hotel you face another 'chat down' attempt by the friendly staff over a long complex CC or cash transaction.
      No follow car pool or beacons needed anymore just go big, local and federally with “collect-it-all” :)

      --
      Domestic spying is now "Benign Information Gathering"
    5. Re:What's the issue here? by AHuxley · · Score: 2

      Has Joe Sixpack been seen near any anti-war protests? Written to the press at a city, state or federal level? Given charitable contributions to a faith-based group now under investigation? Have a security clearance? Have a family member with a new or old security clearance? Does Joe Sixpack travel outside the USA a lot?
      It's not just about being "much easier"; it's about getting it all, having domestic staff feel OK about storing and sorting domestic details per person, and being able to legally collect more domestically without needing per-person court work.

      --
      Domestic spying is now "Benign Information Gathering"
    6. Re:What's the issue here? by Opportunist · · Score: 2

      The point is that you can't follow every Joe Random around all the time. But occasionally some Joe Random becomes a Joe Someone and you just wish you had the information that you could have if you just followed him.

      Scenario.

      You find out that there is someone you deem a nuisance to the powers that are. You finally caught him. But he doesn't talk. Imagine you're an entity that has access to a lot of information, either directly (because you have it) or indirectly (because you can request it). Using the CC information of your subject you find out that he recently spent time in another city (because you get the flight information). Since there is no other reason (like, say, business reasons), and since his travel visa says "vacation", you deem it likely that he met a contact or even an accomplice. You have no hotel bills on CC, so either he paid in cash or, and this is what you hope for, he stayed with his contact.

      You know when his plane landed and you can even determine to some degree of certainty when he left the airport (you may even have access to the CCTV to pinpoint the moment). Of course more than one taxi leaves around that time, but most of them go to hotels (that you can then check out for reservations by the name of the person you're looking for). What you're really hoping for is a private address. And unless your subject was very careful, he might even have given the cab driver the real address, which now offers you another address and another contact to use.

      Next thing you want to do is find out all cab movements to and from this address. It may be some kind of "hub" for people of that particular kind of nuisance, you may actually find some kind of structure. You can at least find out whether your subject also took cabs to other destinations and when, how often and where he went.

      Or how about a more general approach? You could use the information to find out whether some private address gets visited by people from outside of town suspiciously often. What do they do there? Why do they go there? Do they stay there? If not, what could they be doing there?

      Cabs offer a wealth of information. Again, by itself that information is fairly useless, but it is great for "connecting dots", because that's what cabs do: They move from point A to point B with their passenger.
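      Stripped of the narrative, the first steps of that scenario are just filters over trip records. A Python sketch, assuming hypothetical records that carry a pickup time and named pickup/drop-off places (the released NYC data uses coordinates, which would first need matching to addresses) and a known list of hotels; all places and times below are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trip:
    pickup_time: datetime
    pickup: str
    dropoff: str


def non_hotel_dropoffs(trips, origin, landed, window, hotels):
    """Drop-off places of trips leaving `origin` within `window` of `landed`, minus hotels."""
    return {
        t.dropoff
        for t in trips
        if t.pickup == origin
        and landed <= t.pickup_time <= landed + window
        and t.dropoff not in hotels
    }


def trips_touching(trips, place):
    """Every trip that starts or ends at `place`."""
    return [t for t in trips if place in (t.pickup, t.dropoff)]


trips = [
    Trip(datetime(2013, 5, 1, 14, 5), "JFK", "Hotel Alpha"),
    Trip(datetime(2013, 5, 1, 14, 12), "JFK", "12 Elm St"),
    Trip(datetime(2013, 5, 1, 17, 0), "JFK", "9 Oak Ave"),
    Trip(datetime(2013, 5, 3, 9, 30), "12 Elm St", "Midtown"),
]
suspects = non_hotel_dropoffs(trips, "JFK", datetime(2013, 5, 1, 14, 0),
                              timedelta(minutes=30), {"Hotel Alpha"})
print(suspects)  # {'12 Elm St'}
```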

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    7. Re:What's the issue here? by chriscappuccio · · Score: 4, Insightful

      The government has the info already, they handed it out!

    8. Re:What's the issue here? by dcw3 · · Score: 1

      I would love to have a glimpse at this. I bet we'd be able to find some hacks who frequently take extended routes to bump up their fares.
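      One rough way to look for that, sketched in Python: compare each trip's metered distance with the straight-line (great-circle) distance between pickup and drop-off. Street grids make some excess normal, so the 2.0 cut-off here is an arbitrary assumption, not a calibrated threshold, and the trip tuple layout is invented.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def detour_ratio(metered_miles, lat1, lon1, lat2, lon2):
    """Metered distance divided by straight-line distance (inf for zero-length trips)."""
    straight = great_circle_miles(lat1, lon1, lat2, lon2)
    return metered_miles / straight if straight > 0 else math.inf


def share_of_long_routes(trips, threshold=2.0):
    """Fraction of a driver's (metered, lat1, lon1, lat2, lon2) trips above the threshold."""
    flagged = sum(1 for t in trips if detour_ratio(*t) > threshold)
    return flagged / len(trips) if trips else 0.0
```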

      --
      Just another day in Paradise
    9. Re:What's the issue here? by gweihir · · Score: 1

      And the Government is the only party that does data-correlation?

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    10. Re:What's the issue here? by Anonymous Coward · · Score: 0

      You misunderstood TFA. The data is on the drivers and the cabs, not the passengers. They don't know who took the cab; they just know that Cabbie 1234 didn't pick up anyone for 1.5 hours on Monday morning.
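      That kind of fact falls straight out of the timestamps. A Python sketch, assuming each driver's trips are available as (pickup, drop-off) datetime pairs, an invented shape for illustration:

```python
from datetime import datetime, timedelta


def longest_idle_gap(trips):
    """Largest gap between one drop-off and the next pickup for one driver."""
    ordered = sorted(trips)  # (pickup_time, dropoff_time) pairs, sorted by pickup
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]
    return max(gaps, default=timedelta(0))


monday = [
    (datetime(2013, 5, 6, 10, 50), datetime(2013, 5, 6, 11, 5)),
    (datetime(2013, 5, 6, 9, 0), datetime(2013, 5, 6, 9, 20)),
]
print(longest_idle_gap(monday))  # 1:30:00
```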

  8. Go directly to jail by AndyKron · · Score: 0

    Now you must go to jail Sorry :-(

  9. Oops, indeed by Krishnoid · · Score: 4, Funny

    Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.

    Having thereby run afoul of the circumvention of copyright protection mechanisms clause of the Digital Millennium Copyright Act, he was then subjected to the NYPD's controversial new program, and subsequently incarcerated.

    1. Re:Oops, indeed by Anonymous Coward · · Score: 0

      For a moment there, I thought you said incinerated.

    2. Re:Oops, indeed by Anonymous Coward · · Score: 0

      You can't copyright lists of facts, therefore DMCA doesn't apply.
      I'm sure they'll find something else specious though.

  10. Give that man a Big Gulp! by Anonymous Coward · · Score: 0

    Wait. It's NY city. We can't do that.

  11. Cue the DMCA. by MickLinux · · Score: 1

    Oops.

    --
    Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
  12. Re:Prediction: de-anonymization considered "hackin by Anonymous Coward · · Score: 0

    Target's breach cost them 50% of their revenue for a year.

    That nearly put them out of business.

    The meeting between them and the card carriers went something like
    AMEX_Discover_MasterCard_VISA: You are paying to replace cards, paying for the fraud from the compromise (10-15 billion/year), and paying us enhanced fees for several years until you prove you are again worthy of the normal rates we give your competitors.
    Target: And if we refuse?
    AMEX_Discover_MasterCard_VISA: We will choose not to do business with you. Your customers will have to buy in cash.
    Target: Oh...well then. Where do I sign?

    As systems become more integrated, Data Security is going to become less about keeping egg off your face and more about corporate and personal survival. Those old movies from the 80s with the hacker causing the elevator to drop 100 stories, or the cellphone battery to explode, or the factory to go out of business for months on end, and so on?

    The industry is already moving towards true security.

    When data starts being used against people personally, they will begin asking questions, and then it will become very important.

    Until then, if you're a hacker and know your shit, enjoy being God.

  13. Error so popular it was enshrined in PCI DSS by WaffleMonster · · Score: 5, Insightful

    I've always assumed that anywhere the term "anonymized data" is used, it is more likely than not companies and governments paying lip service to their customers... where the data could easily be reversed into an identifiable form, either by taking advantage of insufficient entropy or by cross-referencing datasets.

    There is after all no cost for violating privacy or unnecessary risk exposure associated with disclosure.

    One of my favorite examples of the dangers of insufficient entropy stems from a PCI DSS requirement written by "experts" who should know better.

    3.4 Render PAN unreadable anywhere it is stored (including on portable digital media, backup media, and in logs) by using any of the following approaches:

    One-way hashes based on strong cryptography, (hash must be of the entire PAN) ...

    The search space of typical 16-digit card numbers is no match for a modern CPU once you have taken the check digit, card type, issuer and issuer-specific numbering into account... "strong cryptography" can't fix stupid.
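    A Python sketch of why the search space is so small, assuming (hypothetically) that the attacker already knows the six-digit issuer prefix and the last four digits, which are often displayed or stored unmasked. That leaves six unknown digits, and the Luhn check digit discards nine in ten candidates before any hashing is done, so at most 100,000 SHA-256 computations remain.

```python
import hashlib


def luhn_valid(pan):
    """True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    for position, ch in enumerate(reversed(pan)):
        digit = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def recover_pan(target_hex, bin6, last4):
    """Brute-force an unsalted SHA-256 of a 16-digit PAN with known BIN and last four."""
    for middle in range(10**6):  # the six unknown digits
        pan = f"{bin6}{middle:06d}{last4}"
        if luhn_valid(pan) and hashlib.sha256(pan.encode()).hexdigest() == target_hex:
            return pan
    return None


# 4111 1111 1111 1111 is a well-known Luhn-valid test number.
stored = hashlib.sha256(b"4111111111111111").hexdigest()
print(recover_pan(stored, "411111", "1111"))  # 4111111111111111
```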

    1. Re:Error so popular it was enshrined in PCI DSS by Anonymous Coward · · Score: 0

      >There is after all no cost for violating privacy or unnecessary risk exposure associated with disclosure.

      Except for healthcare.

    2. Re:Error so popular it was enshrined in PCI DSS by gweihir · · Score: 1

      Indeed. Any reversible transformation for a small-entropy source set is insecure. Anybody that actually understands crypto knows that. Seems this mess is just one more indicator that some people hire far too cheap when it comes to IT.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    3. Re:Error so popular it was enshrined in PCI DSS by swillden · · Score: 1

      I've always assumed that anywhere the term "anonymized data" is used, it is more likely than not companies and governments paying lip service to their customers... where the data could easily be reversed into an identifiable form, either by taking advantage of insufficient entropy or by cross-referencing datasets.

      It's worth mentioning that one possible solution in this sort of situation is to use a keyed hash. Assuming a good base hash (which MD5 really isn't anymore, though HMAC-MD5 would likely have been fine) and a well-secured key with sufficient entropy, it is infeasible to reverse the hash. Cross-referencing may still be an issue, though straight brute-force reversing of the hashing isn't. To eliminate the possibility of cross-referencing, it's necessary to use a different hash key for each database.

      Of course, like all cryptographic "solutions", this merely replaces a large secret (the contents of the database(s)) with a small secret (the key or keys). Still, it's typically easier to secure a key than a database. "Easier" doesn't mean "easy". Depending on the application, though, it's often the case that if all you need is unique IDs for delivery to a third party, you can just generate a random key, use it to hash all of the to-be-secured IDs, then discard the key.
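      A minimal Python sketch of that recipe using the standard library's `hmac` module. It swaps HMAC-MD5 for HMAC-SHA256 (an assumption; either works here as a keyed hash) and uses a separate random key per released dataset so tokens cannot be joined across releases. Dataset names and IDs are invented.

```python
import hashlib
import hmac
import secrets


def hmac_tokens(ids, key):
    """Replace each ID with hex HMAC-SHA256(key, ID); same key + ID gives the same token."""
    return [hmac.new(key, i.encode(), hashlib.sha256).hexdigest() for i in ids]


# One fresh 32-byte key per database, so tokens from different releases don't line up.
trips_key = secrets.token_bytes(32)
fares_key = secrets.token_bytes(32)

trip_tokens = hmac_tokens(["5X55", "7A12", "5X55"], trips_key)
fare_tokens = hmac_tokens(["5X55"], fares_key)

# If nobody ever needs to map tokens back to IDs, discard the keys.
del trips_key, fares_key
```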

      Oh, and the real "solution", of course, is to hire someone who knows what they're doing and give them the time and resources to fully and accurately understand the security problem they're trying to solve. They'll either do the job or tell you it can't be done (or do the job and screw it up in a subtle and non-obvious way rather than a stupid and obvious one... but hey, at least if it's broken it'll be a subtle and non-obvious break).

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    4. Re:Error so popular it was enshrined in PCI DSS by Wrath0fb0b · · Score: 2

      Um, the standard is fine. The phrase "One-way hashes based on strong cryptography" means (to any professional in the business) that one must salt the hash with sufficient entropy to make brute-forcing the input space impossible. So a 16-digit CC number has little entropy, but add a 16-byte salt and you're getting somewhere.

      So yeah, "strong cryptography" can't fix stupid, but those that know how to use it are plenty fine.

    5. Re:Error so popular it was enshrined in PCI DSS by Anonymous Coward · · Score: 0

      if all you need is unique IDs for delivery to a third party, you can just generate a random key, use it to hash all of the to-be-secured IDs then discard the key.

      Thereby introducing a known plaintext into a cryptographic construct--something not to be taken lightly. In that instance, actually worse than useless. If all you need is unique IDs, a random sort and cardinal numbering is better (no known plaintext introduced to later convey extra information like license plate # in the event of a break). Random IDs would work just as well.

      There's an incredible amount of bad homebrew crypto suggestions on this article. Play with it, learn the principles, but please, for the love of FSM, don't trust it.

    6. Re:Error so popular it was enshrined in PCI DSS by WaffleMonster · · Score: 1

      Um, the standard is fine. The phrase "One-way hashes based on strong cryptography" means (to any professional in the business) that one must salt the hash with sufficient entropy to make brute-forcing the input space impossible. So a 16-digit CC number has little entropy, but add a 16-byte salt and you're getting somewhere.

      This is the second time 'use salts' has been mentioned. Salts are not secret keys and only provide protection against creation of lookup tables to accelerate brute force of multiple items... they in no way address the underlying problem of insufficient entropy.

      I don't know the exact figure, but last I looked into this, the space of every possible credit card number that can be issued across all currently known issuers was well under a trillion, most likely in the tens to hundreds of billions range... practically free by today's hardware standards.
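      The point about salts can be checked directly. In this Python sketch each record gets its own random salt, stored next to the hash as salts normally are; the protected part is a hypothetical 4-digit value. The salt defeats a single precomputed table, but anyone holding the row still needs at most 10,000 hashes to recover it.

```python
import hashlib
import secrets


def salted_row(value):
    """Store (salt, SHA-256(salt + value)); the salt is public, as salts usually are."""
    salt = secrets.token_hex(16)
    return salt, hashlib.sha256((salt + value).encode()).hexdigest()


def recover(salt, digest, keyspace):
    """Try every candidate with the row's own salt; costs len(keyspace) hashes per row."""
    for candidate in keyspace:
        if hashlib.sha256((salt + candidate).encode()).hexdigest() == digest:
            return candidate
    return None


salt, digest = salted_row("0420")
print(recover(salt, digest, (f"{n:04d}" for n in range(10_000))))  # 0420
```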

    7. Re:Error so popular it was enshrined in PCI DSS by Buzer · · Score: 3, Interesting

      Salts do provide protection against that. Salts are secret if you want them to be (you can protect the plain text salt the same way you protect your plain text keys for encryption); you only need to share them when the other party has to be able to hash their original data.

      Here are some sha1 hashes:

      • 4c2199828f355281e0f6eccb76d9df609f99ed0e salt+"123"
      • 458183225b77f6baff7c4c439b0ed3a5e7278e8a salt+"456"
      • ed974fc96c530639cccc9b18315396789d93a697 salt+"789"
      • f87a2fa039a20d01032f19b5852868343f3d06b9 salt+"???"

      So, how about you tell me what that last number combination is? I can give you a hint that it matches regex /^[1-9]{3}$/ (so there are only 729 possibilities). The salt is a 60-character string. If you cannot do it, then OP's post was correct.

    8. Re:Error so popular it was enshrined in PCI DSS by swillden · · Score: 1

      Thereby introducing a known plaintext into a cryptographic construct--something not to be taken lightly.

      You don't know what you're talking about.

      First, with any decent cipher or keyed hash, known plaintext by itself poses no risk to security. If it does, then by definition your cryptographic construct is pre-broken and you should get another one that works. Encryption should nearly always be randomized not because known plaintext is a problem, but to avoid replay attacks. That's not relevant here, in fact "replay" is a desired feature since the whole point is to produce IDs which can be correlated within a defined context.

      Oh, it's wise to avoid known plaintext when convenient, on the theory that your fundamental algorithm may become broken in the future. But it's better to plan for algorithm agility in that case... and, frankly, it's a corner case.

      If all you need is unique IDs, a random sort and cardinal numbering is better (no known plaintext introduced to later convey extra information like license plate # in the event of a break). Random IDs would work just as well.

      If you have the option of doing that, fine. But in many cases the data is streaming in and you can't stop the world to sort and assign IDs. So to use a random ID you'd have to store a mapping of real value to random ID and look up the appropriate random ID -- or add a new one -- for each item that comes in. A keyed hash is much simpler and faster.
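      For comparison, the stateful alternative described above (store a mapping and add a random ID whenever a new value appears) is a few lines of Python; the class name is invented for illustration. The table itself is the secret here, and it grows with the number of distinct IDs, which is the cost a keyed hash avoids.

```python
import secrets


class RandomIdMapper:
    """Hand out a fresh random token the first time each real ID is seen."""

    def __init__(self):
        self._table = {}      # real ID -> token; this table must be protected
        self._issued = set()  # tokens already handed out, to rule out duplicates

    def token_for(self, real_id):
        if real_id not in self._table:
            token = secrets.token_hex(8)
            while token in self._issued:  # 64-bit tokens: practically never loops
                token = secrets.token_hex(8)
            self._issued.add(token)
            self._table[real_id] = token
        return self._table[real_id]


mapper = RandomIdMapper()
stream = ["5X55", "7A12", "5X55", "9B03"]
print([mapper.token_for(i) for i in stream])
```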

      There's an incredible amount of bad homebrew crypto suggestions on this article. Play with it, learn the principles, but please, for the love of FSM, don't trust it.

      There are always lots of bad suggestions, which is why I offered a good one.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    9. Re:Error so popular it was enshrined in PCI DSS by swillden · · Score: 1

      Salts are secret if you want them to be

      If you keep them secret then they're not salts, they're keys. The definition of "salt" in the cryptographic world includes the notion that it need not be kept secret, just as "IV" is a value which need not be secret but must not be predictable, and "nonce" is a value which need not be secret or unpredictable (indeed a salt is technically a form of or application of a nonce).

      Another characteristic of salts is that you use a different salt for each entry. That's counterproductive in the case being discussed, because the whole point is to compute the same value for a given driver ID, so that you can compute statistics across that driver's logs (but without knowing who the driver is). So to use per-entry keys, you actually have to store a mapping from ID to key so you can look up the correct key each time you encounter a given ID in the obscuring process. If you're doing that you might as well not bother with hashing at all: just pick random values for each driver ID and use the random value in place of the ID as the "obscured" form. Since there will be no link between random IDs and the driver IDs they substitute except for your mapping table, it's impossible for anyone without the table to reverse the mapping.

      There's nothing wrong with using a keyed hash, but it's better to use a proper keyed hash construction, like HMAC; salt is generally employed simply by hashing the salt with the value to be obscured, which enables various attacks depending on whether the salt is first or last.

      So the right way to do hash-based obscuring in this context is to pick a random secret key and use it to HMAC each of the driver IDs. You'll get the same value for each driver ID, but only someone with the HMAC key can exploit the small driver ID space to discover the mapping.
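      A minimal sketch of that HMAC approach using Python's standard `hmac` and `hashlib` modules. The function name `obscure`, the per-release key, and the example IDs are assumptions for illustration. The same ID always maps to the same output, but without the key, enumerating the small ID space tells an attacker nothing.

```python
import hashlib
import hmac
import secrets

# One random 256-bit key per data release. Keep it secret, or destroy it once
# the export is done if nobody needs to reproduce the mapping later.
RELEASE_KEY = secrets.token_bytes(32)


def obscure(driver_id: str, key: bytes = RELEASE_KEY) -> str:
    """Same driver ID -> same output; unguessable without the key."""
    return hmac.new(key, driver_id.encode("utf-8"), hashlib.sha256).hexdigest()


first = obscure("ID-0042")
again = obscure("ID-0042")
other = obscure("ID-0043")
```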

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    10. Re:Error so popular it was enshrined in PCI DSS by Anonymous Coward · · Score: 0

      In this thread: a 1/1000 random chance of blowing Buzer's mind

    11. Re:Error so popular it was enshrined in PCI DSS by Wrath0fb0b · · Score: 1

      Yes, which is exactly what the person in this article actually did -- he created a lookup table to accelerate brute-forcing the entire released dataset.

      And yes, there are a trillion credit cards. But if each one gets a random 32-bit salt added to it, then that's a 4-billion-trillion input space ...

    12. Re:Error so popular it was enshrined in PCI DSS by Wrath0fb0b · · Score: 1

      Yes, a secret salt is no salt at all.

      But there are very important uses for salting that make it better than assigning a random number -- it allows someone who does know the input value to look up the relevant entry without any involvement from the secure side.

      Imagine you had the following two datasets that you've partitioned:

      Private: { Credit Card Number, Random Salt }
      Public: { H(CC+Salt), Amount of money spent on porn, Amount of student debt }

      Now whenever you want to obscure an entry, you do need to go to the private one. But if you want to answer the question "How much money did a person with CC X spend on porn", you can look it up without entering the secure domain. But no one without access to the private side can find credit cards in the DB or other stuff -- to within the computational costs of the operation multiplied by the entropy of the salt.

    13. Re:Error so popular it was enshrined in PCI DSS by swillden · · Score: 1

      if you want to answer the question "How much money did a person with CC X spend on porn", you can look it up without entering the secure domain.

      In order to do this you need to know the salt, so I'm assuming it's in the public database as well, i.e.

      Public: { H(CC+Salt), Salt, Amount of money spent on porn, Amount of student debt }

      no one without access to the private side can find credit cards in the DB or other stuff -- to within the computational costs of the operation multiplied by the entropy of the salt.

      Anyone able to do the first operation (looking up an entry for a given CCN) can recover all of the CCNs by brute forcing the CCN space, which isn't all that big. It's not clear what you meant by "+" in "H(CC+Salt)". If you meant concatenation, then you've provided the attacker with a nice way to speed up the search. If you meant XOR, or even addition... hmm... it's not obvious if there's an optimization in that case. If you used a proper keyed hash, like HMAC, then the attacker must do the full operation for each candidate CC and (known) salt. Still, that is well within the realm of possibility, and it doesn't depend at all on the size of the salt space.

      If the salts are not public, then they're not salts, they're keys, and it's necessary to possess the proper key in order to look up an entry for a given CCN. And, of course, anyone with the secret knowledge can brute force the CCN space, so you may as well just give them access to the private database.

      I don't think your scheme accomplishes what you think it does.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    14. Re:Error so popular it was enshrined in PCI DSS by Wrath0fb0b · · Score: 1

      Yes, you are right, I mistyped.

      Public: { H(CC+Salt), Salt, Amount of money spent on porn, Amount of student debt }

      [ where + is just shorthand for "mixed with" ]
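      A minimal sketch of the corrected public layout, assuming "mixed with" means HMAC keyed by the per-entry salt (an interpretation, not something the comment specifies). The function names and the well-known test card numbers are illustrative. Because every row has its own salt, a lookup has to re-hash the query once per row; no index can be built on the salted values.

```python
import hashlib
import hmac
import secrets


def protect(ccn: str, salt: bytes) -> str:
    # "Mixed with" is read here as HMAC keyed by the salt (an assumption).
    return hmac.new(salt, ccn.encode("ascii"), hashlib.sha256).hexdigest()


def make_record(ccn: str, amount: int) -> dict:
    salt = secrets.token_bytes(16)  # fresh salt per entry, stored in the clear
    return {"hash": protect(ccn, salt), "salt": salt, "amount": amount}


def lookup(records: list, ccn: str) -> list:
    """Entries for a known CCN: re-hash it with every row's salt."""
    return [r for r in records
            if hmac.compare_digest(protect(ccn, r["salt"]), r["hash"])]


# Well-known test card numbers stand in for real CCNs.
public = [
    make_record("4111111111111111", 120),
    make_record("5555555555554444", 75),
    make_record("4111111111111111", 30),
]
hits = lookup(public, "4111111111111111")
```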

      It's not at all within the realm of possibility for an attacker to brute force the CC space for each salt separately. So yes, an attacker can run through (2**CC_entropy) hashes to brute force a single entry, but that exercise provides him no help when he goes to do the next entry. Moreover, he can't spin up a few TB of storage on S3 and pre-compute anything useful.

      The point of the scheme is to turn a pwn-once-win-forever game into a pwn-one-win-one game. This guy paid once and won the entire database. I would like him to have to pay that cost once for each entry.
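      A toy sketch of the "pwn-one-win-one" property, assuming a hypothetical four-digit ID space instead of the real CCN space. Each entry has its own public salt, so the attacker must enumerate the whole space separately for every entry. The same sketch also shows the limit of the idea: cracking one entry costs only the size of the ID space.

```python
import hashlib
import hmac
import secrets

DIGITS = 4  # toy ID space of 10**4 values; a real CCN space is vastly larger


def protect(id_str: str, salt: bytes) -> bytes:
    return hmac.new(salt, id_str.encode("ascii"), hashlib.sha256).digest()


def crack(digest: bytes, salt: bytes):
    """Enumerate the whole ID space for ONE entry; returns (ID, hashes tried)."""
    for n in range(10 ** DIGITS):
        guess = f"{n:0{DIGITS}d}"
        if hmac.compare_digest(protect(guess, salt), digest):
            return guess, n + 1
    return None, 10 ** DIGITS


# Two entries, each with its own public salt (hypothetical IDs).
entries = []
for secret_id in ["0042", "9001"]:
    salt = secrets.token_bytes(16)
    entries.append((protect(secret_id, salt), salt))

results = [crack(d, s) for d, s in entries]
total_work = sum(tries for _, tries in results)
```

      Work done against the first entry gives no help with the second, but each crack still finishes after at most 10**4 hashes.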

    15. Re:Error so popular it was enshrined in PCI DSS by WaffleMonster · · Score: 1

      Salts do provide protection against that. Salts are secret if you want them to be

      You are playing word games. A "secret salt" is a "key", not a "salt", and you are ignoring the relevant context of the PCI DSS requirements.

      Other relevant bullet items in sec 3.4 were:

      * Index tokens and pads (pads must be securely stored)

      * Strong cryptography with associated key-management processes and procedures.

      If that is not enough of a hint to understand what they are talking about when they say "one-way hash", the "Note" section spells out exactly what they mean.

      Note: It is a relatively trivial effort for a malicious individual to reconstruct original PAN data if they have access to both the truncated and hashed version of a PAN. Where hashed and truncated versions of the same PAN are present in an entity's environment, additional controls should be in place to ensure that the hashed and truncated versions cannot be correlated to reconstruct the original PAN.

      If a huge "secret salt" is expected then truncation warnings would be irrelevant.
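      A minimal sketch of the attack the PCI note warns about, assuming the truncated form shows the first six and last four digits of a 16-digit PAN (a common truncation format) and the hash is an unsalted SHA-256. The function name is hypothetical and the card number is a well-known test number. Only the hidden middle digits need guessing, at most 10**6 candidates.

```python
import hashlib


def recover_pan(first6: str, last4: str, sha256_hex: str, length: int = 16):
    """Brute-force the hidden middle digits of a truncated PAN against an unsalted hash."""
    hidden = length - len(first6) - len(last4)
    for n in range(10 ** hidden):
        candidate = f"{first6}{n:0{hidden}d}{last4}"
        if hashlib.sha256(candidate.encode("ascii")).hexdigest() == sha256_hex:
            return candidate
    return None


pan = "4111111111111111"  # a well-known test card number, not a real account
leaked_hash = hashlib.sha256(pan.encode("ascii")).hexdigest()
recovered = recover_pan(pan[:6], pan[-4:], leaked_hash)
```

      The Luhn check digit would let an attacker skip about nine in ten candidates; this sketch does not bother.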

    16. Re:Error so popular it was enshrined in PCI DSS by swillden · · Score: 1

      It's not at all within the realm of possibility for an attacker to brute force the CC space for each salt separately

      Sure it is. With a couple dozen GPUs you can do 10^14 SHA-256 hashes in a little over two hours. The most cost-effective option, though, is probably to modify the Butterfly Labs bitcoin miner FPGA. One of those can search the space in a little over an hour. The ASICs are much faster but you couldn't use a bitcoin miner unmodified, and ASICs can't be easily tweaked.
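      A back-of-the-envelope check of the GPU figure, reading "a couple dozen" as 24 GPUs and treating "a little over two hours" as two hours (both assumptions, so the per-GPU rate is slightly overstated). It only rearranges the numbers in the comment; it is not a benchmark.

```python
def implied_rate_per_device(total_hashes: float, hours: float, devices: int) -> float:
    """Hashes per second each device must sustain to finish in the stated time."""
    return total_hashes / (hours * 3600) / devices


def hours_to_search(space: float, rate_per_device: float, devices: int) -> float:
    """Hours for `devices` devices at `rate_per_device` to cover `space` hashes."""
    return space / (rate_per_device * devices) / 3600


# Figures from the comment; "a couple dozen" is read as 24 GPUs (an assumption).
rate = implied_rate_per_device(1e14, 2.0, 24)   # roughly 5.8e8 SHA-256/s per GPU
```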

      I suppose maybe my perspective is a little skewed, because I've been doing CC secure storage at Google for the last three years. My normal supposed adversary is a malicious Google engineer... and Google engineers have massive computational resources available to them. Anyone can fire up a job on tens of thousands of machines -- many of them with GPUs -- as long as they're willing to set the priority very low. So I think of the CC space as small. It's a little harder for most people than for my notional adversary... but it's really not that much harder.

      The fact that CCNs have such low value (because they're so easy to get) probably does make your scheme workable. If you brought your design to me in a security review, though, I'd kill it. The attack is hard but feasible now, and getting easier all the time.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    17. Re:Error so popular it was enshrined in PCI DSS by swillden · · Score: 1

      One other point: If you modify your scheme to use a tunable slow hash, e.g. scrypt, then I would give it a thumbs up.
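      A minimal sketch of the scrypt variant using Python's `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later). The cost parameters are illustrative assumptions, not a recommendation; the idea is to raise `N` until each guess is expensive enough that enumerating the CC space per entry becomes impractical.

```python
import hashlib
import hmac
import secrets

# Illustrative cost parameters (assumptions): N=2**14, r=8 needs ~16 MiB per hash.
# Raise N until a single hash is as slow as your lookup budget allows.
N, R, P = 2 ** 14, 8, 1


def slow_protect(ccn: str, salt: bytes) -> bytes:
    return hashlib.scrypt(ccn.encode("ascii"), salt=salt, n=N, r=R, p=P, dklen=32)


salt = secrets.token_bytes(16)                    # per-entry salt, may be public
stored = slow_protect("4111111111111111", salt)   # well-known test card number
```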

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  14. Where is the harm by Anonymous Coward · · Score: 0

    why did NYC attempt to hide the data in the first place?

    1. Re:Where is the harm by PPH · · Score: 0

      Probably some union rule prohibiting the compilation and/or publication of drivers' performance records. It's all seniority.

      --
      Have gnu, will travel.
  15. Vijay Pandurangan arrested by Anonymous Coward · · Score: 0

    Surely Vijay Pandurangan will not be arrested for hacking?

    1. Re:Vijay Pandurangan arrested by wiredlogic · · Score: 1

      Hacking? This man is obviously a terrist fer'ner. Get him to Gitmo in a rendition wagon ASAP.

      --
      I am becoming gerund, destroyer of verbs.
  16. MD5 is not the problem by gweihir · · Score: 1

    For this application, MD5 did not make a difference. SHA512 would have been just as insecure. For some applications, MD5 is perfectly secure if used competently. This example is one, and the original story does not claim any culpability on the part of MD5. As always, there is no substitute for knowing what you are doing.
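    A toy illustration of why the choice of hash function doesn't matter here, assuming a hypothetical four-digit ID space: with an unkeyed hash, an attacker simply hashes every possible ID and inverts the table, whether the function is MD5 or SHA-512.

```python
import hashlib


def reverse_table(algorithm: str, id_space) -> dict:
    """Precompute hash -> ID for every possible ID under an unkeyed hash."""
    return {hashlib.new(algorithm, i.encode("ascii")).hexdigest(): i for i in id_space}


# Hypothetical four-digit ID space (10,000 values).
id_space = [f"{n:04d}" for n in range(10_000)]

for algo in ("md5", "sha512"):
    table = reverse_table(algo, id_space)
    leaked = hashlib.new(algo, b"1234").hexdigest()
    print(algo, "->", table[leaked])   # both print 1234
```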

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  17. Re:Prediction: de-anonymization considered "hackin by Opportunist · · Score: 5, Interesting

    True that.

    I am in the fortunate situation of having near unlimited funds. I was joking that I need a rubber stamp labeled "for security reasons", because whenever I want something, these three magic words will brush aside nearly all objections (ok, within reason, but anything 5 digits or less is nearly certainly mine if I "rubber stamp" it that way).

    The most recent draft of the security procedures I did I peppered liberally with "insanity" as I call it. It's a political thing. You demand stuff that you don't really want but is so terribly obstructive to everyone else that they'll agree with what you actually want just to get the insane levels of "security" (read: obstruction and red tape) out of the way. To my unending horror (and slight amusement) they signed it off without changing a comma. Now find out how to argue why you want your own requirements out of the crap...

    The reason isn't that our board suddenly found out how much they love security or how important the confidentiality of the (considerably sensitive, I should add) private data we hold here is. What changed is simply that our government upped the fines and punishment for data breeches considerably, up to and including jail time for board members if negligence can somehow be tacked to them. In a nutshell, unless you can show that you tried to stay on top of security when holding highly sensitive data, you should prepare to take a longer vacation, all expenses paid, in a holiday resort of your government's choice.

    I guess when your ass is on the line, you get very willing to spend money.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  18. I de-anonymized this comment by ewg · · Score: 1

    I de-anonymized this comment by signing in.

    --
    org.slashdot.post.SignatureNotFoundException: ewg
  19. Using a published hash - FAIL by chromaexcursion · · Score: 1

    Using any public hash exposes you to dictionary attacks. Especially when you publish which one you've used.
    The quality of the encryption is irrelevant.
    Security through obscurity, using a custom algorithm, is the only way.
    Taking MD5, it's published, and tweaking a few points (though whoever did this needs to be very competent) would have been sufficient.

    Some manager probably said any work for additional security wasn't worth the cost. Ooops!

    1. Re:Using a published hash - FAIL by PPH · · Score: 2

      Security through obscurity, using a custom algorithm, is the only way.

      Not necessarily. I imagine the reason the hashed field was included in the published logs was to provide a key to group results by driver. Even if that driver was to remain anonymous. So all the city would have had to do is issue a system generated UID for each medallion/license number combination and populate the published data with that.

      Nobody knows who driver 1, 2, 3, .., 736903, ... etc. are. But one can still analyze per-driver data.

      --
      Have gnu, will travel.
    2. Re:Using a published hash - FAIL by Vellmont · · Score: 3, Interesting

      Taking MD5, it's published, and tweaking a few points (though who ever did this needs to be very competent) would have been sufficient.

      No, that would have been stupid. It's unlikely someone would have reverse engineered your hacked md5 algorithm, but it's also possible you could screw it up.

      The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier. Throw away the random 256 bit string.


      Some manager probably said any work for addition security wasn't worth the cost. Ooops!

      No, some developer didn't know what the hell they were doing. You'd be surprised (but shouldn't be) how little most developers know about security, especially encryption.

      --
      AccountKiller
    3. Re:Using a published hash - FAIL by chromaexcursion · · Score: 1

      nope, it has to do with the key. given a tag # and license # you can dictionary attack the hash. especially since the source data is known, easy to break.

      they didn't pre-anonymize the keys

    4. Re:Using a published hash - FAIL by chromaexcursion · · Score: 1

      well, you just described a way to tweak an algorithm.
      wouldn't even have to go to a 256 bit key. Doing that with MD5 would probably foil anything less than a concerted financial attack.
      No media outlet could afford the computing power to attack that.
      I used the same approach, with some further tweaks to secure financial communications a decade ago.

      Lack of understanding security doesn't surprise me. I'm an engineer who does. I designed and wrote a suite that passed a 3d party, hostile, security audit.

    5. Re:Using a published hash - FAIL by Vellmont · · Score: 1

      No, that's not a tweak to an algorithm, it's a random input to an algorithm. The algorithm is the same, the input is different.

      --
      AccountKiller
    6. Re:Using a published hash - FAIL by Anonymous Coward · · Score: 0

      You do realize that hash functions are often used in PRNGs? And you realize that if you can generate 256 bits of random garbage, there's no extra need to hash the original data AND the garbage. You can simply use the 256 bits of garbage if all you need is an identifier. You're talking about adding in known plaintext where none is actually needed. You might hash the 256 bits of garbage to produce better pseudorandom garbage, but that's a secondary consideration. Heck, if we're going to assign identifiers to cabs in the first place, just number them 0-255. It's about as useful... The identifiers are public in the first place. The space of identifiers is public. If the same cab always shows up as 42 in the logs, even though its real number is 7, you're not obscuring a damn thing or even slowing anyone down (more than trivially).

      I'm really at a loss as to what problem your solution was looking for.

    7. Re:Using a published hash - FAIL by Anonymous Coward · · Score: 0

      Lack of understanding security doesn't surprise me. I'm an engineer who does. I designed and wrote a suite that passed a 3d party, hostile, security audit.

      That could mean something, or it could have been useless if the 3rd party was incompetent, or was lazy in their audit process. You can't stake your claim to be an engineer who understands security on that w/o a bit more evidence...but, maybe you have and just didn't share that.

    8. Re:Using a published hash - FAIL by swillden · · Score: 2

      nope, it has to do with the key. given a tag # and license # you can dictionary attack the hash. especially since the the source data is known, easy to break.

      If they'd used a keyed hash of tag # and license #, it wouldn't have been breakable. Even HMAC-MD5 would have been fine, given sufficient entropy in the key, though I'd have used HMAC-SHA256 just as a matter of good crypto hygiene.
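      A minimal sketch of a keyed hash over both fields. One detail worth getting right when HMACing two values together is an unambiguous encoding, so that ("12", "3") and ("1", "23") don't produce the same message; length-prefixing each field is one common way to do that. The function and parameter names are hypothetical.

```python
import hashlib
import hmac
import secrets
import struct


def encode_fields(*fields: str) -> bytes:
    """Length-prefix each field so ("12", "3") and ("1", "23") cannot collide."""
    out = bytearray()
    for f in fields:
        data = f.encode("utf-8")
        out += struct.pack(">I", len(data)) + data  # 4-byte big-endian length
    return bytes(out)


def obscure_pair(medallion: str, hack_license: str, key: bytes) -> str:
    msg = encode_fields(medallion, hack_license)
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


key = secrets.token_bytes(32)
```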

      And a custom algorithm is wrong, wrong, wrong. That's just begging for weakness in the solution. Use the proper standard algorithm for the job.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    9. Re:Using a published hash - FAIL by swillden · · Score: 1

      The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier.

      This. Except rather than hashing the key with the data, use a proper keyed hash construction. HMAC is a good choice.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    10. Re:Using a published hash - FAIL by swillden · · Score: 1

      You can simply use the 256 bits of garbage if all you need is an identifier.

      Yes, but you need to get the same 256 bits of garbage each time you encounter a given driver ID. This means adding a lookup table. Much simpler and faster to use a keyed hash as your lookup table.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    11. Re:Using a published hash - FAIL by Anonymous Coward · · Score: 0

      Why wouldn't you just replace the identifier with a random string in that case? Why bother with the hash at all?

    12. Re:Using a published hash - FAIL by swillden · · Score: 1

      I designed and wrote a suite that passed a 3d party, hostile, security audit.

      I don't normally play the credential game, but if that's what you want to do...

      Me too. Many times. Including once an audit by the NSA (back when they actually tried to strengthen security). I've also been a security consultant for dozens of Fortune 500 companies, and similarly-sized international corporations around the world. I've consulted for the US and Israeli militaries. I'm currently a crypto security engineer at Google, and the lead maintainer of a popular open source crypto library. I'm not a real cryptographer, mind you, since I don't have any published cryptanalytic papers and don't really have the chops to write them... though I do have a couple of colleagues who are prominent academic cryptographers and they listen when I talk. That last point is the strongest credential I can offer, IMO.

      And I'm here to tell you: Do NOT roll your own algorithms, not even by tweaking published ones. Use only published, peer-reviewed, time-tested algorithms and apply published, peer-reviewed, time-tested protocols to construct the security properties you require. The literature and practice includes constructs for virtually any set of security properties you might want (well, the ones which aren't impossible). Learn and use them... and hire someone else with greater expertise to review them. In spite of -- actually, because of -- my decades of experience, I run everything even remotely novel past my real cryptographer colleagues, and they immediately turn to published literature, not trusting their own decades of even deeper experience. And then I have my code carefully reviewed by other engineers with backgrounds similar to mine.

      If I were to seriously suggest hacking up MD5 for anything requiring security, my colleagues would all wonder if I had been replaced by an (incompetent) alien. Actually, I'd have a hard time convincing them I wasn't joking.

      In this case, what you want is HMAC with a standard cryptographically-secure hash and a key with sufficient entropy.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    13. Re:Using a published hash - FAIL by nabsltd · · Score: 1

      The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier. Throw away the random 256 bit string.

      How is this any more secure than assigning the random 256 bit string as the identifier (with collision prevention, of course)?

      Next, how would a random sort of the original keys (SELECT DISTINCT medallion_number FROM the_table ORDER BY RANDOM()) followed by assigning 1..number_of_medallions to use as the identifier be less secure?

      As others have stated, you could even just assign the new identifier sequentially if the source table isn't sorted by the key you are trying to obscure.
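      The same shuffle-and-number idea as a minimal Python sketch, assuming the whole dataset is available up front (the batch case, not the streaming one). The function name and IDs are hypothetical; the returned mapping must stay private, or be destroyed after the export.

```python
import secrets


def assign_shuffled_ids(real_ids) -> dict:
    """Number the distinct IDs 1..N in a random order; the returned dict stays private."""
    unique = sorted(set(real_ids))          # distinct IDs as a list
    secrets.SystemRandom().shuffle(unique)  # shuffle driven by the OS CSPRNG
    return {real: n for n, real in enumerate(unique, start=1)}


rows = ["ID-0042", "ID-0043", "ID-0042", "ID-0099"]  # hypothetical IDs
mapping = assign_shuffled_ids(rows)
released = [mapping[r] for r in rows]
```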

    14. Re:Using a published hash - FAIL by nabsltd · · Score: 1

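      -- MySQL syntax; keep id_link private (or drop it once the export is done).
      CREATE TABLE id_link (
      new_id INT AUTO_INCREMENT PRIMARY KEY,  -- MySQL requires an AUTO_INCREMENT column to be a key
      old_id CHAR(50)
      );

      -- Number the distinct IDs in shuffled order. Note RAND() is not a cryptographic RNG.
      INSERT INTO id_link (old_id)
      SELECT d.old_id
      FROM (SELECT DISTINCT old_id FROM old_table) AS d
      ORDER BY RAND();

      SELECT id_link.new_id, old_table.other_field_1_from_old_table, old_table.other_field_2_from_old_table
      FROM old_table
      JOIN id_link ON old_table.old_id = id_link.old_id;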

      How hard was that?

    15. Re:Using a published hash - FAIL by swillden · · Score: 1

      randomized_id = hmac.new(key, driver_id, hashlib.sha256).hexdigest()  # Python: import hashlib, hmac

      How hard was that?

      Not to mention the fact that in some contexts database lookups are prohibitively expensive.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  20. Cue the DMCA. by Anonymous Coward · · Score: 2, Insightful

    In other news, the credentials for their plug-n-play coffee machine are 'admin' 'admin', and their gym locker combo is 1234. Someone made a half-assed attempt to obfuscate some data that nobody cares about (unless your husband's a cheating cabbie, I guess) and someone cracked it. News?

  21. Re:I'm new here! by Anonymous Coward · · Score: 0

    The 'classic', or alpha version is, as the name implies, reserved for the alpha users. All the rest will have to stick with the beta version.

  22. Re:Prediction: de-anonymization considered "hackin by chromaexcursion · · Score: 1

    You've elegantly described why stiff federal penalties are needed.

    Interesting that when a direct line to someone's pocketbook is defined everyone gets on board, but when it's just a chance someone's drinking water would be tainted with cancer causing chemicals most can't find the connection.
    Corporate malfeasance comes in all forms.

  23. Re:Prediction: de-anonymization considered "hackin by Opportunist · · Score: 5, Interesting

    Fines in a corporate world are a matter of risk management: How likely is it that it happens, what's the fine if it happens, and how much do we save by not giving a damn? If this unholy trinity comes up with "don't give a damn" on top, you don't give a damn and the fine becomes part of the operating cost. The more I get to play with C-levels, the more I get the nagging feeling that I'm the only one weighed down by a conscience.

    Actually, I think it's more insidious. It's a blame-shifting game where everyone can claim he's doing it for the "greater good", because "being bad" is actually "being good". Take the scenario where some people have to be laid off. The floor manager knows them personally. He knows every single one of them, he knows their personal lives and their family situations, and it really breaks his heart to let one of them go, but he knows he has to. Either he fires one of them or he might have to fire them all because they won't be profitable anymore under the new requirements, and that could lead to the shutdown of the entire branch. His superior may not know the people anymore, but he has to do it because he himself doesn't make that decision; that's been decided further up. He can't simply ignore an order from the C-level. The C's don't need to be psychopaths (though it sure helps, it seems...); they can even be compassionate, but they know that the investors will only keep their money in the company if it performs well and if the cash flow is to their liking. They can easily brush aside any pangs of conscience when they fire a few people now, since if they didn't, their quarterly figures wouldn't look nice, the stock would plummet, investors would jump ship, and then they'd have to lay off even more people.

    But you can't even blame the investment bankers. They have to pick the best-performing stocks; it's not their money, it's money from investors, money those people put aside for their retirement, and the bankers have a responsibility towards the people who entrust them with that money (OK, recent history shows that most don't give a shit, but let's assume we find an investment banker with a conscience... it's just a thought experiment, remember). The people investing the money don't even know WHAT they invest in; they just toss money at their investment banker with the order to "make more of it". And they're not "evil" either; they just want to prepare for their retirement. Those people could well be the same ones who get fired now for the sake of more profit. Essentially, they're firing themselves without knowing it.

    But I ramble.

    What this is supposed to show is that in the corporate world it's easy to play the blame-shifting game and use the "but I have to!" excuse. It's sad, but it seems the only escape from that game is to actually grab them by the nuts and tell them that they won't be shifting the blame anywhere. And behold, it works.

    Of course that also means that I have to watch my back or it's going to be my ass that's going to jail. But fortunately all I have to do is heed the laws. And that's easy enough, surprisingly.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  24. Re:Prediction: de-anonymization considered "hackin by superdana · · Score: 1

    data breeches

    bring me my computing pants!

  25. Re:Prediction: de-anonymization considered "hackin by Anonymous Coward · · Score: 0

    You are in rare form. Glad to be here for it.

  26. Re:Prediction: de-anonymization considered "hackin by Anonymous Coward · · Score: 5, Informative

    > Target's breach cost them 50% of their revenue for a year.

    No it did not. Not even close. At worst, their profits for the subsequent quarter were down 50%; in terms of revenue, that's less than a 6% drop compared to a year ago.

  27. PCI DSS is to protect bank not customer by Anonymous Coward · · Score: 0

    It has never been about protecting you, the customer with the CC, but about giving the bank and the firm protection against a lawsuit or class action in case of a massive breach. Now they can simply say "hey, we were respecting the PCI DSS standard" and be out of the heat. That's why there is no real security, or requirement to have something stronger like a salted hash.

  28. Re:Prediction: de-anonymization considered "hackin by Anonymous Coward · · Score: 0

    Indeed--I suspect he's had a bit too much to drink. But it's really quite fascinating...

  29. Re:Prediction: de-anonymization considered "hackin by Anonymous Coward · · Score: 0

    It was a pleasant fantasy and you had to go and spoil it all.

  30. Re:Prediction: de-anonymization considered "hackin by skovnymfe · · Score: 2

    A new car built by my company leaves somewhere traveling at 60 mph. The rear differential locks up. The car crashes and burns with everyone trapped inside. Now, should we initiate a recall? Take the number of vehicles in the field, A, multiply by the probable rate of failure, B, multiply by the average out-of-court settlement, C. A times B times C equals X. If X is less than the cost of a recall, we don't do one.

  31. A little salt might have helped. by brian81 · · Score: 1

    If they had salted the hash, they might have gotten away with it.

  32. Re:Prediction: de-anonymization considered "hackin by wonkey_monkey · · Score: 1

    A new car built by my company [...] car crashes and burns with everyone trapped inside. Now, should we initiate a recall?

    No, you just need to stop making such shitty cars.

    --
    systemd is Roko's Basilisk.
  33. Re: throwing away the salt by Anonymous Coward · · Score: 1

    The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier. Throw away the random 256 bit string.

    If you're going to throw away the salt, why not just assign a unique, shuffled identifier for each data string?

    A hash collision could make it look like a single taxi driving in opposite directions simultaneously, or it could cause a pair of day-night shift taxis to appear to be a single taxi that's used 24/7. So if you want to avoid hash collisions, you at least have to verify that none of the values hashed to the same value, and the cost of doing that is roughly the same as the extra overhead of generating a shuffled identifier.
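    A minimal sketch of both halves of that point: the birthday approximation for how likely a collision is, and a pseudonymizer that refuses to emit one. The figure of 50,000 IDs is hypothetical, not the actual number of NYC drivers, and the HMAC-SHA256 construction is assumed rather than taken from the comment. With a full 256-bit output the check essentially never fires; truncating to 32 bits makes a collision quite likely.

```python
import hashlib
import hmac
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2**(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


def pseudonymize_checked(ids, key: bytes, out_bytes: int = 32) -> dict:
    """HMAC-SHA256 pseudonyms (optionally truncated), refusing any collision."""
    seen = {}
    for real in sorted(set(ids)):
        tag = hmac.new(key, real.encode("utf-8"), hashlib.sha256).digest()[:out_bytes]
        if tag in seen:
            raise ValueError(f"collision: {seen[tag]!r} and {real!r}")
        seen[tag] = real
    return {real: tag.hex() for tag, real in seen.items()}


# 50,000 IDs is a hypothetical figure, not the real number of NYC drivers.
p32 = collision_probability(50_000, 32)    # about 0.25 for 32-bit outputs
p256 = collision_probability(50_000, 256)  # negligible for full SHA-256
```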

  34. Re:Prediction: de-anonymization considered by Anonymous Coward · · Score: 0

    Looks like that reference went over your head a little...

  35. Re:Prediction: de-anonymization considered "hackin by Opportunist · · Score: 1

    Why? As long as people buy them, there is no pressing need provided that the profit outmatches the potential fines. That's corporate logic.

    What? Oh, people die, yes. That's where the potential fines come into play.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  36. Calculate the journeys - identify the cheats by Anonymous Coward · · Score: 0

    This is what scares licensed cabbies. Uber gives you a map of your journey. Licensed cabbies can drive you round in circles and you cannot prove it.

    Now this data is open, you should:
    map all the start and end locations
    calculate the optimal route
    identify the cabs and medallions that deviate most from the optimal
    fine them
    ban them.

    If you want any form of quality control in any system, you must sample a portion of all work and verify it. Even with experienced and proven honest operators, you must still check 10% of their work. This isn't about trust. It's just best practice. Cabbies are finally going under the spotlight and they don't like it.
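    A minimal sketch of the "deviate most from the optimal" step, assuming each trip record carries the distance actually driven and a shortest-route distance from some routing engine (neither field is confirmed by the comment). The threshold and minimum trip count are arbitrary assumptions. A high ratio is only a reason to sample that cab's trips for review; traffic, closures, and passenger-requested routes all inflate it legitimately.

```python
from collections import defaultdict
from statistics import mean


def flag_detours(trips, threshold: float = 1.25, min_trips: int = 20) -> list:
    """Cabs whose average driven/shortest distance ratio exceeds the threshold.

    trips: iterable of (cab_id, driven_miles, shortest_route_miles).
    """
    ratios = defaultdict(list)
    for cab, driven, shortest in trips:
        if shortest > 0:
            ratios[cab].append(driven / shortest)
    return sorted(cab for cab, r in ratios.items()
                  if len(r) >= min_trips and mean(r) > threshold)


# Hypothetical data: cab "A" drives direct routes, cab "B" runs 50% long,
# cab "C" has too few trips to judge.
trips = [("A", 2.0, 2.0)] * 20 + [("B", 3.0, 2.0)] * 20 + [("C", 9.0, 2.0)] * 3
flagged = flag_detours(trips)
```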

  37. Re:Prediction: de-anonymization considered "hackin by Anonymous Coward · · Score: 0

    And in your scenario, all of the people making all of those decisions are in fact right. Compassion is a fine thing, but at the end of the day what benefits all of us is economic efficiency. It is hard on the people who are fired, and that's a good reason to give them generous severance packages because in the end that's unlikely to do significant damage to the bottom line and the goodwill generated has significant value, but keeping people on just to be nice is a bad idea.

  38. Re:Prediction: de-anonymization considered "hackin by Anonymous Coward · · Score: 0

    You need garters for those breeches.

  39. Re:Prediction: de-anonymization considered "hackin by bluegutang · · Score: 2

    This is not a new phenomenon. And not an easy one to solve. From The Grapes of Wrath by John Steinbeck:

    "I built [this house] with my hands. Straightened old nails to put the sheathing on. Rafters are wired to the stringers with baling wire. It's mine. I built it. You bump it down—I'll be in the window with a rifle. You even come too close and I'll pot you like a rabbit."

    "It's not me. There's nothing I can do. I'll lose my job if I don't do it. And look—suppose you kill me? They'll just hang you, but long before you're hung there'll be another guy on the tractor, and he'll bump the house down. You're not killing the right guy."

    "That's so," the tenant said. "Who gave you orders? I'll go after him. He's the one to kill."

    "You're wrong. He got his orders from the bank. The bank told him, 'Clear those people out or it's your job.'"

    "Well, there's a president of the bank. There's a board of directors. I'll fill up the magazine of the rifle and go into the bank."

    The driver said, "Fellow was telling me the bank gets orders from the East. The orders were, 'Make the land show profit or we'll close you up.'"

    "But where does it stop? Who can we shoot? I don't aim to starve to death before I kill the man that's starving me."

    "I don't know. Maybe there's nobody to shoot. Maybe the thing isn't men at all. Maybe like you said, the property's doing it. Anyway I told you my orders."

  40. Re:Prediction: de-anonymization considered "hackin by cellocgw · · Score: 1

    No, you just need to stop making such shitty cars.

    Seems a lot of people got whooshed by the original post, so:

    I have changed your automobile safety design. Pray I do not change it further -- T. Durden

    --
    https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
  41. Re:Prediction: de-anonymization considered "hackin by Opportunist · · Score: 1

    Is that like your bus pants?

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.