Slashdot Mirror


US District Court Says Calculating a Hash Value = Search

bfwebster writes "Orin Kerr over at The Volokh Conspiracy (a great legal blog, BTW) reports on a US District Court ruling issued just last week which finds that doing hash calculations on a hard drive is a form of search and thus subject to 4th Amendment limitations. In this particular case, the US District Court suppressed evidence of child pornography on a hard drive because proper warrants were not obtained before imaging the hard drive and calculating MD5 hash values for the individual files on the drive, some of which ended up matching known MD5 hash values for known child pornography image and video files. More details at Kerr's posting." Update: 10/28 16:23 GMT by T: Headline updated to reflect that this is a Federal District Court located in Pennsylvania, rather than a court of the Commonwealth itself.

13 of 623 comments

  1. It's good to see. by UseTheSource · · Score: 5, Informative

    The courts are finally getting up to speed on technology.

    --
    "Ein Volk, ein Reich, ein Führer." -Adolf Hitler
    "We are one Nation, we are one People." -The One 'leader'
    1. Re:It's good to see. by UseTheSource · · Score: 5, Informative

      It's not that child pornographers shouldn't be prosecuted, but like it or not, they're still entitled to the same due process as normal, "non-pervert" criminals. This "it's for the children" stuff shouldn't fly when we claim to follow the rule of law.

    2. Re:It's good to see. by lysergic.acid · · Score: 4, Informative

      also, wouldn't this type of search be pretty useless for identifying kiddy porn images?

      md5 hashes are useful for verifying a binary package is in fact what it is supposed to be because it's hard to create a fake or altered program that produces the same md5 hash number as the authentic copy. so it's useful for verifying a "good" file, because presumably a good file won't try to deceive you, and a bad file can't reproduce the same md5 hash.

      however, with something like a digital photo, all a user has to do is make a few very minor alterations (like a small watermark) to the image and it would produce a completely different md5 hash (md5 is designed so that any change to the input, however small, changes the output unpredictably) and be missed by the md5 scan. the change could be as small as flipping a single bit in the file. in a standard uncompressed 24-bit RGB bitmap, each pixel is stored as three 8-bit values for the red, green, and blue channels, so by flipping the least significant bit of each channel you can change 1/8th (12.5%) of the bits in the pixel data without any change perceptible to human eyes.

      another method would be to use a lossy compression scheme like JPEG. convert all your images to JPEG (or, if they are already JPEGs, just re-save them at a very high quality setting) and the MD5 hashes will be completely different. yet another method is to resize the image by a small amount, say reducing both width and height by just 1 pixel; scaling with bicubic interpolation keeps the image looking essentially the same while completely changing the file's md5 hash.

      all of these methods would be simple to automate and allow you to easily hide known child porn images from detection using md5 comparisons.
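      The avalanche behavior described above is easy to see with Python's standard hashlib module. A minimal sketch, using a made-up 48-byte buffer as a stand-in for uncompressed 24-bit pixel data (not a real image file, which would also have a header): it flips one bit, then the least significant bit of every channel value, and prints the MD5 of each version along with the fraction of bits changed.

```python
import hashlib

def flip_lsbs(data: bytes) -> bytes:
    """Flip the least significant bit of every byte, i.e. of every 8-bit
    R, G or B value in raw 24-bit pixel data."""
    return bytes(b ^ 0x01 for b in data)

def flip_one_bit(data: bytes, index: int = 0) -> bytes:
    """Flip only the lowest bit of a single byte."""
    out = bytearray(data)
    out[index] ^= 0x01
    return bytes(out)

def changed_bit_fraction(a: bytes, b: bytes) -> float:
    """Fraction of bits that differ between two equal-length buffers."""
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (8 * len(a))

# Hypothetical stand-in for raw pixel data: 16 pixels x 3 bytes (R, G, B).
# A real .bmp file also has a header, which this sketch ignores.
pixels = bytes(range(48))

for label, variant in [("original", pixels),
                       ("one bit flipped", flip_one_bit(pixels)),
                       ("all LSBs flipped", flip_lsbs(pixels))]:
    print(f"{label:17} {hashlib.md5(variant).hexdigest()}")

print(changed_bit_fraction(pixels, flip_lsbs(pixels)))  # 0.125
```

      The three digests share no visible structure even though the buffers differ by one bit or by one bit per byte, which is why an exact-hash comparison misses trivially altered copies.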

    3. Re:It's good to see. by Anonymous Coward · · Score: 4, Informative

      I used to work in an Australian court. And I remember a judge in tears throwing out a paedophile case where the guy was *clearly* guilty as hell, but the prosecution had bungled it so badly that it couldn't possibly be presented to the jury in that state. Afterwards she practically broke glass screaming at the prosecutor.

      Afterwards I asked her about the case, and she told me that although she was bitter, even the worst of scumbags deserve a fair trial, and that trial wouldn't have been one.

      Later that year they retried the case properly and the guy got 20 years.

  2. that's basically what they were doing. by yincrash · · Score: 5, Informative

    you can't generate md5s w/o actually looking at all of the data in the file.
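    To illustrate, a minimal Python sketch of how a file's MD5 is typically computed with the standard hashlib module: the whole file is streamed through the hash, and the function also counts the bytes it read. The function name and the 1 MiB chunk size are arbitrary choices for this example, not anything taken from the case.

```python
import hashlib
import os
import tempfile
from typing import Tuple

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> Tuple[str, int]:
    """Return (MD5 hex digest, number of bytes read) for a file.

    The file is streamed in chunks and every chunk is fed to md5.update(),
    so the digest cannot be produced without reading the whole file.
    """
    md5 = hashlib.md5()
    bytes_read = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # end of file
                break
            md5.update(chunk)
            bytes_read += len(chunk)
    return md5.hexdigest(), bytes_read

if __name__ == "__main__":
    # Write a throwaway file, hash it, and compare bytes read to its size.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example data " * 10_000)
        name = tmp.name
    try:
        digest, n = md5_of_file(name)
        print(digest, n, os.path.getsize(name))  # n equals the file size
    finally:
        os.remove(name)
```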

    1. Re:that's basically what they were doing. by Anonymous Coward · · Score: 3, Informative

      "We got this guy, but let's get a warrant before we scan his hard drive."

      The odd thing is that the computer was in the landlord's friend's friend's (brother's dogwalker's sister-in-law's... whoops, got carried away) possession having been seized during the eviction. The vast majority of precedent (used whenever the government wants data from phone companies and mail servers, etc.) says that if the guy with the data freely gives it to the cops, they don't need no steenkin warrant.

      While the overall decision is welcome (that the government can't just force their way into my house and hash my drive on a whim), the method by which the decision was arrived at is unsound, and will almost certainly be overturned on the grounds that it wasn't the pedophile's drive anymore, therefore the pedophile had no standing to object to the search.

  3. Comment removed by account_deleted · · Score: 5, Informative

    Comment removed based on user account deletion

  4. Error made by Slashdot in headline by bfwebster · · Score: 5, Informative

    When I submitted this story, I gave it the headline "US Court:...". Someone changed that to "PA Court Says...". That's wrong. This is a ruling from a US District (Federal) court, not a Pennsylvania state court, and so carries much more weight. ..bruce..

    --
    Bruce F. Webster (brucefwebster.com)
  5. Comment removed by account_deleted · · Score: 3, Informative

    Comment removed based on user account deletion

  6. Re:That's a terrible argument by msuarezalvarez · · Score: 4, Informative

    "What evidence? Some md5 hashes that happen to match hashes from a select number of images? Odds are if we hash out every file on your hard drive we will also find matches to that same list."

    Actually, odds are the hashes will not match...

  7. Re:That's a terrible argument by johnlcallaway · · Score: 4, Informative

    Odds yes.

    But no guarantee.

    A better check is hash plus file size, since two unrelated files would have to match on both the size and the hash by chance. That's especially true for compressed images or videos, because files with the same dimensions usually compress to different sizes.

    Hash and file size checks are useful for checking if a file is intact and possibly not altered. They are great for lookups.

    But, in the end, you still need the file to validate the correct item is found. Hashmaps store both the key and hash for this very reason. The hash is a quick lookup, but the key is needed to verify the right element has been found.

    Unless the hash is the same size as the key...
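    A minimal Python sketch of the lookup described above, under clearly stated assumptions: the class and method names are hypothetical, and it assumes the reference files themselves are available for the final comparison (a list that contains only hashes cannot do that last step). The (size, MD5) pair narrows the search; the byte-for-byte comparison confirms it.

```python
import hashlib
from typing import Dict, Optional, Tuple

Key = Tuple[int, str]  # (size in bytes, MD5 hex digest)

class KnownFileIndex:
    """Hypothetical index of reference files keyed by (size, MD5).

    The key is only a fast filter. A hit is confirmed by comparing the
    full contents, much as a hash map compares the stored key after
    locating a bucket by hash.
    """

    def __init__(self) -> None:
        self._files: Dict[Key, bytes] = {}

    @staticmethod
    def key_for(data: bytes) -> Key:
        return len(data), hashlib.md5(data).hexdigest()

    def add(self, data: bytes) -> None:
        self._files[self.key_for(data)] = data

    def lookup(self, data: bytes) -> Optional[str]:
        """Return "match", "key-only" or None."""
        candidate = self._files.get(self.key_for(data))
        if candidate is None:
            return None          # no file with this size and MD5
        if candidate == data:
            return "match"       # verified byte for byte
        return "key-only"        # same size and MD5, different bytes: a collision

index = KnownFileIndex()
index.add(b"reference file contents")
print(index.lookup(b"reference file contents"))  # match
print(index.lookup(b"unrelated file"))           # None
```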

    --
    I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
  8. Cops blow it again by russotto · · Score: 5, Informative

    Not only did they search the drive without a warrant, but they also got the defendant to confess to putting the files there by questioning him without reading his rights and telling him that he didn't need an attorney. Genius.

    Even dumber: Based on the testimony of the guy who originally found the child porn, they could have gone to a magistrate and gotten a warrant. Then there would have been no issue of a warrantless search.

    BTW, for those considering the abandoned-property angle -- the court goes into that. It wasn't a legal eviction and the defendant hadn't abandoned his stuff; he merely hadn't removed it all yet.

  9. Re:That's a terrible argument by blueg3 · · Score: 4, Informative

    Yes, that's the birthday paradox. I'm not sure offhand how big the NCMEC database is, which is usually what they're comparing against, but let's try some math.

    Let's say your hard drive has N files and the database has M items (so, comparing a list of N hashes to another list of M hashes). Your hard drive doesn't actually contain any of the files used to generate the "bad" hash list. Each of the N*M file/hash pairs matches by chance with probability 2^-128, so the probability of at least one false match is approximately P = 1 - exp( -N*M / 2^128 ). Assuming the value in the exponent is small, this is approximately P = N*M / 2^128. 2^128 is about 3.4 * 10^38. In order for you to have a one in a billion (10^9) chance of a false positive, the product N*M would have to be about 3.4 * 10^29. If the hash list has a billion items (I think it's smaller than that, by quite a lot), you'd need about 3.4 * 10^20 files on your disk, well beyond the capacity of readily-available desktop storage.
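    A minimal Python sketch of this kind of pair-counting estimate, assuming (as an idealization) that MD5 outputs for unrelated files behave like independent, uniformly random 128-bit values. The hash-list size of one billion is the same hypothetical figure used in the comment, not the actual size of any database, and the function names are made up for this example.

```python
import math

def false_match_probability(n_files: float, m_hashes: float, bits: int = 128) -> float:
    """P(at least one accidental match) when each of the n_files * m_hashes
    pairs independently matches with probability 2**-bits."""
    expected_matches = n_files * m_hashes / 2.0 ** bits
    return -math.expm1(-expected_matches)  # 1 - exp(-x), accurate for tiny x

def files_needed(p: float, m_hashes: float, bits: int = 128) -> float:
    """Invert the small-x approximation P ~= N * M / 2**bits for N."""
    return p * 2.0 ** bits / m_hashes

m = 1e9  # hypothetical hash-list size (one billion entries)
print(f"2^128 ~ {2.0 ** 128:.2e}")
print(f"files for a 1e-9 chance: {files_needed(1e-9, m):.2e}")
print(f"P for a million files:   {false_match_probability(1e6, m):.2e}")
```

    math.expm1 is used instead of 1 - math.exp(-x) because for values this small the naive subtraction rounds to exactly zero in floating point.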

    MD5 hashes are useful because accidental collisions, even birthday-style ones, are vanishingly unlikely at any realistic scale. What they're not resilient to, it turns out, is someone intentionally creating two files with the same MD5 hash. (Published MD5 collision attacks typically produce pairs of files that also have the same size, so adding a size check doesn't stop them.)