US District Court Says Calculating a Hash Value = Search
bfwebster writes "Orin Kerr over at The Volokh Conspiracy (a great legal blog, BTW) reports on a US District Court ruling issued just last week which finds that doing hash calculations on a hard drive is a form of search and thus subject to 4th Amendment limitations. In this particular case, the US District Court suppressed evidence of child pornography on a hard drive because proper warrants were not obtained before imaging the hard drive and calculating MD5 hash values for the individual files on the drive, some of which ended up matching known MD5 hash values for known child pornography image and video files. More details at Kerr's posting." Update: 10/28 16:23 GMT by T: Headline updated to reflect that this is a Federal District Court located in Pennsylvania, rather than a court of the Commonwealth itself.
The courts are finally getting up to speed on technology.
"Ein Volk, ein Reich, ein Führer." -Adolf Hitler
"We are one Nation, we are one People." -The One 'leader'
you can't generate md5s w/o actually looking at all of the data in the file.
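To make that concrete, here is a minimal sketch of how a file's MD5 is typically computed with Python's standard hashlib module. The function name md5_of_file and the 1 MiB chunk size are illustrative choices, not anything from the case. The point it shows: every byte of the file has to pass through the hash function before a digest exists.

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Keep reading until f.read() returns b"" (end of file); every byte
        # is fed to h.update(), so the whole file's contents are read.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Chunked reading only keeps memory use bounded; it does not let the hash skip any part of the file.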
When I submitted this story, I gave it the headline "US Court:...". Someone changed that to "PA Court Says...". That's wrong. This is a ruling from a US District (Federal) court, not a Pennsylvania state court, and so carries much more weight. ..bruce..
Bruce F. Webster (brucefwebster.com)
Even if the hard drive has a couple of million files on it and there are a few thousand known hashes of illegal files, the odds of having a different file with a matching hash are in the neighborhood of 10^28 to 1 against.
What evidence? Some MD5 hashes that happen to match hashes from a select number of images? Odds are that if we hash every file on your hard drive, we will also find matches to that same list.
Actually, odds are the hashes will not match...
Better a few guilty men go free on a technicality than allow officers to become a law unto themselves.
The largest US gang has a well-documented record that would seem to indicate your statement is out of date.
As another everyday example, here's a big surprise, no?
I'm not intending to troll/flamebait here, but MY perception is there is very little accountability for the 'on the job' crew in blue amongst themselves. It is also my perspective that there is very little integrity once one subscribes to the original meaning of the thin blue line.
Odds yes.
But no guarantee.
A better check is hash plus file size, since it is less likely that two files of the same size will also have the same hash by chance. This is especially true for images and videos, because compression makes files of the same dimensions come out at different sizes.
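A minimal sketch of that idea, assuming a lookup table of known files keyed on the pair (size in bytes, MD5 hex digest). The helper names file_key, build_index, and lookup are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import Dict, Optional, Tuple

Key = Tuple[int, str]  # (size in bytes, MD5 hex digest)

def file_key(data: bytes) -> Key:
    """Hypothetical lookup key combining file size and MD5 digest."""
    return (len(data), hashlib.md5(data).hexdigest())

def build_index(known: Dict[str, bytes]) -> Dict[Key, str]:
    """Map (size, md5) -> file name for a collection of known files."""
    return {file_key(data): name for name, data in known.items()}

def lookup(index: Dict[Key, str], data: bytes) -> Optional[str]:
    """Return the known file's name if both size and digest match, else None."""
    return index.get(file_key(data))
```

A candidate only counts as a match if the size and the digest both agree. That filters out accidental collisions between files of different lengths, although it does nothing against deliberately constructed same-length collisions.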
Hash and file size checks are useful for checking if a file is intact and possibly not altered. They are great for lookups.
But, in the end, you still need the file to validate that the correct item has been found. Hashmaps store both the key and the hash for this very reason: the hash gives a quick lookup, but the key is needed to verify that the right element has been found.
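To illustrate the hashmap point, here is a toy separate-chaining hash map (an illustrative sketch, not how any particular library is implemented). Each bucket entry stores the hash together with the key. Lookups compare the cheap hash first and then confirm with a full key comparison.

```python
class TinyHashMap:
    """Minimal separate-chaining hash map storing (hash, key, value) entries."""

    def __init__(self, buckets: int = 8):
        self._buckets = [[] for _ in range(buckets)]

    def _bucket(self, h: int) -> list:
        return self._buckets[h % len(self._buckets)]

    def put(self, key, value) -> None:
        h = hash(key)
        bucket = self._bucket(h)
        for i, (eh, ek, _) in enumerate(bucket):
            # Cheap hash comparison first; the key comparison settles it.
            if eh == h and ek == key:
                bucket[i] = (h, key, value)
                return
        bucket.append((h, key, value))

    def get(self, key, default=None):
        h = hash(key)
        for eh, ek, ev in self._bucket(h):
            if eh == h and ek == key:
                return ev
        return default
```

In CPython, hash(-1) == hash(-2), so the keys -1 and -2 land in the same bucket with identical hashes. Only the stored key tells them apart, which is exactly the verification step described above.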
Unless the hash is the same size as the key.....
I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
Not only did they search the drive without a warrant, but they also got the defendant to confess to putting the files there by questioning him without reading his rights and telling him that he didn't need an attorney. Genius.
Even dumber: Based on the testimony of the guy who originally found the child porn, they could have gone to a magistrate and gotten a warrant. Then there would have been no issue of a warrantless search.
BTW, for those considering the abandoned-property angle -- the court goes into that. It wasn't a legal eviction and the defendant hadn't abandoned his stuff; he merely hadn't removed it all yet.
As far as I've understood:
1) Computer Owner was evicted and left the computer behind, maybe to collect later. Who knows what the verbal agreement was here; I imagine the PC owner has claimed that he was going to pick it up and that it wasn't abandoned or trash.
2) Home Owner hires someone to clear out the stuff left behind.
3) Hired Person finds PC, and takes custody of PC.
4) Hired Person passes on PC to friend, who therefore has custody.
5) Friend discovers porn, calls police.
So the evidence could have been tainted by the hired person and the friend. In addition, the home owner could have had access. The home owner has a grudge against the PC owner, who never paid, so it isn't inconceivable that the home owner arranged for something to be added to get the PC owner in trouble as punishment.
A proper forensic examination might have resolved what happened. This is unlikely now.
Chain of custody. Very important in forensics.
The landlord and his friend might have had a motive to lie about the guy who was behind on his rent payments. From the article's blurb, it doesn't seem that the landlord had completed the eviction procedure yet, and he was anxious to get Crist out of his house and a new tenant in. The eviction process is not immediate. Suppose he gives Crist's computer to his friend, and the friend backdates the clock, puts kiddie porn on there, and turns it over to the cops.
The fact is that the police cannot be certain of the chain of custody in this case without a warrant. With a warrant, they take affidavits in support of chain of custody before they go poking around. It's clear and documented using established procedure. The landlord and his friend can still lie, but they're now subject to the penalties for filing a false statement. Without that supporting documentation, and especially given the nature of the case and the possible motives of the landlord and his friend, the chain of custody becomes a critical issue.
According to the article, the computer was removed from the defendant's residence by his landlord's friend because the landlord was in the process of evicting the defendant for non-payment of rent. This computer was not found abandoned on the side of the road with the trash. There's no clear indicator that the defendant gave the computer to the landlord's friend, which means the computer is the defendant's property. Therefore, the landlord's friend does not have the right to consent to a search of the computer. This means that the police need a warrant to search that computer, and given the evidence that the landlord's friend had, they would have likely gotten a warrant without any issue.
It's a procedural screwup on the part of the police. It happens. They're human.
The fact that there are collisions is a fun anomaly, not anything useful, as long as you can't generate collisions with an algorithm.
Yeah, sure is a good thing that it's not possible to do that with MD5 hashes.
The problem I have here is I would think that this would come under reasonable cause.
Someone calling the police and saying "Hey I found kiddie porn on this computer." seems to be reasonable cause to me.
It seems that way to me as well, and had they tried to get a warrant based on probable cause, it probably would have succeeded.
Conducting the search without a warrant, however, isn't going to fly unless there are also "exigent circumstances", which in this case would mean the police have reason to believe that any potential evidence on the laptop would vanish before they could acquire a warrant. Since the laptop was in the possession of the third party who called the police to report the crime, that seems unlikely.
So not getting the warrant was a big mistake, and it's likely a criminal will walk as a result. Even though it's sad, this has to happen. Failing to get a conviction and having the perp walk free is the only thing that motivates police to follow all the correct procedures and guarantee all the suspect's rights. Now the police know that a warrant is not optional when searching a laptop. So in the future the cops won't make this mistake, perps will be caught using proper rules of evidence, and our rights will be more secure.
The enemies of Democracy are
Yes, that's the birthday paradox. I'm not sure offhand how big the NCMEC database is, which is usually what they're comparing against, but let's try some math.
Let's say your hard drive has N files and the database has M items (so, comparing a list of N hashes to another list of M hashes). Your hard drive doesn't actually contain any of the files used to generate the "bad" hash list. Each of the N*M pairs matches by chance with probability 1/2^128, so the probability of at least one false match is approximately P = 1 - exp( -N*M / 2^128 ). Assuming the value in the exponent is small, this is approximately P = N*M/2^128. 2^128 is about 3.4 x 10^38. In order for you to have a one in a billion (10^9) chance of a false positive, the product N*M would have to be about 3.4 x 10^29. If the hash list has a billion items (I think it's smaller than that, by quite a lot), you'd need about 3.4 x 10^20 files on your disk -- well beyond the capacity of readily-available desktop storage.
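The estimate can be checked numerically. This is a small sketch assuming MD5 digests behave like uniformly random 128-bit values and that none of the N files is actually on the list. The function name false_match_probability is illustrative.

```python
import math

def false_match_probability(n_files: int, m_hashes: int, bits: int = 128) -> float:
    """Chance that at least one of n_files random digests equals one of m_hashes.

    Assumes digests are independent and uniformly distributed over 2**bits values.
    """
    # Each of the n*m (file, list entry) pairs matches with probability 2**-bits,
    # so P = 1 - exp(-n*m / 2**bits). expm1 keeps precision when P is tiny.
    return -math.expm1(-n_files * m_hashes / 2 ** bits)
```

For the scenario raised earlier in the thread (about two million files checked against a few thousand known hashes), false_match_probability(2_000_000, 3_000) comes out near 1.8e-29, i.e. odds on the order of 10^28 to 1 against. false_match_probability(3 * 10**20, 10**9) lands near the one-in-a-billion mark.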
MD5 hashes are useful because they're resilient to even birthday collisions. What they're not resilient to, it turns out, is intentionally creating two files with the same MD5 hash. (Even then, it is infeasible to generate two files with the same MD5 hash and the same size.)
False. MD5 has the property that if you can find two bytestreams that collide, appending identical data to the end will continue to produce two different files that collide. Furthermore, the collision-finders are able to take an arbitrary prefix, and then append random data to that prefix until a collision is found.
What does this mean? It means you can take a file with a blob of random data in the middle, then generate two files with identical hashes but different random blobs of data in the middle.
This, in turn, allows you to do things like create applications, postscript files, HTML files, and other things which hash identically but act or display completely differently. (You embed both behaviors in the file, then switch depending on the contents of the random data. A close examination will turn up the "bad" side, lying inactive, but simply opening the file will make it appear that all is well.)
It's certainly not as good as being able to match an arbitrary hash, but MD5 collisions are entirely practical to take advantage of today.
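The collision-extension property described above comes from MD5's Merkle-Damgard structure: once two same-length messages reach the same internal state, any identical suffix keeps them colliding. Below is a toy illustration of that structure. It is not MD5. The 16-bit state, 4-byte blocks, IV value, and truncated-MD5 compression function are all hypothetical choices made so that a collision can be brute-forced in a fraction of a second.

```python
import hashlib
from typing import Dict, Tuple

BLOCK = 4     # toy block size in bytes (real MD5 uses 64-byte blocks)
IV = 0x1234   # arbitrary toy initial state

def compress(state: int, block: bytes) -> int:
    """Toy compression function with a deliberately tiny 16-bit state."""
    digest = hashlib.md5(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(digest[:2], "big")

def toy_md(msg: bytes) -> int:
    """Merkle-Damgard chaining: compress each block in order, then the length."""
    if len(msg) % BLOCK:
        raise ValueError("toy hash only accepts whole blocks")
    state = IV
    for i in range(0, len(msg), BLOCK):
        state = compress(state, msg[i:i + BLOCK])
    # Simplified length strengthening, loosely modeled on MD5's padding.
    return compress(state, len(msg).to_bytes(8, "big"))

def find_block_collision() -> Tuple[bytes, bytes]:
    """Birthday search for two different one-block messages with equal state."""
    seen: Dict[int, bytes] = {}
    i = 0
    while True:  # pigeonhole: ends within 2**16 + 1 tries
        block = i.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
        i += 1
```

After a, b = find_block_collision(), toy_md(a + suffix) == toy_md(b + suffix) holds for any suffix made of whole blocks, because the chaining state and the message length are identical from that point on.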
At this point, MD5 should be considered to be a checksum, not a validator. MD5 is still very good at detecting random noise injected into a data stream but it should no longer be considered to have any real utility for detecting malicious changes.
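As a sketch of the "checksum" use that remains sound: comparing MD5 digests reliably flags accidental corruption, such as a single flipped bit. The helper name md5_changed and the sample payload are illustrative. A deliberately built collision pair, as described above, would pass this same check unnoticed.

```python
import hashlib

def md5_changed(original: bytes, received: bytes) -> bool:
    """True if the MD5 digests differ, i.e. the data was altered somehow."""
    return hashlib.md5(original).digest() != hashlib.md5(received).digest()

# Simulate accidental corruption: flip one bit of a 1,300-byte payload.
payload = b"some payload " * 100
corrupted = bytearray(payload)
corrupted[500] ^= 0x01
```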
If you mod me Overrated, you are admitting that you have no penis.