A Linux-Based "Breath Test" For Porn On PCs
Gwaihir the Windlord writes "A university in Western Australia has started beta testing a tool that's described as 'a random breath test' to scan computers for illicit images. According to this article it's a clean bootable Linux environment. Since it doesn't write to the hard drive, the evidence is acceptable in court, at least in Australia. They're also working on versions to search for financial documents in fraud squad cases, or to search for terrorist keywords. Other than skimming off the dumb ones, does anyone really expect this to make a difference?" The article offers no details on what means the software uses to identify suspicious files.
I strongly suspect that the police don't want people to know how sophisticated their technology really is because they don't want to embarrass themselves. Keeping an aura of mystery and FUD around themselves and their techniques is also a form of psy-ops; it's the chrome facade of a lemon.
And there are trivial ways to get around it. An encrypted file system is the obvious solution, but hell, if they're just checking hashes, you could use ImageMagick and a very small shell script to alter the image very slightly, giving you an entirely new hash.
Give me Classic Slashdot or give me death!
Sounds dubious to me. In most jurisdictions I'm aware of, you are not allowed to connect a hard drive to a machine physically capable of writing to it if you want anything retrieved from it to be admissible in court, and you need a chain of custody showing this. Software write protection is not good enough; you need to physically disconnect the write pins from the cable (no idea how they do this for SATA, probably with something that intercepts write commands and blocks them, and that goes through an expensive approval process to ensure that it works).
I am TheRaven on Soylent News
'Human skin tones' is a pretty wide range though. Even just restricting it to 'white' people gives you a big range of colours if you consider the various shades of tan / sunburn - anything from deep red to pale white through dull brown. If you want to find naked black- or yellow-skinned people then it's an even bigger range. If something is blue or green you could probably guess it's not naked skin (unless the person is bruised, or wearing body paint), but without factoring in shape as well it's pretty difficult to tell if something is human coloured or not.
Actually, human skin is pretty much all the same hue, it just has different saturation levels. If you convert each image to HSV from RGB, you can just look at the hue component and people all pretty much look the same. This is common in computer vision techniques for identifying skin.
-Taylor
Worldwide Military budgets: $2100 billion. Worldwide Space Exploration budgets: $38 billion. Really, world? Really?
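The HSV hue check described in the comment above can be sketched roughly like this. It is only an illustration in Python using the standard library's `colorsys`; the function name, the 0 to 50 degree hue band, and the saturation floor are my assumptions, not values from the article or from any real product.

```python
import colorsys

def skin_hue_fraction(pixels, hue_range=(0.0, 50.0), min_saturation=0.1):
    """Return the fraction of (r, g, b) pixels whose hue lies in hue_range.

    hue_range is in degrees; both it and min_saturation are assumed,
    untuned values chosen only for illustration.
    """
    if not pixels:
        return 0.0
    lo, hi = hue_range
    hits = 0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns hue in [0, 1).
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Skip near-grey pixels, whose hue is essentially meaningless.
        if s >= min_saturation and lo <= h * 360.0 <= hi:
            hits += 1
    return hits / len(pixels)
```

Two quite different skin tones such as `(224, 172, 105)` and `(141, 85, 36)` both land inside the band, while pure blue and pure green do not. A sand colour such as `(194, 178, 128)` lands inside it too, so a hue test on its own says nothing about shape or content.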
Once upon a time, a company did this, and sold their product to another corporation so that they could monitor employees' email. If I recall correctly, it ended in tears when somebody got sent baby pictures.
It's not the folks descended from criminals that worry me. It's the folks who are descended from the prison wardens who cause all the trouble.
Last time I checked, porn was not illegal.
While the summary says "porn", the article is referring to child pornography - which is illegal.
One of the environments I worked in had a sniffer that grabbed all the images (and associated session information) it could see on the wire for that organization (or at least a subset - there was a LOT of traffic involved). It would then process those images and generate a "skin folder" of suspect imagery. We could then sift through that skin folder looking for illicit browsing, etc.
Yeah - it caught porn. But the skin folder also filled up with images of furniture, Mars landscapes, deserts (it really liked the period when pictures of camel spiders in Sandland were the hot topic of emails) and other such not-skin-oriented imagery.
As you demonstrate, the MD5 technique does not work. However, there are other image "hashing" techniques that do work. For example, take the first three statistical moments of the histogram of the R, G and B intensities. To compare two images, take a simple L1 distance between those moments. If it's below some threshold, treat the two images as the same.
Disclaimer: The above algorithm works best for detecting differences between two video streams even when those video streams are distorted by color shifts. (I have personal experience with using it on production software.) For detecting similarities of images you may have to use slightly different techniques.
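A minimal sketch of the moment comparison described above, in plain Python. Several details are my assumptions rather than anything the comment specifies: the three moments are taken as the mean, the standard deviation, and the signed cube root of the third central moment (so all three are in intensity units), the moments are computed directly from the pixel values rather than from an explicit histogram, and the threshold is left to the caller because no value is given.

```python
import math

def channel_moments(values):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    # copysign keeps the sign of a negative third moment through the cube root.
    return (mean, math.sqrt(second), math.copysign(abs(third) ** (1.0 / 3.0), third))

def image_signature(pixels):
    """Nine numbers: three moments for each of the R, G and B channels."""
    signature = []
    for channel in range(3):
        signature.extend(channel_moments([p[channel] for p in pixels]))
    return signature

def moment_distance(pixels_a, pixels_b):
    """L1 distance between the moment signatures of two images."""
    sig_a = image_signature(pixels_a)
    sig_b = image_signature(pixels_b)
    return sum(abs(x - y) for x, y in zip(sig_a, sig_b))

def probably_same(pixels_a, pixels_b, threshold):
    """True if the distance is below the caller-supplied (hypothetical) threshold."""
    return moment_distance(pixels_a, pixels_b) < threshold
```

Each image here is just a non-empty list of `(r, g, b)` tuples. Because the signature depends only on the distribution of intensities, a one-pixel tweak moves it a little instead of producing an unrelated value the way a cryptographic hash does; the flip side is that two unrelated images with similar colour distributions will also score as close.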