Data Mining Used Hard Drives
linuxwrangler writes "One hopes the /. crowd knows the perils of discarding storage with sensitive data but this article drives home the point. Two MIT grad students bought used drives from eBay and secondhand computer stores. Among the data found on the 158 drives were 5,000 credit-card numbers, porn, love-letters and medical information."
Another reason to securely erase your data. In the end, _you_ are responsible for data under the Data Protection Act (in the UK, anyway).
Even broken hard drives can be recovered, though it takes some rather expensive equipment to do so. However, with a little creativity and some equipment you would likely find in an EE department, much of it could be recovered.
PGP (for Windows or Mac, i.e. not GPG) has two commands related to this: wipe file and wipe free space. They overwrite the appropriate sectors of the disk with several patterns designed to ensure that no matter what (common) encoding scheme the hard disk uses, every bit will have been set at least once, zeroed at least once, and overwritten with pseudorandom data at least once. If you set it to a lot of passes, it does an even better job. This would be a cheap (free, except for time and bandwidth to download it) way to make sure your sensitive data doesn't get out.
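For anyone wondering what a "wipe file" pass sequence amounts to, here is a minimal sketch in Python. It is not PGP's actual code: the name `wipe_file`, the pass order (all ones, all zeros, then random) and the chunk size are assumptions made purely for illustration.

```python
import os
import tempfile

def wipe_file(path, random_passes=1, chunk_size=64 * 1024, remove=True):
    """Overwrite a file in place (ones, zeros, then random bytes); return the pass count."""
    size = os.path.getsize(path)
    # Each entry takes a byte count n and returns n bytes of fill for one pass.
    passes = [lambda n: b"\xff" * n, lambda n: b"\x00" * n]
    passes += [os.urandom] * random_passes
    with open(path, "r+b") as f:
        for fill in passes:
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(fill(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass out to the device
    if remove:
        os.remove(path)
    return len(passes)

if __name__ == "__main__":
    # Demo on a throwaway scratch file only.
    fd, scratch = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"4111 1111 1111 1111\n" * 1000)
    print(wipe_file(scratch, random_passes=3), "passes written")
```

Bear in mind this only overwrites the blocks the filesystem currently maps to the file. On journaling or copy-on-write filesystems, and on flash storage with wear leveling, older copies of the data can survive elsewhere on the device.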
That said, experts would tell you that the only reliable way to make sure sensitive data doesn't get out is to thermite your drive.
Also, what's the one-line Unix command? (Running Mac OS X here.)
I hereby place the above post in the public domain.
In regards to Wiping data, do yourself a favor and check out http://www.heidi.ie/eraser/
Beyond the wonderful wiping the program does, there is the option to make an emergency boot floppy that wipes the HD with a DoD-style 7-pass or a Gutmann 35-pass! Nifty for the paranoid.
I hereby place the above post in the public domain.
Nowadays the DoD drills a hole through the platters of bad drives that have to be RMA'd, and has contracts so that all it has to return is the top of the drive with the label. As for drives they no longer need, I don't know. I'm guessing they write 0 and 1 patterns to the drive 7+ times (though even then data recovery services could recover it).
- Get out your favorite Linux installer CD or download a copy of Tom's RTBT and write it to floppy or CD-R.
- Boot from the floppy or CD.
- Log in as root.
- Run dd if=/dev/zero of=/dev/hda to erase the master drive on the primary IDE controller (/dev/hdb etc. for the remaining disks)
That's all. It erases all the blocks normally accessible by the disk controller and is probably safe enough for most people. Bad blocks that have been replaced may still contain a little bit of data, and inter-track data may be recoverable by analog means.
It's not enough to write 0's to remove traces of a file. Writing random patterns is much better, and for older drives you can even do better than random (i.e. more erasing in fewer passes). The shred(1) command from the GNU fileutils will take care of this for you in Unix-alikes.
http://btr0xw.rz.uni-bayreuth.de/cgi-bin/manpages/shred/1
See also http://www.cs.auckland.ac.nz/~pgut001/pubs/secure_del.html for an informative paper about the details of how secure deletion works.
Data Mining is NOT the process of recovering or otherwise retrieving data. Data Mining is the process of discovering knowledge through data that has already been obtained (usually through statistical and/or AI techniques). I.e., data retrieval/collection is a prerequisite for Data Mining.
Communism was just a red herring.
A government contractor donated some old PowerBook 140/180s to our school and one of them had an unformatted HD. Imagine my surprise when I booted it up and there were documents on there that said something along the lines of "This document has been classified Top Secret by the Department of Defense" at the top of them. I don't know what is more pathetic, the fact that this laptop was allowed to get out with confidential data on it or that it was unencrypted to begin with.
Also that same year, the school counselor retired his trusty Quadra 610(?) and he had all the psychological, academic, and disciplinary records from 1993 and up on there. No password. No encryption. No attempts to even get rid of data.
A few months back, my brother picked up an old computer for $8 at a garage sale. He wanted me to fix it up for him and get it to do something. I was in for a nasty surprise when I found about 200 MB of gay pr0n jpegs on there.
When I was taking my A+ class at my HS, we were given some old computers from the county office of education to get in working order to give to people who couldn't afford computers. There was a small text file on it that contained passwords for most of the servers in the COE.
You can get quite a bit without even recovering files. People are idiots.
Don't forget degaussing. Someone is going to have to make the obligatory link to Secure Deletion of Data from Magnetic and Solid-State Memory, so there it is.
Can anyone tell me why there have to be numerous random-bit passes when one could do something like this:
dd if=/dev/zero of=/dev/hda bs=512
What's wrong with just zeroing out the drive once?
Say the child porn file has a one bit and a zero bit. You overwrite it with two zero bits. The magnetic domains where the one bit was are presumably weaker or smaller because they were flipped, not reinforced like the zero bit domains. Of course the drive's read head itself won't be useful for extracting this information, because it's only designed to determine the last bit written by the write head- a binary zero/one determination. But with special equipment you can measure domain strengths carefully, and pull more information than a single bit out of them. You can tell which domains were flipped by the zero-out process and which were reinforced. (Of course this is a simplification because each bit is composed of multiple domains.)
So there are a few trivially obvious considerations when writing an erasing program-
-Don't write zeroes, write ones and zeroes.
-Go in more than one pass. A single pass leaves the bits in 4 possible states: (0,0), (0,1), (1,0), and (1,1) (where (c,r) are the child-porn and random-overwrite bits, respectively). An attacker can in theory tell all four states apart by close physical examination, so he knows c. Two passes (c,r1,r2) leave 8 possible states: (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), and (1,1,1). Now the attacker's equipment needs more than twice as much precision, because some of them, like (0,0,1) and (1,0,1), are starting to look physically similar. By the same counting, 10 passes leave 2^11 = 2048 possible domain states, many of which are indistinguishable.
-Writing zeroes over the file ten times is much better than writing zeroes over it once, but still leaves it in one of only four possible states. (Which are admittedly harder to tell apart, but you never know.)
-Do not allow the content of the file you're erasing to influence your decision of what bits to overwrite it with. You avoid a whole class of problems this way.
-Be aware that when you are writing random numbers, you are actually encrypting, not erasing, the file. The seed you used for your random number generator becomes a key for decrypting the file (given special equipment).
-You want to prevent the attacker from knowing what bits you wrote and in what order you wrote them. You will favor erasure over encryption if you can continually introduce entropy into the process. But entropy is scarce in most software environments. The variations in the timings of the drive's mechanical movements, ping responses from remote servers, mouse movements, and keypresses are well-known sources.
-Don't use a lousy random number generator. There are many ways for a random number generator to be bad. The simplest type produces numbers where n-tuples fall on a regular lattice when plotted in n dimensions. Generators like that are used a lot in scientific and graphics applications, but have no business being in security applications. If an attacker gains access to a few of the numbers in the generator's sequence, he can predict the rest of the sequence. They also loop after generating 2^N numbers.
-If applying this process to a single file, hide the size of the file.
-Ideally you should hide all traces of the file's existence. This means clean up after yourself by writing zeroes in the last several passes, so that even the domain randomness is physically removed (its presence implies that something was erased).
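To make the point about lousy generators concrete, here is a rough Python sketch. The toy generator is a textbook linear congruential generator using the "Numerical Recipes" constants; the function names `lcg_stream`, `predict_rest` and `overwrite_pattern` are hypothetical, invented for this illustration. An LCG's output is its entire internal state, so one observed value is enough to replay everything that follows.

```python
import secrets

# Toy LCG with the well-known "Numerical Recipes" constants (illustration only).
A, C, M = 1664525, 1013904223, 2**32

def lcg_stream(seed, count):
    """Return `count` successive outputs of the toy LCG, starting from `seed`."""
    out, x = [], seed
    for _ in range(count):
        x = (A * x + C) % M
        out.append(x)
    return out

def predict_rest(observed_value, count):
    """One observed output is the generator's full state, so just keep iterating."""
    return lcg_stream(observed_value, count)

def overwrite_pattern(nbytes):
    """Draw fill bytes from the OS entropy-backed generator; no program seed to recover."""
    return secrets.token_bytes(nbytes)

if __name__ == "__main__":
    stream = lcg_stream(seed=12345, count=6)
    guessed = predict_rest(stream[0], 5)
    print(guessed == stream[1:])  # the attacker's replay matches the real stream
    print(len(overwrite_pattern(512)))
```

This particular generator also repeats after 2^32 outputs. Drawing the overwrite bytes from the operating system's generator avoids having a single seed in your program that would act as the "key" described above.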