Data Mining Used Hard Drives
linuxwrangler writes "One hopes the /. crowd knows the perils of discarding storage with sensitive data but this article drives home the point. Two MIT grad students bought used drives from eBay and secondhand computer stores. Among the data found on the 158 drives were 5,000 credit-card numbers, porn, love-letters and medical information."
Another reason to securely erase your data. In the end, _you_ are responsible for data under the Data Protection Act (in the UK anyway).
Discarded computer hard drives prove a trove of personal info
JUSTIN POPE, AP Business Writer Wednesday, January 15, 2003
(01-15) 13:17 PST CAMBRIDGE, Mass. (AP) --
So, you think you cleaned all your personal files from that old computer you got rid of?
Two MIT graduate students suggest you think again.
Over two years, Simson Garfinkel and Abhi Shelat bought 158 used hard drives at secondhand computer stores and on eBay. Of the 129 drives that functioned, 69 still had recoverable files on them and 49 contained "significant personal information" -- medical correspondence, love letters, pornography and 5,000 credit card numbers. One even had a year's worth of transactions with account numbers from a cash machine in Illinois.
About 150,000 hard drives were "retired" last year, according to the research firm Gartner Dataquest. Many end up in the trash, but many also find their way back onto the market.
Over the years, stories have surfaced about personal information turning up on used hard drives, raising concerns about privacy and the danger of identity theft.
Last spring, Pennsylvania sold used computers that contained information about state employees. In 1997, a Nevada woman bought a used computer and discovered it contained prescription records on 2,000 customers of an Arizona pharmacy.
Garfinkel and Shelat, who reported their findings in an article to be published Friday in the journal IEEE Security & Privacy, said they believe they are the first to take a more comprehensive -- though not exactly scientific -- look at the problem.
On common operating systems such as Microsoft's Windows, simply deleting a file, or even following that up by emptying the "trash" folder, does not necessarily make the information irretrievable. Those commands generally delete a file's name from the directory. But the information itself can live on until it is overwritten by new files.
Even reformatting a drive, or preparing the hard drive all over again to store files, may not do it. Fifty-one of the 129 working drives in the MIT study had been reformatted, and 19 of them still contained recoverable data.
The hard-to-erase quality of hard drives is seen as a good thing by some. Many users like believing that, in a pinch, an expert could recover their deleted files. Law enforcement officers can examine a computer and lift incriminating e-mails or porno images from the hard drive.
The only sure way to erase a hard drive is to "squeeze" it: writing over the old information with new data -- all zeros, for instance -- at least once, but preferably several times. A one-line command will do that for Unix users, and for others, inexpensive software from companies such as AccessData works well.
But few people go to the trouble. Many ordinary computer users toss their old drives into the closet, or take a sledgehammer to them.
As it turned out, most of the hard drives acquired by the MIT students came from businesses that apparently had a misplaced confidence in their ability to "sanitize" old drives.
Tom Aleman, who heads the analytic and forensic technology group at the accounting firm Deloitte & Touche, often encounters companies that get burned by failing to fully sanitize, say, the laptop of an employee who leaves the company for a job with a competitor.
"People will think they have deleted the file, they can't find the file themselves and that the file is gone when, in fact, forensically you may be able to retrieve it," he said.
Garfinkel has learned his lesson. As an undergrad at MIT in the 1980s, he failed to sanitize his own hard drive before returning a computer to his father. His father was able to read his personal journal.
Even broken hard drives can be recovered, though it takes some rather expensive equipment to do so. However, with a little creativity and some equipment you would likely find in a EE department, much of it could be recovered.
PGP (for Windows or Mac, i.e. not GPG) has two commands related to this: wipe file and wipe free space. They overwrite the appropriate sectors of the disk with several patterns designed to ensure that no matter what (common) encoding scheme the hard disk uses, every bit will have been set at least once, zeroed at least once, and overwritten with pseudorandom data at least once. If you set it to a lot of passes, it does an even better job. This would be a cheap (free, except for time and bandwidth to download it) way to make sure your sensitive data doesn't get out.
That said, experts would tell you that the only reliable way to make sure sensitive data doesn't get out is to thermite your drive.
Also, what's the one-line Unix command? (Running Mac OS X here.)
I hereby place the above post in the public domain.
Take them outside, and throw them as high into the air as possible. Then watch them land on concrete.
I think that would render the drive useless. =)
Probably not. Most commercial hard drives are rated for at least 50 Gs of acceleration. My Deskstar is good for up to 100. You might dent the outer case, but it'll probably still work.
___ alwaysBETA.com - Hey, you've got nothing better to do.
In regards to Wiping data, do yourself a favor and check out http://www.heidi.ie/eraser/
Beyond the wonderful wiping the program does, there is the option to make an emergency boot floppy that wipes the HD with a DoD-style 7-pass or a Gutmann 35-pass wipe! Nifty for the paranoid.
Try "Undelete 3.0" for Windows XP/NT/2000. It's freeware (and in English) if you're a home user.. :]
Oops, forgot to put a link. http://www.oosoft.de/english/products/ooue/index.html
Nowadays the DoD drills a hole through the platters of bad drives that have to be RMA'd, and has contracts so that all it has to return is the top of the drive with the label. As for drives they no longer need, I do not know. I'm guessing they write 0 and 1 patterns on the drive 7+ times (even then, data recovery services could recover it).
When I was in the Army, we decommissioned a whole bunch of those old hard drives with 8" platters. We took them apart, removed each platter and used a belt sander to destroy the surfaces. The sanded platters were then sent to a facility on base that would melt them down.
The bodies of the drives were mostly magnesium, and I came away with about $250 from the scrap metal dealer.
Of course, who knows what I breathed by sanding those platters...
See http://www.videopremiereawards.com/HTMLNews/NewsICanStillTell.html
- Get out your favorite Linux installer CD or download a copy of Tom's RTBT and write it to floppy or CD-R.
- Boot from the floppy or CD.
- Log in as root.
- Run dd if=/dev/zero of=/dev/hda to erase the master drive on the primary IDE controller (/dev/hdb etc. for the remaining disks)
That's all. It erases all the blocks normally accessible by the disk controller and is probably safe enough for most people. Bad blocks that have been replaced may still contain a little bit of data, and inter-track data may be recoverable by analog means. *sigh*
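For anyone who wants to see what that dd line amounts to, here is a minimal Python sketch of a single zero-fill pass. It is an illustration only: the function name `zero_fill` and the block size are my own choices, and it assumes a regular file (a scratch file or disk image), since `os.path.getsize` does not report a useful size for a raw device like /dev/hda on Linux.

```python
import os


def zero_fill(path, block_size=1024 * 1024):
    """Overwrite every byte of an existing regular file with zeros, in place.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    size = os.path.getsize(path)   # only meaningful for regular files
    zeros = bytes(block_size)      # one reusable block of 0x00 bytes
    written = 0
    with open(path, "r+b") as f:   # r+b: update in place, do not truncate
        while written < size:
            chunk = min(block_size, size - written)
            f.write(zeros[:chunk])
            written += chunk
        f.flush()
        os.fsync(f.fileno())       # ask the OS to push the writes to the disk
    return written
```

Like dd, this only reaches blocks the filesystem and controller expose, so the caveats about remapped bad blocks apply to it just the same.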
From the terms of use page on this site:
"Please note, the content of this interactive movie, including characters and any and all elements, hereof, is entirely fictional, and is not based upon any actual individual or of any other legal entity"
grib.
maybe
It's not enough to write 0's to remove traces of a file. Writing random patterns is much better, and for older drives you can even do better than random (i.e. more erasing in fewer passes). The shred(1) command from the GNU fileutils will take care of this for you in Unix-alikes.
http://btr0xw.rz.uni-bayreuth.de/cgi-bin/manpages/shred/1
See also http://www.cs.auckland.ac.nz/~pgut001/pubs/secure_del.html for an informative paper about the details of how secure deletion works.
At today's densities, all drives have many many bad sectors that are mapped out in a sector translation ROM on the drive's logic board and no two are the same. Swap boards and it's almost always lights out. I guess you could swap the ROM if you can identify it and have the right surface mount rework tools.
Data Mining is NOT the process of recovering or otherwise retrieving data. Data Mining is the process of discovering knowledge through data that has already been obtained (usually through statistical and/or AI techniques). I.e., data retrieval/collection is a prerequisite for Data Mining.
Communism was just a red herring.
I'm going to be sending a company HD to Dell to RMA since it's starting to fail (stupid IBM DeskStar 60GB drives)... From what I've heard (and contrary to a few other posts in this story), it is still possible to retrieve some data from a hard drive where you've done "dd if=/dev/zero of=/dev/hda" (I still don't get how, but I err on the side of caution).
:)
Enter GNU shred. Its default operation does 25 passes at the drive, with passes such as random data, random patterns and all zeros. Theoretically, the drive has been overwritten so many times that there is almost no chance of recovering data.
Of course, just to play it safe I'll also run it across my stereo speakers a few times too
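As a rough picture of what a multi-pass overwrite does, here is a small Python sketch. It is not GNU shred's actual pattern sequence: it simply writes several passes of random bytes from `os.urandom` and, optionally, a final pass of zeros. The name `overwrite_passes` and its parameters are made up for this example, and it assumes a regular file on a filesystem that overwrites in place (journaling or copy-on-write filesystems may keep old copies elsewhere).

```python
import os


def overwrite_passes(path, random_passes=3, final_zero=True, block_size=64 * 1024):
    """Overwrite a regular file in place with random data, then optionally zeros.

    Hypothetical helper for illustration; returns the number of passes made.
    """
    size = os.path.getsize(path)
    total = random_passes + (1 if final_zero else 0)
    with open(path, "r+b") as f:
        for pass_no in range(total):
            use_zeros = final_zero and pass_no == total - 1
            f.seek(0)                       # every pass starts at the beginning
            remaining = size
            while remaining > 0:
                chunk = min(block_size, remaining)
                f.write(bytes(chunk) if use_zeros else os.urandom(chunk))
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())            # push this pass to disk before the next
    return total
```

Syncing after each pass matters: otherwise the operating system's cache may merge the passes and only the last one ever reaches the platters.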
A government contractor donated some old PowerBook 140/180s to our school and one of them had an unformatted HD. Imagine my surprise when I booted it up and there were documents on there that said something along the lines of "This document has been classified Top Secret by the Department of Defense" at the top of them. I don't know what is more pathetic, the fact that this laptop was allowed to get out with confidential data on it or that it was unencrypted to begin with.
Also that same year, the school counselor retired his trusty Quadra 610(?) and he had all the psychological, academic, and disciplinary records from 1993 and up on there. No password. No encryption. No attempts to even get rid of data.
A few months back, my brother picked up an old computer for $8 at a garage sale. He wanted me to fix it up for him and get it to do something. I was in for a nasty surprise when I found about 200 MB of gay pr0n jpegs on there.
When I was taking my A+ class at my HS, we were given some old computers from the county office of education to get in working order to give to people who couldn't afford computers. There was a small text file on it that contained passwords for most of the servers in the COE.
You can get quite a bit without even recovering files. People are idiots.
For stuff like medical data, financial data, etc., I'd seriously consider looking into wipe instead, which uses Peter Gutmann's patterns.
Back in the good old days, low level format actually did something. It rewrote the tracks and sectors on the platters. Nowadays, with high data density and whatnot, it's much more difficult to write the tracks and sectors, and special machinery is used to do so. The standard head isn't able to get enough accuracy.
I get rid of a number of HDs every month, and prying open the case, putting a paper towel between your finger and the platter, and just lightly pressing on the platters to smash them is all it takes.
The platters are fairly rigid, so when you smash them they disintegrate into tiny, tiny pieces that are usually impossible to recover (most of the platter ends up in 1/32nd bits or smaller; that's why the paper towel is there, to prevent micro-splinters getting wedged in your skin).
Otherwise, just wedge a screwdriver between the casing and platter, and smash the platter by leverage.
No one can read data off of dust.
Think I'd use killdisk before I leave the company I work for (not that I do anything wrong, but just to make sure they don't dig anything up). It allows for up to 99 passes.
Don't forget degaussing. Someone is going to have to make the obligatory link to Secure Deletion of Data from Magnetic and Solid-State Memory, so there it is.
LOL! I had the same thing with an old server from a medical center: giant 2GB SCSI-II drives full of insurance info, dental records, and who knows what else. I tossed the drives after a while because I didn't want the bad karma, but all I had to do was ask for them; they were willingly handed over to me by a doctor when I was 17.
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
Think I'd use killdisk before I leave the company I work for
Or you could use Eraser.
It's free, as a bonus, and its floppy-based killer uses Gutmann's algorithm to do its bit.
-- R
For a 25-foot fall with (nearly) no drag, the drive will reach a speed of about 40 ft/sec (27.3 MPH). If the drive stops over a 1/8" distance with -uniform deceleration- (this is pretty generous for a fall onto concrete), this equates to about 2400 G's. Halve the distance and you double the force. Decelerate it in a non-uniform fashion (as it realistically would) and you'll get even more spectacular results.
See this review of a Hitachi drive. Note that they say a drive designed for a non-operating shock of 800 G's can take a fall of -one foot- onto concrete. I destroyed a Maxtor by dropping it 3 feet onto carpet in a past life, and I'd suspect it was rated for a non-operating shock of at least 50 G's.
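The arithmetic is easy to recheck. A quick Python sketch, under the same assumptions as the post (no drag, uniform deceleration): impact speed is sqrt(2gh), and the deceleration a = v^2 / (2d) expressed in g is just drop height divided by stopping distance, which for 25 feet and 1/8" comes out near 2400 g. The function names and the standard-gravity constant are mine, not from the post.

```python
import math

G_FT_S2 = 32.174  # standard gravity in ft/s^2 (assumed constant)


def impact_speed_ft_s(height_ft):
    """Speed after a drag-free fall from height_ft, in ft/s."""
    return math.sqrt(2 * G_FT_S2 * height_ft)


def deceleration_in_g(height_ft, stop_distance_in):
    """Uniform deceleration, in multiples of g, when stopping over stop_distance_in inches."""
    stop_ft = stop_distance_in / 12.0
    v = impact_speed_ft_s(height_ft)
    return (v * v) / (2 * stop_ft) / G_FT_S2


if __name__ == "__main__":
    v = impact_speed_ft_s(25)
    print(f"impact speed: {v:.1f} ft/s ({v * 3600 / 5280:.1f} MPH)")
    print(f"stop over 1/8 inch:  {deceleration_in_g(25, 1 / 8):.0f} g")
    print(f"stop over 1/16 inch: {deceleration_in_g(25, 1 / 16):.0f} g")
```

Because a is proportional to 1/d for a fixed impact speed, halving the stopping distance doubles the deceleration rather than quadrupling it.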
I'd love to see you try it with your drive with your valuable data sometime though.
Can anyone tell me why there have to be numerous random-bit passes when one could do something like this:
dd if=/dev/zero of=/dev/hda bs=512
What's wrong with just zeroing out the drive once?
Say the child porn file has a one bit and a zero bit. You overwrite it with two zero bits. The magnetic domains where the one bit was are presumably weaker or smaller because they were flipped, not reinforced like the zero bit domains. Of course the drive's read head itself won't be useful for extracting this information, because it's only designed to determine the last bit written by the write head- a binary zero/one determination. But with special equipment you can measure domain strengths carefully, and pull more information than a single bit out of them. You can tell which domains were flipped by the zero-out process and which were reinforced. (Of course this is a simplification because each bit is composed of multiple domains.)
So there are a few trivially obvious considerations when writing an erasing program-
-Don't write zeroes, write ones and zeroes.
-Go in more than one pass. A single pass leaves the bits in 4 possible states- (0,0), (0,1), (1,0), and (1,1) (where (c,r) are the child-porn and random-overwrite bits, respectively). An attacker can in theory tell all four states apart by close physical examination, so he knows c. Two passes (c,r1,r2) leave 8 possible states- (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), and (1,1,1). Now the attacker's equipment needs more than twice as much precision, because some of them, like (0,0,1) and (1,0,1), are starting to look physically similar. 10 passes leave 2048 possible domain states, many of which are indistinguishable.
-Writing zeroes over the file ten times is much better than writing zeroes over it once, but still leaves it in one of only four possible states. (Which are admittedly harder to tell apart, but you never know.)
-Do not allow the content of the file you're erasing to influence your decision of what bits to overwrite it with. You avoid a whole class of problems this way.
-Be aware that when you are writing random numbers, you are actually encrypting, not erasing, the file. The seed you used for your random number generator becomes a key for decrypting the file (given special equipment).
-You want to prevent the attacker from knowing what bits you wrote and in what order you wrote them. You will favor erasure over encryption if you can continually introduce entropy into the process. But entropy is scarce in most software environments. The variations in the timings of the drive's mechanical movements, ping responses from remote servers, mouse movements, and keypresses are well-known sources.
-Don't use a lousy random number generator. There are many ways for a random number generator to be bad. The simplest type produces numbers where n-tuples fall on a regular lattice when plotted in n dimensions. Generators like that are used a lot in scientific and graphics applications, but have no business being in security applications. If an attacker gains access to a few of the numbers in the generator's sequence, he can predict the rest of the sequence. They also loop after generating 2^N numbers.
-If applying this process to a single file, hide the size of the file.
-Ideally you should hide all traces of the file's existence. This means clean up after yourself by writing zeroes in the last several passes, so that even the domain randomness is physically removed (its presence implies that something was erased).
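Two of the points above can be made concrete with a short Python sketch. First, counting the original bit plus one bit per overwrite pass, n passes give 2^(n+1) possible write histories per bit. Second, a seeded generator reproduces its stream exactly, which is why the seed acts like a key, whereas `os.urandom` draws from the operating system's entropy pool and has no seed to recover. The helper names are invented for this example, and Python's `random` module stands in for "a predictable generator"; it is not meant for security use.

```python
import itertools
import os
import random


def write_histories(passes):
    """All possible (original_bit, r1, ..., rn) tuples for n overwrite passes."""
    return list(itertools.product((0, 1), repeat=passes + 1))


def seeded_stream(seed, n):
    """n bytes from a seeded, fully reproducible pseudorandom generator."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, "pass(es):", len(write_histories(n)), "histories")

    # Same seed, same overwrite data: whoever learns the seed knows what was written.
    print(seeded_stream(1234, 8) == seeded_stream(1234, 8))

    # OS-provided randomness: nothing to replay.
    print(os.urandom(8).hex())
```

In a real eraser you would draw the overwrite data from the OS source (or keep reseeding from it), which is the "continually introduce entropy" point in the list.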
This is a big problem for DoD-type datacenters; for non-classified (as in "this stuff shouldn't get out") stuff, they open the disk up, sand-blast the platters to remove the magnetic material, then return the carcass to the manufacturer for a warranty claim. For the really secret stuff (as in "people will die if this stuff gets out"), they just destroy the disk completely, then buy a new drive.
Of course, if you kept all the data on the disk encrypted, you'd be fairly safe, but once you're making a warranty claim, the disk probably isn't working well enough for you to wipe using 'dd'...
Speaking of 'dd': Beware of sector remapping. Any sectors on the disk which the firmware has marked 'bad' won't be touched by any user-level command - and those 'bad' sectors could still be recovered if they open the disk up. For most people, 'leaking' a couple of sectors wouldn't be the end of the world, but for (say) VISA's customer records, there are probably a couple of valid CC numbers and other info in those sectors...
If you wipe, remember to take your device's physics into account.
Wipe it once when it is completely "cold" (computer has been turned off for at least several hours), then wipe it again after it has been running for an hour or so, and wipe it a third time after you've given the disk some serious thrashing (that is, disk activity that moves the head around quite a bit).
The reason is temperature. Data is saved in circles on a magnetic medium. The read/write head has a certain width, and so do the tracks on the platter (the tracks have to be a bit wider than the head, to take thermal expansion into account so the head won't overwrite data on neighbouring tracks).
So, for some specialized data recovery company, it may even be possible to recover different data from the same track, because after a while of use, a track can look like this:
---------------- Outer track end
AAAAAAAAAAAAAAAA Older data 1
BBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBB Actual data
BBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCC Older data 2
---------------- Inner track end
So, your drive will always read the data in 'B'. In 'C' there might still be data your computer saved when the drive had just spun up and was cold, while 'A' might still hold a copy of data that was written during very heavy disk activity when the drive was really hot.
To overwrite all of this data, you need to have the drive write in each of the temperature states that it has been in during its life.
"Simple" writing might only destroy all 'B' data and leave all 'A' and 'C' data intact on the drive, where they can be recovered.
42. Easy. What is 32 + 8 + 2?
What you need to do is overwrite the whole hard disk several times with different patterns. Peter Gutmann recommends 35 passes with different patterns. The DoD 5220.22-M NISPOM recommends 3 passes.
Secure Harddisk Eraser implements these 35 or 3 passes on a single floppy. Just boot from the floppy, wait 60 seconds and the harddisk will start to erase.
The homepage
Any sufficiently advanced libertarian utopia is indistinguishable from government.
Check out Autoclave
It's a mini-Linux distribution that boots off a floppy, then allows you to pick which hard drive you want to wipe clean.