Ask Slashdot: Best Way To Store Data In Hard Copy?
First time accepted submitter bmearns writes "I have some simple plain-text files (e.g., account information) that I want to print on paper and store in my firebox as a backup to my backup. What's the best way to encode the data for print so that it can later be restored to digital form? I've considered just printing it as text and using OCR to recover it. The upsides are that it's easy and I can even access the information without a computer if necessary. Downsides are data density, no encryption, no error correction, and how well does OCR work, anyway? Another option is printing 2D barcodes. Upsides are density, error correction, I could encrypt the data before printing. Downsides are that I'll need to split it up into multiple barcodes due to maximum capacity of popular barcode formats, and I can't access the data without a computer. Did I miss any options? What do slashdotters suggest?"
Print a human-readable copy and add a computer-readable format, like barcodes or a pen drive, a hard drive, SD card... (CDs might not survive very long if you're unlucky)
There must be some way to do this with QR codes.
http://qrcode.kaywa.com/ can do it 160 characters at a time, but that seems really inconvenient
The Egyptians used hand written papyrus and we still have copies to look at. The laser printed paper copies of the Book of the Dead simply didn't survive.
Google for OCR-A and OCR-B as TTF. There are freely available versions. I use them for mailing labels, along with PostNet bar codes to make it as easy as possible for the Post Office.
Learning HOW to think is more important than learning WHAT to think.
QR codes. You can encrypt these. If you print them e.g. on plastic foil, they'll last close to forever. Of course, you will need to keep a piece of hardware that can read QR codes.
I would, however, take another route, although it is outside the scope of your question. It is something I already do for files that are very valuable to me: I put them on magneto-optical disks. The things last forever and withstand the roughest of treatments. Writing and reading are slow, but that is a downside I just accept. I still have a database (invaluable to me) that I acquired in the mid-'90s on magneto-optical disk. It survived: a fire; spilling of liquids, including dog pee; some mild X-ray radiation; an inadvertent stay in our home's trash can; being jumped upon by a kid; and a 20-foot fall.
Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
And as long as a decent font for OCR is used, like OCR-B, it should be feasible.
The reason for doing it - well, if you want to preserve something for a few decades, then print it on linen paper using ink that can survive a long time. The ink is probably the harder part, since nobody really knows which of the inks used in computer printers can survive for centuries.
My suspicion is that the dot matrix printers are better off than lasers and inkjets.
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
In terms of their ubiquity in modern marketing, QR Codes are a slightly annoying solution in search of a problem; but as an engineering approach to the sort of problem the OP described, they're fantastic. There are many free and open source QR Code generation utilities and libraries, and the QR Code spec itself was patented, but freely licensed for public use by the Toyota subsidiary that invented it.
QR codes include error correction, and can encode binary data on the order of a hundred times the density of a regular bar code.
For printing, pick a font that has no ambiguous characters. This makes OCR easier if you have to retrieve the data back into a computer. I suggest Trebuchet, in which I (upper-case eye), l (lower-case ell), and 1 (one) are distinct. Alternatively, use either the OCR-A or OCR-B font, which are not easily read by humans. Place the hard copy in a sealed envelope and store it in a bank safe-deposit box.
Also in the same safe-deposit box, store electronic copies using at least two different media (two so that, if one becomes obsolete and unreadable, the other might still be used). You might want to change the media -- or at least review them -- annually to ensure they are still useable.
Why, the answer is simple: there is no standard for digital backup. Zero. Zip. There are only two time-tested backup methods.
1) Text printed on acid-free paper.
2) Microfiche or film.
I suggest you print it in OCR-readable characters with a pigment-based ink. If you are that serious about backup, take it to a printer and have it printed with good ink on the best paper you can find. Store the copies in two separate locations.
Remember, everyone: there is NO standard digital backup medium.
Text printed correctly on acid-free paper or film is the only time-tested way.
IMHO
Might be hard to find, but a nice plastic form of punch tape might do the job of both being a hard copy (technically human readable) and being machine-readable. You'd have the added advantage of being able to incorporate encryption if you so desired.
If you're really serious about having hard printouts that you can get back in after a disaster, one idea would be to base64-encode the text and then print it using a fixed-width font to make OCR easier down the line. The downside is that if the scan isn't great or the paper has degraded, you may get weird encoding issues if, say, a lowercase "l" is read as an uppercase "I". I'd also take hashes of the text files and print them in the header/footer as a rudimentary way of verifying that the files are the same after scanning them back. Maybe do a few tests before committing to such a method; this is totally off the top of my head BTW!
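A minimal sketch of that idea in Python, standard library only. The function names, the "sha256:" header line and the 64-character line width are assumptions made up for illustration, not any existing tool's format. It puts the hash on the first line and refuses to hand back data when the decoded text doesn't match it.

```python
import base64
import hashlib


def to_printable(data: bytes, width: int = 64) -> str:
    """Return a page of text: a SHA-256 header line, then base64 in fixed-width lines."""
    digest = hashlib.sha256(data).hexdigest()
    body = base64.b64encode(data).decode("ascii")
    # Short, equal-length lines are easier to compare by eye against the scan.
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([f"sha256:{digest}"] + lines)


def from_printable(page: str) -> bytes:
    """Rebuild the bytes from OCR'd text and check them against the printed hash."""
    header, *lines = page.splitlines()
    expected = header.split(":", 1)[1].strip()
    data = base64.b64decode("".join(line.strip() for line in lines))
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError("hash mismatch: the OCR result differs from the original")
    return data


if __name__ == "__main__":
    page = to_printable(b"bank: example\nuser: alice\n")
    print(page)
    print(from_printable(page))
```

The hash only tells you that something went wrong, not where, so it is still worth doing a test scan before trusting the printout.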
Take a look at Twibright Optar: http://ronja.twibright.com/optar/ (A review is at: http://lwn.net/Articles/242735/)
There used to be one called Bridge, but I couldn't find it. Anyway, it's popular enough so that you can learn braille if you ever lose the digital reader. Also, if you can code at all, it'd be easier to parse the count of dots than the thickness of lines from scanned-in images; perhaps make up your own "braille" system and store the algorithm in plain text along with a bunch of other algorithms. I think you'll be safe enough from most thieves, just not the government (but they can already get your account information). Really, instead I'd rather recommend a remote server (or cloud) and just use Duplicity (rsync+gpg software).
The G
How many accounts can anyone have that they actually need to have bar codes or some other such nonsense to be able to regain entry to them? Print out your account information, user names, passwords, etc., and put the printout in your fire-resistant safe. If your house burns down, or some other calamity happens, and you need to regain access to all of your accounts, then you'll just re-enter the passwords for each one. This can't possibly be more complicated than setting up some OCR / Barcode / Rube Goldberg solution.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
Engraved to stone. Guaranteed for centuries.
Back in the late 90s when it was difficult to export strong crypto out of the USA, the PGP project came up with a program to get around this by using some loopholes in the law that allowed the source code to be exported if it was printed in book form.
So the PGP source code was printed out, made into books, shipped overseas, and scanned and OCR'd. My memory is somewhat fuzzy, but they had a suite of utilities to do this reliably. See http://www.pgpi.org/pgpi/project/scanning for a description and links to the tools.
I would compress it with a password (7-zip, RAR etc.) and then use Google Drive, Dropbox etc. to store it.
Thus it will be future proof for many years and accessible on any computer.
I think using a proprietary standard for this has potential for disaster in the long term. QR codes would be much better. Scenario: the author of "Paperbak" discovers a huge improvement in his algorithm and deprecates the old version. 20 years into the future somebody needs to decode their stuff, and they search for the source code to "Paperbak" and realize that the only version they can find on the future internet is the "new/improved" version that can't read their stuff. So they are just the lucky owners of some paper decorated with a very specific arrangement of dots.
With QR Codes, on the other hand, it is difficult to believe that the knowledge of their format will be lost in our lifetimes. They have their own Wikipedia entry describing their structure, for example.
No need to worry about ink: even the cheapest and nastiest laser printers use toner, and a mixture of thermoplastic and carbon black thermally fused to your paper isn't going anywhere (in fact, if you use lousy enough paper, some lucky future archeology intern may have the... unmixed pleasure... of picking the little plastic character glyphs out of the pile of dust, trying to keep them in their original order!).
His data-restore needs probably don't extend to truly epic lengths in any case, so it shouldn't be a big deal.
http://www.pgpi.org/pgpi/project/scanning/
Are available at camera stores. I suspect we'll be able to read CD formats for quite a while longer.
This is a backup to your backup, so digital means must have failed before you'd consider using it. Text is low density, but it has an advantage that any encrypted barcode or other high tech means do not have -- it can be read by human eyes. When you're huddled in a rough lean-to roasting a feral cat over the campfire amid the wreckage of civilization, you will still be able to read your backup. That might come in handy.
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
I'm thinking a tattoo might be a good medium. Depends on your storage needs and the size of your back.
Maybe I missed something: Why encrypt the hard drive if I'm going to tape the password to it?
The whole point of using a hardcopy is to avoid a number of problems with digital copies, the biggest of which is that harddisks, flash memory, and optical discs all suffer in terms of data longevity. They can also be damaged relatively easily, and, as someone mentioned above, data and hardware formats go obsolete and may be practically inaccessible in relatively short order.
Slashdot is not a game, Slashdot is not a game. Crap, I just lost points.
I would think QR codes might do it. You can format it as a page of human-readable data, then put a QR code on the page that encodes the same data. Or just print pages of QR codes. I assume the data will be exported as XML or something similar so the computer will know what the data means when it is restoring it from text. For compression or security, maybe consider creating a password-protected archive of the data, then converting that to a hexadecimal text file. That file can be converted into QR codes and printed. You are of course trading one kind of security for another. The encrypted QR codes will not be human readable, and if any codes are messed up, you lose the data. The plain text with QR codes is human readable, so not secure, but if a QR code is messed up, it would still be possible to get the data back from the plain-text printout.
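A rough sketch of the hex-then-split step in Python, standard library only. It does not draw the QR codes; each returned string is what you would hand to whatever QR generator you use. The 160-character payload size, the "n/total:crc:payload" layout and the function names are assumptions for illustration, not part of any QR standard.

```python
import binascii
import zlib


def make_chunks(data: bytes, payload_chars: int = 160) -> list[str]:
    """Hex-encode data and split it into numbered chunks, each carrying its own CRC-32."""
    hex_text = binascii.hexlify(data).decode("ascii")
    pieces = [hex_text[i:i + payload_chars] for i in range(0, len(hex_text), payload_chars)]
    total = len(pieces)
    # Hypothetical layout: "<index>/<total>:<crc32 of payload>:<payload>"
    return [
        f"{n}/{total}:{zlib.crc32(p.encode('ascii')):08x}:{p}"
        for n, p in enumerate(pieces, start=1)
    ]


def join_chunks(chunks: list[str]) -> bytes:
    """Reassemble chunks scanned in any order; complain about damaged or missing ones."""
    payloads = {}
    total = 0
    for chunk in chunks:
        position, crc, payload = chunk.split(":", 2)
        index, total = (int(x) for x in position.split("/"))
        if zlib.crc32(payload.encode("ascii")) != int(crc, 16):
            raise ValueError(f"chunk {index} is damaged")
        payloads[index] = payload
    missing = [i for i in range(1, total + 1) if i not in payloads]
    if missing:
        raise ValueError(f"missing chunks: {missing}")
    return binascii.unhexlify("".join(payloads[i] for i in range(1, total + 1)))


if __name__ == "__main__":
    archive = b"pretend this is the password-protected archive" * 10
    codes = make_chunks(archive)
    print(len(codes), "codes to print")
    assert join_chunks(codes) == archive
```

Numbering the chunks means the pages can be scanned in any order, and the per-chunk checksum tells you which page to rescan (or retype from the plain-text copy) instead of failing the whole restore.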
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
I know you're joking, but if you're trying to archive things like this, it's a good idea to include the documentation for how to generate and read the codes you're using.
This is silly.
Without a computer he will have no need of this data. It's account data! What good is that without a computer? (Left unsaid is what kind of accounts we are talking about. If computer accounts you simply don't need it. If financial accounts you might temporarily need to work with paper till you restore the computers).
Why not a second or third or fourth backup at a different location all in common computer readable form?
Planning to scan in paper is far more complex than just a conventional backup on common media with a copy off site.
Sig Battery depleted. Reverting to safe mode.
I have already noted that laser print can come off in flakes from the paper it's supposed to be attached to, leaving unreadable text, and that's after only a few years.
Can come off, but in actual use doesn't.
Paper lying around loose, maybe. But I have laserjet printed output, bound in binders on common shelf storage since the first laser printers became available, which is not exhibiting any degradation after all these years. And, no, the paper isn't rotting out either, just to head off that old FUD.
Sadly, many of my old dot matrix and teletype printouts have faded as much as any other liquid ink I've used. It depends entirely on the ink in the ribbon. The liquid ink present in ordinary ribbons was often of wildly varying quality, and most people who bought those ribbons in bulk sought out the cheapest possible ribbons. I wouldn't bet on their longevity.
Laser printed pages consist of carbon in plastic, and there's no reason they shouldn't last a century or more, as long as certain conditions are met: if the toner is properly fused to the paper, if the paper doesn't degrade beneath it, and if the facing page doesn't adhere to the toner.
1. Your printer should have the right temperature set in the fuser, and that's probably not even adjustable to you. If the toner comes out dusty or smeary, it's too cool. If it comes out brown and crispy, it's too hot :-) You should recognize it immediately if the print quality is poor.
2. Store the paper properly. Heat is your enemy: don't let it get too hot, and don't store anything you want preserved in sunlight. Don't let it get damp - mold will destroy paper. Don't use crappy paper that will disintegrate - acid free is always the recommendation for long term storage. Horizontal stacks of paper will apply a lot of pressure to the sheets near the bottom of the stack, vertical hanging files reduce this pressure.
3. Watch out for printed sheets facing other printed sheets (like double-sided printing), where the toner from the bottom side of the upper sheet can stick to the toner on the top side of the lower sheet. A horizontal stack of paper, especially in a hot environment, will apply a lot of pressure that can cause the toners to fuse together where they touch. I've also had problems with toner adhering to the vinyl sheets commonly found in 3-ring binders or binder covers.
John
This is silly.
Without a computer he will have no need of this data.
Agreed! Assuming he just wants to get back to 'where he was' on a computer, doing a massive printout and eventual OCR is lame. Besides, paper is flammable. So assuming that after whatever disaster (fire, flood, zombies...) you still can buy a new, working computer:
Just burn everything you want to an Archival Grade Gold DVD-R (rated to 100 years, I assume once you're dead you don't care) and keep it in a fire safe.
Once the 'disaster' happens, you just reload the data on to your new computer.
http://www.verbatim.com/subcat/optical-media/professional-optical/archival-grade-gold-dvd-r/
...post on Dan's Data already?
He covered most options available for what you want back in 2009, and apparently he did an update in 2011.
http://www.dansdata.com/gz094.htm
Mit der Dummheit kämpfen Götter selbst vergebens
I use two LUKS-encrypted backups that I can take and leave at work. Because they are encrypted, I don't need to worry about them being compromised at work or in transit. I have two of these so that one is always out of my house.
Another location could be a safe-deposit box at a bank. Remember that, even if it is destroyed in a fire, it doesn't matter because the chance of all my copies and backups being destroyed at the same time is very, very low.
The real "Libtards" are the Libertarians!
Do a screen shot of it, overlay it with a picture of you and your girlfriend or boyfriend having sex and upload it to a revenge porn site, then publicly complain about it having been uploaded without your consent. That guarantees it will be available from any computer for at least 100 years.
For account numbers and passwords, this is a good solution. But IMO, it isn't a good enough solution. A better solution is to print them twice. Put one copy in a waterproof, fireproof safe. Put the other copy in a safe deposit box at a bank across town. This is to protect you from the possibility that your whole house and all your computers become inaccessible while you are away from home. (http://www.capitalbay.com/headline/339999-as-landslide-swallows-five-homes-in-wealthy-northern-california-neighborhood-residents-struggle-to-find-the-root-cause.html), (http://www.npr.org/blogs/thetwo-way/2013/07/07/199688745/runaway-train-explosion-still-ablaze-in-quebec).
And since you've got that safe deposit box, it's a great place to put original birth certificates, copies of insurance policies, property deeds, auto titles, and an SSD containing a backup of important data from your computer. A monthly trip to the bank to swap out your backup drive is also a good opportunity to check if your paper docs are up to date. If you don't have very much data that you think needs backing up, you can use a smaller, cheaper USB drive.
That would have been my preference too. If the data is un-encrypted, then you can read it with the Mk-1 human eyeball (takes a couple of hours' practice, every day for a couple of weeks; nothing drastic. Russian is harder to learn.); even if it's encrypted, you can transliterate from the paper tape to files on your new computer with the Mk-1 eyeball.
A tape reader is "nice to have", but not vital.
Tape has an advantage over punched cards in that you only have one way to read it wrongly. But you can manage that risk perfectly adequately with punched cards too, so that's not a deal-breaker. (I suspect that card readers have more moving parts than tape readers - all that card unstacking, moving and re-stacking - which would translate to a shorter lifetime.)
Someone suggested using plastic cards or tape; I'd avoid those options. If your "fire safe" really is a fire safe, then paper should survive just fine while plastic may melt.
But again, the whole idea is fundamentally silly. If you really want the data to be secure, "disaster-recovery grade" backup is not exactly rocket science. Encrypt as desired. If you only want to do it with small amounts of data ("account information," whatever that means) then substitute SD cards, memory sticks or whatever floats your boats, but keep the data regularly refreshed. If you've really got to keep the data secure and usable for decades, then you need to go to "disaster-recovery grade" backup anyway, so just bite the bullet and pay for it. Then pass the cost to your customers. If they don't want to pay, then you probably don't want them as customers. This also applies if they're family.
I suppose it could be someone looking for a plot element in a "steam-punk" genre. That could be quite amusing. There may even be an RFC for that, similar to RFC 1149.
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
It can actually be a risk, if the fuser doesn't get the toner hot enough, long enough, to fully infiltrate the paper (without burning it, obviously, which is presumably what drives conservatism on that score).
Very high humidity at print time can be a problem. It's rarely this dramatic, but I've seen a few cases where paper, left unattended and non-climate-controlled through a ghastly humid summer to the point where it started to get vaguely limp, billowed steam as it passed through the fuser stage. An interesting spectacle but, needless to say, not good for adhesion (the characters themselves, while delicate, were largely intact, and could be poured off the paper), since the enthalpy of vaporization of water was sinking significant heat at the point of contact. Sometimes the classier laser printers have humidity sensors in the print path to compensate; but air conditioning still isn't a bad idea, if only for the poor humans.
This, but instead of "overlay", do some real steganography.
Actually, fire safes are a lot better for paper than CD/DVD media, which will be destroyed faster than paper chars.
But when you see that at print time, why would anyone expect that to survive?
Steam coming out of your printer is a pretty significant clue if you ask me.
I have boxes of normal 20-pound office bond (nothing special) circa 1985 containing old listings. It's as crisp and intact as ever, and it got no special treatment, simply sitting in boxes on the shelf. I have continuous forms from old IBM mainframe 3800 printers that look rattier. Probably the paper. But even these show no signs of print flaking off.
I've simply never seen print flaking off.
I've seen it wipe off with just finger pressure, but that was because the fuser roll had died and was no longer heating.
Why not stamp the text into copper or aluminum tablets? Far less breakable than clay or stone. Copper eventually gets that green patina but it should still be readable if you stamp the words deep enough.
"It is a denial of justice not to stretch out a helping hand to the fallen; that is the common right of humanity."
Oh sure, this shouldn't be the common use case for backups. There's no reason it can't be a useful alternative. Personally, I am tempted to mail postcards covered in optar-printed labels all over the place, just to drive people nuts. Some of them would have to contain Goatse images, others, possibly random data.
You could use standard SPI interfaced EEPROMs, they're generally rated >100 years longevity, 100k or more write/erase cycles.
Mind you, they are fairly small, but certainly big enough to store account info, or keys, things like this. In DIP packages they only seem to go up to 1 Mbit (128 kB), but SMT ones come in 64 Mbit (8 MB).
Definitely an option for storing something like keys, etc, for long term, if you're a little handy with hardware (or if someone has made a handy EEPROM + USB-SPI adaptor on a stick).
There's a gazillion USB/SPI interfaces out there, the chips should work with any of them, pretty much... or you could use some computer with it in hardware - raspberry pi, beagle bone, etc.
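To put those chip sizes in perspective, here is a quick back-of-the-envelope check in Python. It assumes the usual memory-chip convention that 1 Mbit is 1024 x 1024 bits; the 32-byte key and the 80-character line of account info are just illustrative sizes, not anything from the datasheets.

```python
def mbit_to_bytes(mbit: int) -> int:
    """Convert a chip size in Mbit to bytes, assuming 1 Mbit = 1024 * 1024 bits."""
    return mbit * 1024 * 1024 // 8


KEY_BYTES = 32    # illustrative: one 256-bit key
LINE_BYTES = 80   # illustrative: one 80-character line of account info

for label, mbit in (("DIP, 1 Mbit", 1), ("SMT, 64 Mbit", 64)):
    size = mbit_to_bytes(mbit)
    print(f"{label}: {size} bytes, "
          f"{size // KEY_BYTES} keys or {size // LINE_BYTES} text lines")
```

Even the small DIP part holds far more than a page or two of account details, which is the point being made above.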
Sent from my PDP-11
125F? Why 125F? Is that temperature magic, somehow? I mean: CDs in my car don't die in temperatures well above 125F on a hot sunny day with the windows closed... Even as a test, not long after I got a CD burner, I kept a few CDs on the rear deck of my car, unprotected, for a couple of years. They worked fine when I declared that the experiment was over.
Would a magnetic tape survive? Maybe, depending on the Curie point. But CDs aren't magnetic tape, and I strongly suspect that the 125F temperature is based upon the fragility of rust -- not compact discs or DVDs.
A friend of mine's house burned once. (It was a disaster for his belongings, but he and his family were fine, and the house recovered after a thorough gutting. Not so much luck for the dog and the fish.)
In his office was a wire rack of CDs in jewel cases: Data CDs, audio CDs, whatever CDs.
Many of these had the paper liner inserts badly degraded, and the jewel cases melted into strange shapes. The fire department had created a vent in his computer room, with the goal of exhausting the heat of the fire through that space: I have no idea how hot it was there, but just down the stairs from there the fire marshal said that things were hot enough for the couch to have burned by flashover alone.
The housings for his monitors and computers were limp and sagging.
The CDs themselves? Fine. All of them. A bit of soap and water to clean the soot off, and they worked great. Every single one of them worked fine. I started the process of duping them on my then-high-tech Plextor gear as an archival measure, only to realize that there was no need: There weren't any reported errors, and the reads were fast (indicative of a very low BER).
I stopped working on duping them, bought him some new jewel cases, and just cleaned up the rest and gave them back to him. No problems.
UL standards for fire safes designed to keep paper are 350 degrees F and rated in hours, and the glass transition temperature of polycarbonate is about 297 F. Simply cut the hour-rating in half, and you've got a conservative estimate for what it will take to keep CDs readable in a fire.
Proof? Easy. Fire up the oven, set it to an actual 290-or-so degrees, put a CD on the rack supported by the center ring, wait for awhile, and see if it's readable.
Anyone with a few extra blanks and a few hours time can test it easily enough.
Kid-proof tablet..
Eh?
I seem to have a vehement argument toward using CDs and perhaps DVDs stored in normal firesafes.
But more to the point: Maybe you should realize that this is Slashdot, where free thought is both encouraged and argued against using science and experience, and FUD is actively dismantled. I'd have thought, given your 5-digit UID, that you'd have understood that by now.
Maybe you should give your UID back to the person you bought it from?
You can buy fire safes that are rated for 1, 2, or 3 hours of fire and which will maintain internal temperatures no hotter than 125F.
This would work well with the archival DVDs, since the disc is made of polycarbonate (a thermoplastic polymer) that has a melting point of 311F (155C).
While it's true that paper will ignite at 424.4F (218C), it starts to yellow and char down at 302F (150C), which could interfere with OCR.
There's no reason he couldn't burn an archival DVD and print a paper hard copy; then keep them both in the safe.
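For anyone who wants to sanity-check the Fahrenheit and Celsius pairs quoted above, a small Python sketch. The temperatures are the ones from this comment; the dictionary labels and the function name are made up for illustration.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Standard conversion: C = (F - 32) * 5 / 9."""
    return (deg_f - 32) * 5 / 9


# Figures quoted in the comment above, in degrees Fahrenheit.
FIGURES_F = {
    "fire safe interior limit": 125.0,
    "paper starts to yellow and char": 302.0,
    "polycarbonate melting point": 311.0,
    "paper ignites": 424.4,
}

for label, deg_f in FIGURES_F.items():
    print(f"{label}: {deg_f} F = {fahrenheit_to_celsius(deg_f):.1f} C")

# Margin between the safe's rated interior limit and the lowest damage threshold.
margin_f = min(FIGURES_F["paper starts to yellow and char"],
               FIGURES_F["polycarbonate melting point"]) - FIGURES_F["fire safe interior limit"]
print(f"margin: {margin_f} F")
```

The conversions agree with the numbers in the comment, and the safe's rated limit sits well below both the disc and the paper thresholds.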
Why not stamp the text into copper or aluminum tablets?
I stamp text into my tablet every day, all it leaves behind is fingerprints.....
---
Use crypto and only store the key. A key is small enough to be typed in without OCR. If your data is (correctly) encrypted, there is no problem in leaving copies in the wild.
The only solution to protect data is duplication. There is no need for a safe.
For example, I would be really annoyed to lose all my digital pictures. There is a copy on my father's computer (in another town). It is stored in an encrypted (ecryptfs) directory because some pictures are personal. He does not have my password. I also keep a backup of his data.
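To show how little there is to type, here is a sketch in Python that formats a key for printing and checks it when it is typed back in. The 4-character grouping and the 4-hex-digit check group are assumptions of this example, not any standard key format, and the key generated here is a throwaway.

```python
import hashlib
import secrets


def format_key(key: bytes, group: int = 4) -> str:
    """Render a key as upper-case hex in short groups, plus a trailing check group."""
    hex_key = key.hex().upper()
    # Hypothetical check group: first 4 hex digits of the key's SHA-256.
    check = hashlib.sha256(key).hexdigest()[:4].upper()
    groups = [hex_key[i:i + group] for i in range(0, len(hex_key), group)]
    return " ".join(groups + [check])


def parse_key(text: str) -> bytes:
    """Turn the typed-in text back into key bytes, rejecting most typing mistakes."""
    compact = "".join(text.split())
    hex_key, check = compact[:-4], compact[-4:]
    key = bytes.fromhex(hex_key)
    if hashlib.sha256(key).hexdigest()[:4].upper() != check.upper():
        raise ValueError("check group does not match: probably a typing error")
    return key


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # throwaway 256-bit key for the demo
    printed = format_key(key)
    print(printed)
    assert parse_key(printed) == key
```

A 256-bit key comes out as 16 groups of four characters plus the check group, which fits on one line of paper. A 16-bit check like this catches most slips, but not all of them.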
The shelf life is FAR longer than Slashdot nerds would have you believe.
No one specified a time frame here, certainly not the original story.
As far as I'm concerned, 100 years is more than adequate. Beyond that it's someone else's problem.
The technology will change and people will have to move the data to another medium well before then.
I've been burning optical media since about 1995. Back then a CD burner cost almost $2000 and the discs were $15 each.
I can say, with certainty, that well stored optical discs absolutely do NOT come anywhere close to meeting the shelf lives that are claimed by manufacturers today.
Of the gold discs I have from the mid-'90s, 100% are still readable, but beyond that, virtually every make and brand of media I've got has varying levels of failures up to about four or five years ago. So far I haven't had any fail since then. The failure rate approaches 100% for discs, regardless of brand, bought and burned between maybe nine and twelve years ago. I stopped burning CDs around that range of time, but my DVDs from that period have nearly as high failure rates, as well. I'd say for the interim years it's probably more like 10-20%, but it'll be five more years until I know if they start to fail at the same rate.
Keep in mind the warranty periods are based on two things -- the fact that virtually no one will ever file a claim for a replacement media, and the fact that the warranties explicitly do not cover losses of the data on the media. They can say 100 year shelf life because in five years if the media fails, no one is going to exchange it for a new version of a media they no longer use regularly, anyway.
The fact is, there's *no* single medium durable enough for even mid-term storage at modern data densities. (And by mid-term, I mean the "boy, I'd like these pictures of my kids to still be readable when they get married" kind of range.) Old megabyte-sized hard drives and old 80, 160, maybe 320KB floppies are largely still readable, if you can find the interfaces and hardware. Older low-density tapes are, too, but as I learned the hard way, if you don't write on the tape what software you used to record it, you're pretty much SOL if you want to read it in the future.
Effortless media-shifting is the only real solution these days -- keep copying them from one computer to another.