How the LHC Is Reviving Magnetic Tape
sandbagger writes "The Large Hadron Collider is the world's biggest science experiment. When spinning, it reportedly generates up to six gigs of data per second. Today's six-terabyte tape cartridges fill rapidly when you're creating that amount of material. The Economist reports that despite the advances in SSDs and hard drives, tape still seems to be the way to go when you need to store massive amounts of digital assets."
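To put "fill rapidly" in numbers: the sketch below takes the summary's figures at face value (a constant 6 GB/s stream and 6 TB per cartridge). It assumes decimal units and ignores compression and cartridge-swap time. Both simplifications are ours, not the article's.

```python
# Back-of-the-envelope: how quickly does a 6 TB cartridge fill at 6 GB/s?
# Assumptions: decimal units (1 TB = 1000 GB), a constant stream, and no
# compression or cartridge-swap overhead.

DATA_RATE_GB_PER_S = 6.0         # figure quoted in the summary
CARTRIDGE_CAPACITY_GB = 6000.0   # the "six-terabyte" cartridge


def seconds_to_fill(capacity_gb: float, rate_gb_per_s: float) -> float:
    """Time in seconds to fill one cartridge at a constant rate."""
    return capacity_gb / rate_gb_per_s


def cartridges_per_day(capacity_gb: float, rate_gb_per_s: float) -> float:
    """Cartridges consumed in 24 hours at a constant rate."""
    return 86400 * rate_gb_per_s / capacity_gb


fill_s = seconds_to_fill(CARTRIDGE_CAPACITY_GB, DATA_RATE_GB_PER_S)
print(f"{fill_s:.0f} s (~{fill_s / 60:.1f} min) per cartridge")  # 1000 s (~16.7 min) per cartridge
print(f"{cartridges_per_day(CARTRIDGE_CAPACITY_GB, DATA_RATE_GB_PER_S):.0f} cartridges per day")  # 86 cartridges per day
```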
Just be careful - optical disks degrade, too. Years ago, before hard drives became so incredibly dirt cheap, I would do my little video editing thing and then back up the project files to DVD. And not just any DVD - I did my homework and found the best-rated archival DVDs (sorry, I don't remember the brand - only that they came from Japan). Anyway, I just sucked them back onto my NAS, and some of them had developed a teeny bit of unreadable data. Fortunately, I had made PAR2 files for everything. Between par2repair and ddrescue, I was able to recover the data. But the moral of the story is: don't rely on optical disks to be magical storage that does not degrade.
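PAR2 itself uses Reed-Solomon coding, which can rebuild several damaged blocks. The underlying idea can be illustrated with single XOR parity instead. This is a deliberately simpler, swapped-in technique that can rebuild exactly one lost block per set. It is a sketch only. The block size and function names are hypothetical, and none of this reflects PAR2's file format.

```python
from functools import reduce
from typing import Optional

BLOCK_SIZE = 4  # hypothetical, tiny block size for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    if blocks and len(blocks[-1]) < size:
        blocks[-1] = blocks[-1].ljust(size, b"\x00")
    return blocks


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(blocks: list[bytes]) -> bytes:
    """The parity block is the XOR of every data block."""
    return reduce(xor_blocks, blocks)


def recover(blocks: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Rebuild at most one missing block (marked None) from the parity."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one block")
    if not missing:
        return list(blocks)
    # XOR of the parity with all surviving blocks yields the lost block.
    survivors = [b for b in blocks if b is not None]
    fixed = list(blocks)
    fixed[missing[0]] = reduce(xor_blocks, survivors, parity)
    return fixed


# Usage: simulate one unreadable block and rebuild it.
data = b"project files!"
blocks = split_blocks(data)
parity = make_parity(blocks)
damaged = list(blocks)
damaged[1] = None
restored = recover(damaged, parity)
assert b"".join(restored).rstrip(b"\x00") == data
```

Real tools spread more parity across more blocks so that several damaged regions can be repaired. That is the gap Reed-Solomon fills and plain XOR does not.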
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Yes, they are surprisingly fast. The maximum speed of a current Tandberg LTO-6 drive is 160 megabytes/s if the data is incompressible. With the usual compressible data, it can be about 320 megabytes/s (officially 400).
These drives can even be too fast. The drives do speed matching, but they have a minimum speed; below it, they start shoe-shining (repeatedly stopping, backing up, and restarting the tape because the buffer has run dry). That is one reason I chose an older-generation LTO-3 tape drive instead of a current-generation one: I cannot easily feed an LTO-6 with at least 60 MB/s, the drive's minimum speed. Considering compression, that is about 120 MB/s of source data, which saturates a 1 Gb network.
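The arithmetic behind that choice can be sketched as follows. The 60 MB/s minimum and the roughly 2:1 compression ratio come from the comment above. Treating 1 Gb Ethernet as about 125 MB/s of raw line rate, before protocol overhead, is our assumption, and the function names are hypothetical.

```python
# Can a given source keep a tape drive above its minimum streaming speed?
# From the comment: LTO-6 minimum native speed ~60 MB/s, ~2:1 compression.
# Assumption: 1 GbE = 1000 Mb/s / 8 = 125 MB/s raw, ignoring overhead.

GBE_RAW_MB_PER_S = 1000 / 8  # 125.0 MB/s


def min_source_rate(drive_min_native_mb_s: float, compression_ratio: float) -> float:
    """Uncompressed MB/s the host must supply so that, after compression,
    the drive still writes at least its minimum native speed."""
    return drive_min_native_mb_s * compression_ratio


def can_stream(source_mb_s: float, drive_min_native_mb_s: float,
               compression_ratio: float) -> bool:
    """True if the source is fast enough to avoid shoe-shining."""
    return source_mb_s >= min_source_rate(drive_min_native_mb_s, compression_ratio)


need = min_source_rate(60, 2.0)
print(f"source must supply >= {need:.0f} MB/s "
      f"({need / GBE_RAW_MB_PER_S:.0%} of raw 1 GbE)")
```

With these inputs the source needs about 96% of a raw gigabit link before any protocol overhead. In practice, a single 1 GbE feed leaves almost no margin.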
The bottom line in managing long-term archiving (5+ years) is that you need to both refresh and verify your storage, at several different levels.
1. Shoot the initial copy.
2. Copy this ASAP ("copy1").
3. Stash both in separate locations.
4. Go back to the 'original' on a 6-9 month schedule and verify it.
5. Go back to 'copy1' on a different schedule and verify it.
6. Go back to the 'original' on a different 9-12 month schedule and refresh (copy) it, storing the new copy at the other site.
7. Go back to 'copy1' on a different schedule and refresh (copy) it, storing the new copy at the other site.
8. Repeat steps 4 and 5 on a yearly schedule. Do you need to rewrite the data in current formats and retain both the original and the new version? Are you moving to new media?
9. Repeat steps 6 and 7 on a yearly schedule, and ask the same questions as in step 8.
10. By now you should be at year 2 or 2.5. Repeat steps 1-9 once for a retention of roughly 6 years, and again for 10+ years.
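One way to keep the staggered schedules in steps 4 through 7 straight is to generate them. This is a minimal sketch. The specific intervals are hypothetical picks from the ranges in the list, chosen so the four tasks drift apart instead of landing in the same week. The month arithmetic also clamps every day to the 28th for simplicity.

```python
from datetime import date

# Hypothetical intervals in months, picked from the ranges in steps 4-7.
SCHEDULE = {
    ("original", "verify"): 7,    # step 4: every 6-9 months
    ("copy1", "verify"): 8,       # step 5: a different schedule
    ("original", "refresh"): 11,  # step 6: every 9-12 months
    ("copy1", "refresh"): 10,     # step 7: a different schedule
}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months; the day is clamped to 28 so the
    result is always a valid date (a simplification)."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))


def plan(start: date, horizon_months: int) -> list[tuple[date, str, str]]:
    """List (due date, copy, task) events from start up to the horizon."""
    events = []
    for (copy, task), every in SCHEDULE.items():
        m = every
        while m <= horizon_months:
            events.append((add_months(start, m), copy, task))
            m += every
    return sorted(events)


for due, copy, task in plan(date(2014, 1, 15), 30):
    print(due.isoformat(), copy, task)
```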
Are you changing data formats, and is it possible to ensure integrity by copying and archiving in new formats?
As you change media, do you need to retain old media systems, or will you move to the new media?
At what point is the data no longer valid, determined by the owners?
Are the 'owners' the only stakeholders? If not, expand the set.
In all of this, you need a dedicated media management system, including media drives, copy/verify capabilities, and a stand-in for restoration.
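For the copy/verify capability, one common approach records a checksum manifest when a copy is made and re-hashes the copy on each verification pass. That approach is our assumption, since the comment does not prescribe a method. A minimal sketch using SHA-256 from Python's standard library follows. The manifest layout and function names are hypothetical.

```python
import hashlib
from pathlib import Path

CHUNK = 1 << 20  # read 1 MiB at a time so large files need not fit in RAM


def sha256_file(path: Path) -> str:
    """Hex SHA-256 digest of one file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Re-hash the tree and report missing, changed, or unexpected files."""
    current = build_manifest(root)
    problems = []
    for name, digest in manifest.items():
        if name not in current:
            problems.append(f"missing: {name}")
        elif current[name] != digest:
            problems.append(f"changed: {name}")
    for name in current.keys() - manifest.keys():
        problems.append(f"unexpected: {name}")
    return sorted(problems)
```

In use, the manifest is written once when 'copy1' is made and stored alongside both copies. Each scheduled verify pass then runs `verify` against whichever copy is being checked, and an empty result means the copy still matches.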
This is all very interesting to me. Medical records in particular seem to be assumed to have a lifetime retention, but other than the date and nature of the event, how important are the details of your appendectomy performed at age 5 when you are 60? Is that benign tumor removed at age 12 important at age 45? How much LHC data collected in 2013 will be useful in 2023? Different criteria. Different processes.
deleting the extra space after periods so i can stay relevant, yeah.
Agreed. At my work we do parallel streams to multiple Sun T10000 T2 tapes (T10K "C" drives) at 250 MB/s uncompressed (about 500 MB/s compressed, more or less, and usually quite a bit more). If for some reason we push less than about 120 MB/s, the tape rewind times cause all kinds of issues.
We make the same kind of decision when choosing Sun T10000 "B" drives instead of "C" or the new "D" drives if the source cannot push data fast enough.
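The same reasoning extends to choosing a drive and deciding how many parallel streams a source can feed. If the load is split evenly across N drives, each drive sees aggregate/N. That per-drive rate must stay between the drive's minimum (to avoid shoe-shining) and its maximum. The sketch below makes this assumption. The "faster" drive uses the rough figures from the comment above (about 120 MB/s floor, 250 MB/s uncompressed). The "slower" drive's figures are hypothetical placeholders, not vendor specifications.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Drive:
    name: str
    min_mb_s: float  # below this rate the drive starts shoe-shining
    max_mb_s: float  # fastest rate the drive can absorb


def stream_range(aggregate_mb_s: float, drive: Drive) -> Optional[tuple[int, int]]:
    """(fewest, most) drives that can share the load evenly while each
    stays between its min and max rate, or None if no count works."""
    fewest = max(1, math.ceil(aggregate_mb_s / drive.max_mb_s))
    most = math.floor(aggregate_mb_s / drive.min_mb_s)
    return (fewest, most) if fewest <= most else None


def pick_drive(aggregate_mb_s: float, candidates: list[Drive]) -> Optional[Drive]:
    """Pick the fastest candidate that the source can keep streaming."""
    usable = [d for d in candidates if stream_range(aggregate_mb_s, d)]
    return max(usable, key=lambda d: d.max_mb_s) if usable else None


slower = Drive("slower", 40, 120)   # hypothetical figures
faster = Drive("faster", 120, 250)  # rough figures from the comment above

print(pick_drive(100, [slower, faster]))   # a slow source: only "slower" works
print(stream_range(1000, faster))          # a fast source: 4 to 8 "faster" drives
```

A source pushing 100 MB/s cannot keep the faster drive above its floor, so the slower drive wins. This mirrors the choice of older drives described above.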
I've long laughed at articles saying tape is dead. For large-scale* backup, retention, transport, and legal hold problems, there simply is no other solution that scales reasonably well.
*My definition of "large-scale" for this specific context: hundreds of terabytes or more, much of it transported thousands of miles regularly. If you don't work with hundreds of terabytes and at least dozens of petabytes on a daily basis, you may suffer from optimistic delusions about disk storage capabilities. Disk storage vendors are all too glad to reinforce those delusions, to the detriment of customers stuck with half-baked solutions that cannot hope to meet their throughput requirements. Given "large-scale" data, there's no replacement for tape at present; everything else is a low-throughput also-ran, typically harboring enormous and unplanned complications. We're also heavy users of VTL, replication, cloning, S3-workalikes, and various disk technologies. Tape remains vital to large enterprise operations, and those predicting its imminent death have been the butt of jokes about marketing wonks for a decade and a half.
Matthew P. Barnson
I learn what I think when I read what I write
Sequential access speed is only relevant if you back up huge non-fragmented files or entire raw partitions, and nothing else.
You came pretty close with the process, but for most businesses you're not quite there. Here are a few clarifications on the process.
I do this kind of thing all the time. Feel free to ping me at my easily-figured-out email address (firstname@lastname.org) if I can answer additional questions for you.