Slashdot Mirror


How the LHC Is Reviving Magnetic Tape

sandbagger writes "The Large Hadron Collider is the world's biggest science experiment. When spinning, it reportedly generates up to six gigs of data per second. Today's six-terabyte tape cartridges fill rapidly when you're creating that amount of material. The Economist reports that despite the advances in SSDs and hard drives, tape still seems to be the way to go when you need to store massive amounts of digital assets."
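For scale, here is a back-of-the-envelope sketch of how quickly one cartridge fills at the quoted peak rate. It assumes "six gigs" means six gigabytes (not gigabits) per second and uses decimal units; both are assumptions, not figures confirmed by the summary.

```python
# Assumption: "six gigs" = 6 gigabytes per second, decimal (SI) units.
DATA_RATE_BYTES_PER_S = 6 * 10**9
# Six-terabyte cartridge, as quoted in the summary.
CARTRIDGE_BYTES = 6 * 10**12
SECONDS_PER_DAY = 86_400


def seconds_to_fill(capacity_bytes: int, rate_bytes_per_s: int) -> float:
    """Time to fill one cartridge at a constant data rate."""
    return capacity_bytes / rate_bytes_per_s


fill_seconds = seconds_to_fill(CARTRIDGE_BYTES, DATA_RATE_BYTES_PER_S)
cartridges_per_day = SECONDS_PER_DAY / fill_seconds

print(f"{fill_seconds:.0f} s per cartridge, "
      f"{cartridges_per_day:.1f} cartridges per day at sustained peak rate")
```

This is an upper bound: it treats the peak rate as sustained around the clock, which the summary does not claim.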

11 of 267 comments

  1. Never underestimate the bandwidth by Gothmolly · · Score: 4, Funny

    Of a station wagon loaded with tapes.

    Also, -1, Duh, because this is an obvious, stupid article.

    --
    I want to delete my account but Slashdot doesn't allow it.
    1. Re:Never underestimate the bandwidth by Isca · · Score: 4, Interesting

      Actually I found the article informative. I knew tapes were the cheapest and most cost-effective backup solution, but I didn't realize that they were so fast once the tape has been loaded.

      It's also interesting to see the advances in tape reading technology that they are striving for - it sounds as if it will keep pace with HD and SSD technology and stay relevant.

    2. Re:Never underestimate the bandwidth by SuricouRaven · · Score: 5, Interesting

      Cheapest, sort of.

      The price of storage roughly follows the y=mx+c linear graph: m is the cost of the media, while c is the cost of the equipment needed to access it.

      For hard drives, it's easy: c=0. A drive is self-contained.

      For tape, c is large (Up to several thousand pounds for one tape drive), but m is smaller (Tape, purchased in bulk, is cheap).

      So if you're storing a small amount of data, a rack full of hard drives is cheaper. For larger amounts, tape is cheaper.

      This ignores issues of ease of access and management software.
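      A minimal sketch of that break-even point. The prices below are made-up placeholders (not quotes from any vendor), and the function names are only for illustration:

```python
def total_cost(capacity_tb: float, media_cost_per_tb: float, fixed_cost: float) -> float:
    """y = m*x + c: media cost per TB times capacity, plus equipment cost."""
    return media_cost_per_tb * capacity_tb + fixed_cost


def break_even_tb(disk_per_tb: float, tape_per_tb: float,
                  tape_fixed: float, disk_fixed: float = 0.0) -> float:
    """Capacity at which the disk line and the tape line cross."""
    if disk_per_tb <= tape_per_tb:
        raise ValueError("tape media must be cheaper per TB for a break-even to exist")
    return (tape_fixed - disk_fixed) / (disk_per_tb - tape_per_tb)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
DISK_PER_TB = 40.0    # m for hard drives
TAPE_PER_TB = 10.0    # m for tape
TAPE_DRIVE = 3000.0   # c for tape (one drive); c = 0 for disk

crossover = break_even_tb(DISK_PER_TB, TAPE_PER_TB, TAPE_DRIVE)
print(f"Tape becomes cheaper above {crossover:.0f} TB")
```

      Below the crossover the disk line is lower; above it the tape line is lower. As noted, this still ignores ease of access and management software.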

    3. Re:Never underestimate the bandwidth by Dachannien · · Score: 5, Funny

      Why use a Station Wagon? Why not a 747?

      When's the last time you saw a 747 with that totally swank wood trim on the outside?

    4. Re:Never underestimate the bandwidth by dshk · · Score: 4, Informative

      Yes, they are surprisingly fast. The maximum speed of a current Tandberg LTO-6 drive is 160 megabytes/s if the data is incompressible. With the usual compressible data it can be about 320 megabytes/s (officially 400).

      These drives can even be too fast. The drives do speed matching, but they have a minimum speed, below that they start shoe-shining. That is one reason I chose an older-generation LTO-3 tape drive instead of the current generation: I cannot easily feed an LTO-6 with at least 60 MB/s, which is the minimum speed of the drive. Considering compression, that is about 120 MB/s of source data, which saturates a 1 Gb/s network.
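      The same arithmetic as a rough sketch. The 60 MB/s minimum comes from the numbers above; the 2:1 compression ratio matches the 60 to 120 MB/s step, and the 90% usable-link figure is my own assumption for protocol overhead:

```python
def required_source_rate(min_native_mb_s: float, compression_ratio: float) -> float:
    """Source data rate needed to keep the drive at its minimum native speed."""
    return min_native_mb_s * compression_ratio


def link_capacity_mb_s(link_gbit_s: float) -> float:
    """Raw link capacity in megabytes per second (decimal units, 8 bits per byte)."""
    return link_gbit_s * 1000 / 8


def can_stream(link_gbit_s: float, min_native_mb_s: float,
               compression_ratio: float, link_efficiency: float = 0.9) -> bool:
    """True if the network can feed the drive fast enough to avoid shoe-shining.

    link_efficiency is an assumed fraction of raw bandwidth left after overhead.
    """
    usable = link_capacity_mb_s(link_gbit_s) * link_efficiency
    return usable >= required_source_rate(min_native_mb_s, compression_ratio)


LTO6_MIN_NATIVE_MB_S = 60.0  # minimum drive speed quoted above
print(required_source_rate(LTO6_MIN_NATIVE_MB_S, 2.0))   # MB/s of source data needed
print(can_stream(1, LTO6_MIN_NATIVE_MB_S, 2.0))          # 1 Gb/s link
print(can_stream(10, LTO6_MIN_NATIVE_MB_S, 2.0))         # 10 Gb/s link
```

      A 1 Gb/s link tops out at 125 MB/s raw, so after any realistic overhead it falls short of the 120 MB/s the drive needs, and that is before counting how fast the source disks can actually read.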

    5. Re:Never underestimate the bandwidth by TooMuchToDo · · Score: 4, Interesting

      I used to work on data taking for the CMS detector at the LHC. We were using Storagetek tape silos [http://computing.fnal.gov/cdtracks/2009/january/images/robot.jpg] for long-term storage of data at Tier1.

      Tape allows for cheaper storage and large capacities, but you're then fighting contention issues (there are only so many robotic arms and tape drives for your tape library) as well as having data on tapes go bad without knowing it. When data is on disk, I can at least verify it immediately. Bit rot is definitely alive and well on tape.

    6. Re:Never underestimate the bandwidth by Doc+Hopper · · Score: 4, Informative

      The drives do speed matching, but they have a minimum speed, below that they start shoe-shining.

      Agreed. At my work we do parallel streams to multiple Sun T10000 T2 tapes (T10K "C" drives) at 250 MB/s uncompressed (500 MB/s compressed, more or less, usually quite a bit more). If for some reason we push less than about 120 MB/s, the tape rewind times cause all kinds of issues.

      We make the same kind of decision when choosing Sun T10000 "B" drives instead of "C" or the new "D" drives if the source cannot push data fast enough.

      I've long laughed at articles saying tape is dead. For large-scale* backup, retention, transport, and legal hold problems, there simply is no other solution that scales reasonably well.

      *My definition of "large-scale" for this specific context: hundreds of terabytes or more, much of it transported thousands of miles regularly. If you don't work with hundreds of terabytes and at least dozens of petabytes on a daily basis, you may suffer from optimistic delusions regarding disk storage capabilities, ones that disk storage vendors are all too glad to reinforce, to the detriment of customers faced with half-baked solutions that cannot hope to meet their throughput requirements. Given "large-scale" data, there's no replacement for tape at present; everything else is a low-throughput also-ran, typically harboring enormous and unplanned complications. We're also heavy users of VTL, replication, cloning, S3-workalikes, and various disk technologies. Tape remains vital to large enterprise operations, and those predicting its imminent death have been the butt of jokes about marketing wonks for a decade and a half.

    7. Re:Never underestimate the bandwidth by dshk · · Score: 5, Informative

      Sequential access speed is only relevant if you back up huge non-fragmented files or entire raw partitions, and nothing else.

  2. Re:but what about cheap disk? by MightyYar · · Score: 4, Informative

    Just be careful - optical disks degrade, too. Years ago before hard drives became so incredibly dirt cheap, I would do my little video editing thing and then back up the project files to DVD. And not just any DVD - I did my homework and found the best-rated archival DVDs (sorry, don't remember the brand - only that they came from Japan). Anyway, I just sucked them back onto my NAS, and some of them had developed a teeny bit of unreadable data. Fortunately, I had made PAR2 files for everything. Between par2repair and ddrescue, I was able to recover the data. But the moral of the story is don't rely on optical disks to be magical storage that does not degrade.

    --
    W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
  3. Re:but what about cheap disk? by rickb928 · · Score: 4, Informative

    The bottom line in managing long-term archiving (5+ years) is that you need to both refresh and verify your storage, at several different levels.

    1. Shoot the initial copy.
    2. Copy this asap. "Copy1"
    3. Stash both in disparate locations.
    4. Go back to the 'original' on a 6-9 month schedule and verify it.
    5. Go back to the 'copy1' and verify it on a different schedule.
    6. Go back to the 'original' on a different 9-12 month schedule and refresh(copy) it, stored to the other site.
    7. Go back to the 'copy1' on a different schedule and refresh (copy) it, stored to the other site.
    8. Repeat 4&5 on a yearly schedule. Do you need to re-write the data in 'current' formats and retain both original and new? Are you moving to new media?
    9. Repeat 6&7 on a yearly schedule. Ditto the rest of step 8.
    10. We should be at year 2 or 2.5. Repeat steps 1-9 once for a 6+/- year retention, again for 10+ year retention.
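    The "verify" passes in steps 4 and 5 could be sketched roughly as below: record a checksum manifest when the copy is first shot, then re-check each copy against it on its schedule. This assumes the archive is readable as ordinary files under a directory; the function names and the choice of SHA-256 are mine, not part of any particular media management product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root) -> dict:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(root, manifest: dict) -> list:
    """Return relative paths that are missing or whose checksum has changed."""
    root = Path(root)
    damaged = []
    for relative_path, expected in manifest.items():
        candidate = root / relative_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(relative_path)
    return damaged
```

    Build the manifest once from the 'original' at step 1, keep it with both copies, and run verify_copy against each copy on its own schedule; an empty list means the copy still matches. A checksum only detects damage, so anything it flags has to be refreshed from the other copy.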

    Are you changing data formats, and is it possible to ensure integrity by copying and archiving in new formats?
    As you change media, do you need to retain old media systems, or will you move to the new media?
    At what point is the data no longer valid, determined by the owners?
    Are the 'owners' the only stakeholders? If not, expand the set.

    In all of this, you have a dedicated media management system including media drives, copy/verify capabilities, and stand-in for restoration.

    This is all very interesting to me. Medical records in particular seem to be assumed to have a lifetime retention, but other than the date and nature of the event, how important are the details of your appendectomy performed at age 5 when you are 60? Is that benign tumor removed at age 12 important at age 45? How much LHC data collected in 2013 will be useful in 2023? Different criteria. Different processes.

    --
    deleting the extra space after periods so i can stay relevant, yeah.
  4. Re:but what about cheap disk? by Doc+Hopper · · Score: 4, Informative

    ...you need to both refresh and verify your storage...

    You came pretty close with the process, but for most businesses you're not quite there. Here are a few clarifications on the process.

    1. Typically large companies (including those, like us, with stringent HIPAA requirements) take two simultaneous copies from the original source. We don't copy a copy if it can be avoided, and we have enough tape drives to do this.
    2. We contract out with a local storage company to grab the tapes within a few days and store for the given retention period off-site. One copy usually remains on-site as well for long-term retention and rapid restoration. With plenty of capacity in the silo (tens of thousands of tapes in an Oracle/Sun SL8500), we are not terribly concerned about retention policies. If we get tight on space, we'll just expand the silo again.
    3. The same data usually still exists as on-disk media marked read-only, available for the legal folks who insisted we archive it in the first place. Often it also exists at a second geographical location thousands of miles (at minimum) from the first, with its own backup tapes. Plus it exists on two tapes at each site, one near-line and one off-site. Given tape reliability, three layers of data protection are typically sufficient. If "legal hold" is involved, we also insist that the disk array be kept on a valid support contract to reduce the risk of failed disks in the storage appliance.
    4. Retention policies dictate we keep around at least a few tape drives of every generation we've ever used which has tapes archived with our off-site storage facility. Even if they are not in the silo, they're in a storage closet waiting for us to bring them to life if needed up to twenty years later.

    I do this kind of thing all the time. Feel free to ping me at my easily-figured-out email address (firstname@lastname.org) if I can answer additional questions for you.