How the LHC Is Reviving Magnetic Tape
sandbagger writes "The Large Hadron Collider is the world's biggest science experiment. When spinning, it reportedly generates up to six gigs of data per second. Today's six-terabyte tape cartridges fill rapidly when you're creating that amount of material. The Economist reports that despite the advances in SSDs and hard drives, tape still seems to be the way to go when you need to store massive amounts of digital assets."
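As a rough sanity check on the summary's figures, here is a minimal sketch of how quickly a cartridge would fill. It assumes decimal units (1 TB = 1000 GB) and that the quoted peak rate is sustained, which the summary does not claim; the function names are purely illustrative.

```python
# Figures quoted in the summary; sustained peak rate is an assumption.
DATA_RATE_GB_PER_S = 6   # "up to six gigs of data per second"
CARTRIDGE_TB = 6         # "six-terabyte tape cartridges"


def seconds_to_fill(cartridge_tb, rate_gb_per_s):
    """Seconds needed to fill one cartridge at a constant data rate."""
    return cartridge_tb * 1000 / rate_gb_per_s


def cartridges_per_day(cartridge_tb, rate_gb_per_s):
    """Cartridges consumed in 24 hours at a constant data rate."""
    return 86_400 / seconds_to_fill(cartridge_tb, rate_gb_per_s)


secs = seconds_to_fill(CARTRIDGE_TB, DATA_RATE_GB_PER_S)
print(f"one cartridge fills in about {secs / 60:.1f} minutes")
print(f"{cartridges_per_day(CARTRIDGE_TB, DATA_RATE_GB_PER_S):.1f} cartridges per day")
```

Under those assumptions a cartridge lasts well under half an hour, which is the sense in which they "fill rapidly".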
Of a station wagon loaded with tapes.
Also, -1, Duh, because this is an obvious, stupid article.
I want to delete my account but Slashdot doesn't allow it.
i remember a few years back backup to "cheap" disk was all the rage. if you were backing up to tape you were seen as some kind of mental patient
tape has its issues, but sucking up money like a trophy wife isn't one of them
I doubt that Amazon has to handle an amount of data similar to the LHC's. However, the rest of your statement is correct.
I don't think "cheapness" is the problem being solved. More important for an organization like the LHC is archival reliability. Tapes can last a long time while retaining their data integrity. I honestly doubt even high-end hard drives can make that claim.
The world's burning. Moped Jesus spotted on I50. Details at 11.
disk based backup is cheaper for text type data files since it compresses very well. i tested disk backup on SQL Server for a year and while it did compress pretty good, not enough to make the disk cheaper.
it's probably cheaper now that you can get a lot more storage per server than a few years ago, but i haven't run the numbers
It depends on what you're doing. Amazon has many thousands of unrelated chunks of data from different users spread all over that are accessed at random times, in random orders and in random chunks. They can also move chunks of data to all different places. The LHC however would probably prefer to keep its data all together, as it is likely to all be accessed in a considerably more sequential order. The lesson I get is that tape is still better *for large chunks of related data* while HDDs may be better for *large amounts of unrelated data*.
A gig is a performance, usually given by a band. It's a little known fact that the Higgs boson likes to rock out.
No one in the data retention business ever stopped using tapes. See the numbers on LTO units being sold, if you need proof.
This is a shitty article.
morcego
..has been greatly exaggerated lately by trade journals. There are some backup scenarios for which hard disc backup just isn't viable.
Viva la tape.
Look back up at my post, now look back down, you're on the Internet. Now look back up. I'm a signature.
A couple of years ago, Google restored lost Gmail from tape. I'd expect that even with deduplication they must use a phenomenal amount of tape.
There are 8.5TB uncompressed capacity tapes in enterprise use right now. The 6TB compressed sounds like, oh, two generations back or so.
French - The lingua franca of Europe!
Yes. I think Amazon simply needs more automation. If a researcher needs to analyze some LHC data, some poor grad student can go rifling through mountains of tape. If an Amazon customer needs their Glacier data, Amazon would need to construct some sort of massive tape loading and library mechanism. I know these already exist commercially (e.g. StorageTek, etc.), but they are not cheap and they are probably not on the scale that Amazon would need.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Tapes use data compression, too.
(In fact most of the "capacity" and "speed" numbers in the sales brochure are for compressed data).
No sig today...
People often confuse 'not currently sexy with companies less than 5 years old' with 'dead and/or dying'.
Magnetic tape has been alive and well. Most big companies and research labs use it daily.
Sounds like the article writer knows nothing at all about corporate or industrial IT.
Do not look at laser with remaining good eye.
I would guess data generated in each run is stored to disk, then copied to tape for analysis elsewhere, freeing up disk space for the next experiment.
Time for bed, said Zebedee - boing
I'd love to see a Petabyte-Scale Tape Storage System that looked something like this, only modernized: http://youtu.be/Nq3mNYKR7FM
Sig: I stole this sig.
Or can I sell my cartridges on Craigslist now?
Mostly random stuff.
" When spinning it reportedly generates up to six gigs of data per second."
The LHC itself doesn't spin, even though there are protons moving around the circular track at very near lightspeed. /pedant
For *reliable* backup and archive purposes tape never went out of style.
I've worked as a tape monkey in a large facility (Camp Foster RASC, Okinawa, circa 1989-90), so I know tapes do work well in the enterprise, but my experience with tapes in the consumer space in the 90s was anything but good. 90% of the tape backups made (using several different formats) using consumer-grade systems were corrupt and worthless.
We took great care with the tapes, but when we checked them (thankfully never needed them, except one occasion), they were mostly all bad.
Optical isn't much more reassuring as a backup medium, given that optical discs tend to degrade over time.
If somebody has a tape system that can store terabytes on a cartridge, reliably, for say... $10/TB or less, and the system costs less than $200, I'd look at it, though. Otherwise, it is still more worthwhile just to use hard drives to back up data (even at their inflated prices)
1.) Tape is fast - your SATA2 HDD will hardly be able to sustain a steady flow of data to an LTO5 drive (SAS 3/6 Gb/s)
Disadvantage - no random access, but that's not what tape is useful for
2.) proprietary - partly wrong; it is true only if you choose those vendor lock-in products (cheaper drive, expensive cartridges)
LTO5 (and the upcoming LTO6) is backward compatible by at least one version: you can read data from your LTO5 tape with an LTO6 drive
3.) unreliable
In which way? Thanks to its CRC and sophisticated error correction tech (developed over decades), it isn't
4.) idiots with money to burn buy one disk drive after another; if they chose to invest more in the drive instead, they would come out cheaper in the long run, as the media price (for example LTO5) is extremely low, especially if you find good unopened goods on eBay
Perhaps you got it: I'm a happy LTO5 user (private, HD movie filmer) and I occasionally look on eBay for cheap "10x Sony/Fujitsu/HP" tape packs, unopened; then I pay as little as 5€ per 1TB. I don't need to buy new 4TB drives for backup, where the price per 1TB is 40-45€
I think Amazon's system is a hybrid. There have been numerous articles about it, but Amazon has kept their system tightly locked down with NDAs for all parties involved. However, the reason why lots of people have come to this conclusion is simple: there are occasional but regular complaints you'll see on the internet where the 3-5 hour window is blown up to 10-24 hours. I suspect that they use commodity hard drives that are powered off once filled. But backing up those hard drives is a tape system that only kicks in when they find a bad hard drive (the tape holds a copy of what was on the hard drive). This way the drives draw no power while they sit powered off.
I'm a fan of tapes too, partly because in the SMB space even the dumbest luser can change a tape, but changing out a disk drive on a Windows system *always* seems to be problematic.
Usually you're stuck with USB for ease of use, and even USB2 blows for throughput and I have yet to see a new server with USB3. And then there's the whole clusterfuck with drive letter assignments and the crummy job backup software does with identifying backup media vs. needing to write to some specific path (which is as much a Windows problem as anything).
Which makes me wonder why there isn't a SCSI storage peripheral that can use hard disks as removable media but looks to the server like a tape drive with some kind of translation to write to the disk. This lets you remove the whole disk management issue from the server to the peripheral disk host, as well as retaining tape compatibility with the backup software.
You could even get a little more exotic and put space for multiple disks in the peripheral and do various and sundry mirroring/RAID for redundancy and capacity.
Given the cost of LTO-5 and -6 in quantity, it's probably not cost effective over large quantities of tape, but I would think the peripheral itself would be cheaper (solid state, largely software) and more reliable, and for many use cases, possibly faster, since it's not always easy to maintain the streaming rates necessary to eliminate shoeshining with tape drives unless you're dumping a disk-based backup direct to tape.
My only big gripe with tape is drive reliability; they seem to die more easily than even individual drives in servers and SANs. My only other complaint is the legions of morons insisting that cheap disk is always better than tape, making you look like a dinosaur for advocating tape.
disk backup does single instance storage where they can compress similar data across files. not just compress a file. i've seen over 90% compression in some cases. pretty amazing stuff, but limited in the type of data it works with
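To make "single instance storage" concrete, here is a toy sketch of the idea: split each file into chunks, hash every chunk, and keep only one copy of each distinct chunk. The class name, the fixed 4096-byte chunk size, and the SHA-256 choice are all assumptions for illustration; real backup products use their own (often variable-size) chunking schemes.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: each distinct chunk is kept once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # digest -> chunk bytes (one copy per distinct chunk)
        self.files = {}    # file name -> ordered list of chunk digests

    def put(self, name, data):
        """Store a file, reusing any chunk that has been seen before."""
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only if unseen
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(c) for c in self.chunks.values())


store = SingleInstanceStore()
file_a = b"A" * 8192 + b"B" * 4096
file_b = b"A" * 8192 + b"C" * 4096   # shares its first 8 KiB with file_a
store.put("a", file_a)
store.put("b", file_b)
print(len(file_a) + len(file_b), "logical bytes,", store.stored_bytes(), "stored")
```

The savings depend entirely on how much of the data repeats, which fits the remark that it is "limited in the type of data it works with".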
Math follows. You've been warned.
A typical storage array where I work has 192 3TB drives in it, more or less. We use SSDs in hybrid storage pools, but we'll ignore that for the time being as it doesn't meaningfully change the equation. Let's leave the hefty cost of the storage appliance out of this. Let's just look at electricity alone.
Each drive consumes about 8 watts, and must be spinning continuously in order to provide reasonable response times. That's 1,536 watts per rack, just to power the drives. Ignore the shelf power consumption, the heads for the NAS array, the PDU draw and loss... we're just talking very back-of-the-envelope stuff here.
Now let's ignore the cost of your tape silo, but I submit to back up half a petabyte requires a library somewhat cheaper than your NAS device above. Typical tapes hold 10TB apiece, and are written to once. Their power cost is largely ignorable; the tape library only consumes power for perhaps a small display and some internal LED lights, and significant wattage only while running the job, which we could take to tape over the course of about a week assuming sufficient drives. Let's assume worst-case for the tape drive to hard drive comparison -- that we're not using RAID or mirroring of any sort on the storage array -- and that we're actually backing up 576TB of data from that storage array. That means we require something like 58 tapes. Media cost is going to be something like 58 * $160/tape == $9,280 in media to back up that storage array.
Typical cost for electricity is 12 cents per kilowatt-hour. 1536 / 1000 * 24 * 365 * $0.12 == $1,614.64 to run your storage array every year, just in drive wattage (and that's quite conservative; most good-quality 7200RPM to 15,000RPM drives run a watt or two higher than this).
So there's a highly-simplified breakdown for the cost of tape versus disk; the library pays for its media cost compared to disk in 6 years of usage. Is that worth it? That's a great question. For our needs, hard disk just can't keep up with the data rates we require, so it's a speed/throughput thing, not a cost thing. Tape seek time is horrible, so for any application requiring IOPS, hard disks win.
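The back-of-the-envelope numbers above can be reproduced with a short sketch. Every constant comes from the comment itself; the function names are illustrative, and like the comment this ignores the cost of the array, the tape library, and the tape drives.

```python
import math

# All figures are taken from the comment above.
DRIVES = 192
DRIVE_TB = 3
WATTS_PER_DRIVE = 8
USD_PER_KWH = 0.12
TAPE_TB = 10
USD_PER_TAPE = 160


def annual_drive_power_cost(drives, watts_each, usd_per_kwh):
    """Yearly electricity cost of keeping the drives spinning."""
    kwh_per_year = drives * watts_each * 24 * 365 / 1000
    return kwh_per_year * usd_per_kwh


def tape_media_cost(data_tb, tape_tb, usd_per_tape):
    """Number of tapes (rounded up) and their total media cost."""
    tapes = math.ceil(data_tb / tape_tb)
    return tapes, tapes * usd_per_tape


power = annual_drive_power_cost(DRIVES, WATTS_PER_DRIVE, USD_PER_KWH)
tapes, media = tape_media_cost(DRIVES * DRIVE_TB, TAPE_TB, USD_PER_TAPE)
years = media / power

print(f"drive power: ${power:,.2f} per year")
print(f"tape media:  {tapes} tapes, ${media:,}")
print(f"media cost equals about {years:.1f} years of drive electricity")
```

The ratio comes out a little under six years, in line with the "6 years of usage" figure.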
There are lots and lots of ways to look at this equation. I've priced it out on purchase orders dozens of times, and every time tape wins for archival needs. You just can't beat the flexibility it offers, particularly for disaster recovery and legal hold requirements. "Here's a FedEx package containing your encrypted backup tapes" is far more convenient and an easy sell than starting the conversation with, "First, let us install a storage appliance on your property and set up a WAN connection to our data center..."
Matthew P. Barnson
I learn what I think when I read what I write
Tape can do that, too.
Tape can be made to act like a great big disk, except some parts of it aren't available until the robot puts the correct tape in the drive.
No sig today...
Fact check on the troll.
"Tape is slow". Absolutely false for throughput; true only for IOPS. A modern tape is much faster than a modern hard drive. That's the point of the article, and my personal experience as well. Random I/O to/from tape drives is incredibly slow, but no hard drive can touch a modern tape drive's throughput. It's the reason LHC uses it.
"Tape is expensive": True only in a non-ROI sense, therefore mostly false. You'll find a modern, large tape silo of equivalent capacity to a modern, large storage appliance usually works out much cheaper both in initial cost and cost over time if you intend to use the hardware for at least three to five years. That said, the cost of admission to the world of enterprise tape is pretty high; it's the ongoing costs that are much lower than hard drives.
"Tape is Proprietary": Both true and false. LTO is an open (licensable) standard, but the fastest/largest tape drives on the planet are typically proprietary right now, because being the fastest/largest causes more sales, and therefore funds innovation in faster/larger tape technology.
"The only people who still use it are those who have to...": False. There are many, many use cases for tape where it is not a requirement, but is just more convenient, reliable, faster, and less expensive than a hard-disk solution. I could list them, but, well, you're a troll and I don't want to type much more.
"The only people who still use it are... [those] with money to burn.": False. ROI is what drives most of our tape purchases, and we save an enormous amount of money by using tape in appropriate scenarios. Hard disks are appropriate for some use cases, tapes are mandatory or just a smart purchase in others.
Matthew P. Barnson
I learn what I think when I read what I write
Late 2013 pricing.
4TB hard drive: around $400
5TB tape: around $160
8.5TB tape (same media as 5TB, newer drive): still about $160
Cost per terabyte of disk: about $100.
Cost per terabyte of tape: about $19
I'm ignoring the cost of the tape drive, just like I'm ignoring the cost of the head(s) involved in NAS/SAN storage.
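For reference, the per-terabyte figures follow directly from the prices listed above. This is a minimal sketch using only those late-2013 numbers; the helper name is illustrative.

```python
def usd_per_tb(price_usd, capacity_tb):
    """Media cost per terabyte of raw capacity."""
    return price_usd / capacity_tb


# Prices and capacities as listed in the comment above.
media = {
    "4TB hard drive": (400, 4),
    "5TB tape": (160, 5),
    "8.5TB tape (same media, newer drive)": (160, 8.5),
}

for name, (price, tb) in media.items():
    print(f"{name}: ${usd_per_tb(price, tb):.2f}/TB")
```

Note that the "about $19" figure corresponds to the 8.5TB capacity; at 5TB per cartridge the same media works out to $32 per terabyte, still well below the disk figure.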
To fix your quote to be in line with reality:
Matthew P. Barnson
I learn what I think when I read what I write
Good stuff. The only place you lost me was "For our needs, hard disk just can't keep up with the data rates we require". Really? Your first-tier storage is hard drive based, right? Tape is only a backup. How can hard disk not be fast enough? If you mean streaming naively to a single spindle, I guess I can dig it, but that's not my idea of hard drive archiving.
The big STK libraries are "medium scale" tape storage. Large scale uses a warehouse full of tapes, and humans or warehouse robotics to find tapes in the stacks and bring them to the front where the tape drives are. Latency goes up that way, of course, but it's quite economical.
The large scale tape approach has much higher storage density than a datacenter, with very low power and cooling requirements. This approach pre-dates the disk drive, and is very mature and boring by operations standards. It's also by far the cheapest approach.
Socialism: a lie told by totalitarians and believed by fools.
I imagine that kind of storage would work for an Iron Mountain type operation, where you can easily consolidate a single client's data. For Amazon, this approach would be messier - a single client would have data scattered all over a huge number of tapes. I suspect the number of tape drives (and robotic or human feeders) necessary to handle this situation would not work well. I suppose they could do some intelligent caching and some off-peak consolidation - but in the end, I suspect tape loses a lot of its cost advantages.
Still, I'd bet they use tape somewhere in their operation :)
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
I tend to agree, though it depends how big the average client's weekly/monthly Glacier upload is. Typically for backup you pool to disk in some way in the short term, sort the data out in the way that makes sense for your needs, then write to tape. They say they don't use tape for Glacier, and I believe it.
OTOH, for their internal operations I bet they use tape as a backup for their backup, and don't give much thought to how they'd ever recover at scale. While I think tape is great for long-term archiving (as makes sense for scientific data), it's not a great answer if you lose a whole datacenter.
Socialism: a lie told by totalitarians and believed by fools.
We do massive data dumps on a regular basis, and I was typing quickly. I probably should have said, "For our needs, hard disks are extremely inconvenient and their throughput is too slow individually to suit." Good catch.
Matthew P. Barnson
I learn what I think when I read what I write
What's missing from your comparison is Scale. For small-scale solutions such as you suggest -- 16TB is TEENY, TINY STORAGE -- I absolutely would advocate disk-to-disk kind of stuff. Cheap, fast, easy. Sync it over the cloud. It's small. 16TB is just statistical noise from an enterprise storage perspective. Tape is pretty much mandatory when you need to figure out how to deal with a few hundred petabytes... not a few dozen terabytes.
Matthew P. Barnson
I learn what I think when I read what I write
so call it 16U to have all of that data online with under 5 second avg access
But it also takes only 5 seconds to delete all your data.
That's the problem we are experiencing at the office right now. We have been archiving to tape for quite some time, starting with LTO3. Now we are at LTO5 (always one generation behind, so the cost is lower).
The problem is backup speed. Our data is incompressible (video, pictures), so we do not gain from the very high published backup rates. Our arrays stream uncompressed video at hundreds of megabytes per second (even this is not compressible by the tape, which is very odd). With terabytes of data generated, it is hard to keep up with backup. Our data is also regularly restored, because people need access to the archive. This creates data management challenges as well. Our main problem is the very long time it takes to back up and restore terabytes of data on a daily basis. It would be easy to scale by adding more tape libraries, but it is not cost effective to keep adding them (as well as adding more arrays to handle streaming read and write operations at the same time). We are also using LTFS, which automated backup software is not friendly to. Our requirement is different from enterprise backup, which multiplexes data from different servers at the same time to get speed. We back up projects one at a time, each on a self-contained tape.
LTO6 is not much faster than LTO5 (160MB/s vs 120MB/s uncompressed). It is likely that tape is reaching its limit (much like hard drive speeds have not grown with capacity increases over the years). SSDs are faster but not cost-effective capacity-wise. So it is time to look for new technologies for storing and accessing data. All in all, storage has not kept up with the performance improvements in CPU, memory, and bandwidth links (Ethernet, Fibre Channel, etc.). We should be transferring in the 10GB/s range by now.
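To put the backup-window complaint in numbers, here is a small sketch of how long incompressible data takes to stream to tape. The 160 and 120 MB/s rates are the ones quoted in the comment; the function name, the decimal units, and the assumption that the source array can keep the drives streaming are illustrative.

```python
def hours_to_stream(data_tb, rate_mb_per_s, drives=1):
    """Hours to write data_tb terabytes at a sustained per-drive rate.

    Assumes incompressible data (no gain from drive compression) and
    that the work splits evenly across `drives` parallel tape drives.
    """
    seconds = data_tb * 1_000_000 / (rate_mb_per_s * drives)
    return seconds / 3600


for label, rate in (("160 MB/s", 160), ("120 MB/s", 120)):
    print(f"1 TB at {label}: {hours_to_stream(1, rate):.2f} hours")

# A hypothetical 10 TB project on a single drive at 160 MB/s:
print(f"10 TB at 160 MB/s: {hours_to_stream(10, 160):.1f} hours")
```

Because each project goes to its own self-contained tape, adding drives only helps when several projects are written in parallel, which is the scaling cost the comment describes.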
Live your life each day as if it was your last.
indeed, every single employer I've had in the past 30+ years has used tape, starting with 40MB on a 2400 foot 9 track at 1600 bpi, now we have over 60,000 times the capacity with LTO 6.
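The "60,000 times" figure checks out with a quick calculation. This sketch assumes one byte per frame on a 9-track reel (so 1600 bpi means 1600 bytes per inch) and the 2.5 TB native capacity of LTO-6, which is not stated in the comment; the raw figure ignores inter-record gaps, which is why the usable capacity is closer to the 40MB quoted.

```python
def nine_track_raw_bytes(length_ft, bytes_per_inch):
    """Upper bound on reel capacity, ignoring inter-record gaps."""
    return length_ft * 12 * bytes_per_inch


def capacity_ratio(new_bytes, old_bytes):
    """How many times larger the new medium is than the old one."""
    return new_bytes / old_bytes


raw = nine_track_raw_bytes(2400, 1600)
usable_old = 40_000_000            # "40MB", as stated in the comment
lto6_native = 2_500_000_000_000    # assumption: 2.5 TB native LTO-6

print(f"raw 9-track capacity: {raw / 1e6:.2f} MB")
print(f"LTO-6 vs 40MB reel: {capacity_ratio(lto6_native, usable_old):,.0f}x")
```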
You didn't look too hard at the ODA specs. For starters, everyone here is talking AT LEAST 100 megaBYTES per second of bandwidth on and off the media SUSTAINED.
Now look at Phony's ODA: 35-50 megaBITS per second--MAX (it is a disk, after all). Connection is USB-3. Target machines are winblows and mac, no mention of Linux or any kind of server environment at all.
Time to fill a full 1.5TB 12 disk cartridge: 48 hours (2 days) at 50 Mbps, 72 hours (3 days) at 35 Mbps.
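For anyone who wants to rerun the fill-time arithmetic, here is a minimal sketch assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and a constant transfer rate. With those assumptions the plain calculation gives somewhat longer times than the 48 and 72 hours quoted, so the comment may have been using a smaller effective capacity or different units.

```python
def hours_to_fill(capacity_tb, rate_mbit_per_s):
    """Hours to write capacity_tb terabytes at a constant rate in Mbps."""
    bits = capacity_tb * 1e12 * 8
    seconds = bits / (rate_mbit_per_s * 1e6)
    return seconds / 3600


for rate in (50, 35):
    print(f"1.5 TB at {rate} Mbps: {hours_to_fill(1.5, rate):.1f} hours")
```

The same function can be called with any other rate to see how sensitive the result is to the drive speed assumed.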
It was a joke when it was introduced, and an even bigger joke now: http://hardware.slashdot.org/story/12/04/16/1924248/30-blu-ray-discs-in-a-15tb-minidisc-like-cassette
Not funny enough? Here is more hilarity (the prices): http://pro.sony.com/bbsc/ssr/cat-datastorage/cat-opticaldiscarchive/
Replying to myself: as if the drive prices weren't expensive enough, the prices for media are totally, well, consistent with Sony:
1.2TB rewritable $270 from B&H Photo: http://www.bhphotovideo.com/c/product/1010742-REG/sony_odc1200re_archive_cartridge_1_2tb_rewritable.html
1.5TB WORM $280 from B&H Photo: http://www.bhphotovideo.com/c/product/983354-REG/sony_odc1500r_archive_cartridge_1500gb_write.html
And to top it all off, here's the obligatory DRM:
To help content creation professionals manage their metadata and improve workflow efficiency, Sony has developed the Optical Disc Archive Content Manager, which is a software application (license) bundled with each drive.
A 4TB 5900 RPM SATA drive, sure. Check SAS prices. There are many reasons why the 7200RPM SAS drives are much more expensive that I won't go into here...
Matthew P. Barnson
I learn what I think when I read what I write
"The Large Hadron Collider is the world's biggest science experiment. When spinning, it reportedly generates up to six gigs of data per second. Today's six-terabyte tape cartridges fill rapidly when you're creating that amount of material. The Economist reports that despite the advances in SSDs and hard drives, tape still seems to be the way to go when you need to store massive amounts of digital assets."
I don't think that the LHC spins.
My name's Matthew Barnson. I'm happy to talk storage and tape technologies any time, and am pretty certain I'm not a pathological liar. But, you know, I could be lying about that. I live in Utah, and work in a pretty large data center nearby. It's my job to know what I'm talking about, and I've lived and breathed this stuff for a number of years. That said, I can always be mistaken.
Nice to meet you, Anonymous Coward. Feel free to send me an email (firstname@lastname.org) and we can talk use cases where tape is the obvious and better choice, and those where disk is the obvious and better choice. I'm a storage and backup admin working in the industry for nearly twenty years, and have had discussions similar to this over coffee tables, water coolers, and in board rooms. The discussions end up being about things like performance, ROI, archival needs, reliability, typical use case, auditability, and more. Depending on which angle you look at it, some technologies win and others lose.
The point of THIS discussion was some writer who assumed tape was dead learned otherwise. I allege tape is not dead, and has never been over the past six decades, for numerous good reasons (and some bad ones). That said, I have no particular attachment to it other than that it is often the right solution for enterprise needs when other solutions -- like finicky, unreliable optical media -- will not do.
Anyway, if you want to argue about raw vs. compressed capacity, that's fine. We compress data on our ZFS storage appliances because it improves performance, not just capacity. Same with tape. I routinely shove more than 10TB of uncompressed data at my 5TB T10K T2 tapes, and seamlessly/transparently pull 10TB of uncompressed data off of them. The fact it was compressed in between is relevant, perhaps, but what's also relevant is that we usually fit in excess of 10TB of data per tape. If you're willing to play by real names, I can provide some stats to back up the claim that most modern tape drives easily and typically achieve their rated compressed capacity figures.
We see that with LZJB compression on our storage appliances as well: about 1.7 to 2.4:1 compression, on average. It varies by what you're storing, of course. Our patch repository, for instance, sees pretty terrible compression ratios as it's trying to compress gzipped and zipped data. On the other hand, general-purpose file storage can see considerably better results.
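The point about already-compressed data is easy to demonstrate. The sketch below uses Python's zlib as a stand-in for LZJB (a different algorithm, so the ratios will not match the figures above) and random bytes as a stand-in for gzipped or zipped files, since well-compressed data looks statistically random.

```python
import os
import zlib


def compression_ratio(data):
    """Original size divided by compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data))


# Repetitive log-style text, standing in for general-purpose file data.
text = b"2013-10-01 INFO backup job finished without errors\n" * 2000

# Random bytes, standing in for files that are already compressed.
already_compressed = os.urandom(100_000)

print(f"text:               {compression_ratio(text):.1f}:1")
print(f"already compressed: {compression_ratio(already_compressed):.2f}:1")
```

The second ratio lands at or slightly below 1:1, which is why a repository full of archives sees "pretty terrible" results no matter which compressor sits in front of the disk or tape.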
I maintain that tape is a key sell for customers who audit us regularly. The fact that data is stored on tape, shipped to a secure facility for storage in an EM-resistant container and cage, and retained for a specific period is a revenue driver in the post-9/11, Sarbanes-Oxley, HIPAA era. I have to provide evidence on this to auditors regularly. Among other things, customers who care about their data often aren't satisfied with many pure on-disk solutions: they want data guarantees of timeliness, throughput, encryption and the keys for decryption, and timely windows for restoration of data in case of disaster or "oops". Yet these same customers often aren't willing to pay what it costs to have a fully redundant, disaster-tolerant environment that could weather another 9/11 and come up in an alternate location instantly. In that great land of the "in between" is one gigantic area where tape shines at a reasonable cost.
Tape has its share of problems, to be sure. But there are many cases where it is simply the best solution, providing a solution to common data transport and archival challenges like it has for the past sixty years.
Matthew P. Barnson
I learn what I think when I read what I write
ODS-D77U specs are 780 Mbps write once and 1.15 Gbps read. This is in the same neighborhood as LTO-6. But yeah, the ODA drive price is 3 times higher.
The other speed issue to look at is seek time to recover files, which is going to be much longer on tape (often a minute for LTO-6) than disk. The value of low seek time will depend on use case.
I'm not sure where you get 35-50 Mbps from - you may be confusing ODA with Sony XDCAM, which is an older, single disk system.
Highspeed GPU accelerated and hardware accelerated compressors exist for cold storage systems
It is funny, that on one hand you (or who knows, maybe another anonymous coward) use the cheapest consumer HDD prices you found at the cheapest places in your examples, and on the other hand you continuously use extra or not even existing future hardware when you talk about features.
Several other people in the thread also mentioned you can pick up 4TB SATA drives for around $150. I was referring to 4TB SAS drives which retail for more like $350-$450 as of this writing. It's a more apples-to-apples comparison; even though SAS is several orders of magnitude worse than tape for bit errors, it's an order of magnitude better than SATA.
Matthew P. Barnson
I learn what I think when I read what I write
Some anecdotal evidence: Five years ago 2TB was the highest drive capacity. I bought three 1.5TB drives, so not the biggest, but next to it. They were the cheapest drives from my usual manufacturer. One failed almost immediately. The replacement drive also failed within a year (different type, but same manufacturer). A third drive is still working, but after a power loss a year ago quite a few bad sectors appeared on it and some data was lost. All in all, only one of the four had no issues within 5 years.
On the other hand, I had no problems with many other drives over the years from the same manufacturer, which were also SATA drives, but medium capacity, relatively more expensive models with longer guarantee.
For me the lesson is that I should not buy the highest capacity drives of their generation, because the technology may not be mature enough at that point.
My point is that if we have no OFFLINE backup, then a physical or network attack can destroy both our live data and our online backups at the same time. If I were an attacker and really wanted to destroy a firm, I would first target their backup system. If I could delete all backups immediately, that would be best. If not, I would slowly poison their data, so their backups become useless. Only after that would I destroy the live data. Therefore it is not enough to have one offline backup; you must have several, recorded at different times.
We do use replication, and we have standby servers. Those are useful for high availability. But that is not backup.
We also used offline disks for backups, but I find that inconvenient, and the backup software we use supports tapes much better than disks. I also do not trust disks for long term storage, see my other comment about this.
So far we have been the target of hacking attempts a few times every year, and they become more sophisticated as time goes on. I am quite happy here, and I want to keep my workplace safe.