Slashdot Mirror


Amazon Wants To Replace Tape With Slow But Cheap Off-Site "Glacier" Storage

Nerval's Lobster writes with a piece at SlashCloud that says "Amazon is expanding its reach into the low-cost, high-durability archival storage market with the newly announced Glacier. While Glacier allows companies to transfer their data-archiving duties to the cloud — a potentially money-saving boon for many a budget-squeezed organization—the service comes with some caveats. Its cost structure and slow speed of data retrieval make it best suited for data that needs to be accessed infrequently, such as years-old legal records and research data. If that sounds quite a bit like Amazon Simple Storage Service, otherwise known as Amazon S3, you'd be correct. Both Amazon S3 and Glacier have been designed to store and retrieve data from anywhere with a Web connection. However, Amazon S3 — 'designed to make Web-scale computing easier for developers,' according to the company — is meant for rapid data retrieval; contrast that with a Glacier data-retrieval request (referred to as a 'job'), where it can take between 3 and 5 hours before it's ready for downloading."

26 of 187 comments

  1. still to expensive for me by alen · · Score: 4, Informative

    My company pays for offsite storage of our tapes, and I did some quick math.

    $2000 a month to store over 1000 tapes for us. I think the minimum bill is like $1500 if you only have a few tapes.

    $.01/GB is $10 to $20 per LTO-4 tape per month. I know the specs are less, but I've seen LTO-4 tapes hold close to 4GB of data.
    I send out one tape per month for storage and keep a bunch more locally, so even on the cheap end that's $240 per month for the first year.
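    A quick sketch of the arithmetic above, assuming a flat $0.01 per GB-month, that each tape holds 1,000 to 2,000 GB (the range implied by $10 to $20 per tape), and that every tape sent out stays archived. The function names are illustrative only.

```python
# Assumed flat storage rate from the comment above; real bills may differ.
GLACIER_USD_PER_GB_MONTH = 0.01


def monthly_bill(tapes_archived: int, gb_per_tape: float) -> float:
    """Storage bill for one month, given how many tapes' worth of data are archived."""
    return tapes_archived * gb_per_tape * GLACIER_USD_PER_GB_MONTH


def first_year_bills(gb_per_tape: float, tapes_per_month: int = 1) -> list[float]:
    """Bill for months 1..12 when tapes_per_month tapes are added each month and never deleted."""
    return [monthly_bill(month * tapes_per_month, gb_per_tape) for month in range(1, 13)]


if __name__ == "__main__":
    for gb in (1000, 2000):  # low and high end of the $10-$20/tape range
        bills = first_year_bills(gb)
        print(f"{gb} GB/tape: month 1 ${bills[0]:.2f}, month 12 ${bills[-1]:.2f}, "
              f"year total ${sum(bills):.2f}")
```

    Because the archive only grows, the monthly bill climbs linearly: by month 12 it is $120 at the low end of the range and $240 at the high end.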

    1. Re:still to expensive for me by alen · · Score: 3, Informative

      Yep.

      Specs say 1.6TB max compressed, but I've seen my tapes hold 3TB and 4TB. LTO-5 is even better but too expensive.

      The PHB is always complaining about the cost of our offsite storage, so this made me look at it right away. And LTO-4 is fast if you have decent server hardware.

    2. Re:still to expensive for me by l0ungeb0y · · Score: 4, Informative

      The cost for Glacier storage is $10 per terabyte per month. Not sure why you are saying it's $10 - $20 per 4GB; perhaps you meant 4TB? I'm not familiar with LTO tapes. If you are storing about 4TB of data, that would be $40/month for Glacier. However, reading back data will incur costs of $10 per terabyte retrieved.

      I probably would never use Glacier for storing internal document records, but for safely archiving DB records/snapshots and usage logs from services running on an EC2 instance after running them through analytics and aggregation, it seems like an excellent service.
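      Using the figures in this comment ($10 per TB per month to store, $10 per TB to read back), a minimal cost model looks like this. Treat both rates as the commenter's simplification rather than Amazon's full price list; the function and parameter names are made up for illustration.

```python
def glacier_cost(stored_tb: float, months: int, retrieved_tb: float = 0.0,
                 storage_usd_per_tb_month: float = 10.0,
                 retrieval_usd_per_tb: float = 10.0) -> float:
    """Total cost of keeping stored_tb for `months` months plus any one-off retrievals."""
    storage = stored_tb * storage_usd_per_tb_month * months
    retrieval = retrieved_tb * retrieval_usd_per_tb
    return storage + retrieval


if __name__ == "__main__":
    print(glacier_cost(4, 1))                   # 4TB for one month
    print(glacier_cost(4, 12, retrieved_tb=4))  # a year, plus one full read-back
```

      For 4TB that works out to $40 for a month of storage and $520 for a year of storage plus one full read-back; the retrieval term only shows up if you actually restore.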

    3. Re:still to expensive for me by Trepidity · · Score: 4, Insightful

      Yeah, I don't think this is competitive with tape robots for large operations. I see it as gaining inroads, at least at the current price point, among customers who don't have that kind of equipment onsite, so would be otherwise using regular backup services for their archival needs. By adding Glacier to the existing S3 service, as a cheaper but higher-latency storage option for stuff that you're keeping "just in case" (lawsuit/whatever) as opposed to for likely access, Amazon basically incrementally expands the range of use-cases they're competitive in.

    4. Re:still to expensive for me by wvmarle · · Score: 4, Insightful

      I think your organisation is too big for Glacier.

      When you're big enough, it usually pays off to do stuff in-house, as you have economies of scale.

      Everyone smaller than that is struggling to do proper backups. I, for one, have something like 50 GB of data to back up. Way too small for tape; it's HD size. But HDs are not exactly suitable to drop in a tote bag and take home on the train. Also, buying a new HD every week/month is a bit expensive, so you have to rotate them, which makes the transport even worse. I've looked into using memory cards or USB sticks, but I need 64GB ones, which are still very expensive. A service like this is something I should seriously look into (especially now that I have a 20 Mbit up/down Internet connection).

      Privacy remains an issue of course.

    5. Re:still to expensive for me by mlts · · Score: 3, Informative

      At the 50GB level, this service becomes useful. For maximum security, I'd create a TrueCrypt volume, put everything that needs to go into the archive into it, gpg-sign the volume, and upload the volume and its signature. That would mean 50 cents a month indefinitely, and at a minimum, if the upload is successful, Amazon would be storing the data on a SAN with at least RAID 5 or 6 on the back end.

      Of course, with a Blu-Ray burner, I can spend a couple bucks and burn the data onto BD-R media to store indefinitely.

      For business-critical data, perhaps the best approach would be to burn a local copy to optical media and also upload a TC container to AWS. This allows recovery in a lot more circumstances: one doesn't need to sit there waiting for the data to be readied and then downloaded, but if there are no working local copies, the data is still accessible.
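      TrueCrypt and gpg are external tools, so as a standard-library stand-in for the signing step, the sketch below records a SHA-256 digest of the volume before upload and checks it after retrieval. This is a swapped-in technique: a bare hash detects corruption but, unlike a gpg signature, says nothing about who produced the file. The container name is hypothetical. At $0.01 per GB-month, a 50GB volume is the 50 cents a month quoted above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks so large volumes don't fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(volume: Path) -> Path:
    """Store '<digest>  <filename>' next to the volume (the layout sha256sum prints)."""
    manifest = volume.with_name(volume.name + ".sha256")
    manifest.write_text(f"{sha256_file(volume)}  {volume.name}\n")
    return manifest


def verify(volume: Path, manifest: Path) -> bool:
    """True if the retrieved volume still matches the digest recorded before upload."""
    expected = manifest.read_text().split()[0]
    return sha256_file(volume) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        volume = Path(tmp) / "archive.tc"  # hypothetical container name
        volume.write_bytes(b"\x00" * 3_000_000)
        manifest = write_manifest(volume)
        print("intact:", verify(volume, manifest))
        volume.write_bytes(b"\x01" * 3_000_000)  # simulate corruption
        print("intact:", verify(volume, manifest))
```

      Keep a copy of the manifest locally as well as in the archive, so a damaged download can't also take the reference digest with it.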

    6. Re:still to expensive for me by hawguy · · Score: 3, Interesting

      > My company pays for offsite storage of our tapes, and I did some quick math.

      > $2000 a month to store over 1000 tapes for us. I think the minimum bill is like $1500 if you only have a few tapes.

      > $.01/GB is $10 to $20 per LTO-4 tape per month. I know the specs are less, but I've seen LTO-4 tapes hold close to 4GB of data.
      > I send out one tape per month for storage and keep a bunch more locally, so even on the cheap end that's $240 per month for the first year.

      Compress your data before you send it to Amazon and you'll have a fairer comparison. An LTO-4 tape holds 800GB native, so your thousand tapes hold 800TB of data, which would cost you $8000/month on Amazon Glacier.

      If you store multiple copies of your data (to protect against tape failure) and could get by with only 200TB of Glacier space, then it might be cost-effective; lower labor costs for loading tapes and shipping them offsite, and dropping maintenance on your tape library (or libraries), may also sway the decision.

      The numbers change for LTO-5 (1.5TB native), but then you're looking at a large capital cost to swap out your tapes and upgrade your tape drives.
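      The comparison above, sketched with the rates it uses: 800GB native per LTO-4 tape and $10 per TB-month on Glacier. The `copies` parameter is an illustrative assumption: if every archive is written to four tapes, the 1,000 tapes hold only 200TB of unique data, which lands exactly on the $2000/month offsite bill quoted earlier. Labor and library maintenance are left out.

```python
LTO4_NATIVE_TB = 0.8             # 800GB per tape, uncompressed
GLACIER_USD_PER_TB_MONTH = 10.0  # $0.01/GB-month


def archived_tb(tapes: int, copies: int = 1, tb_per_tape: float = LTO4_NATIVE_TB) -> float:
    """Unique data on `tapes` tapes when every archive is written `copies` times."""
    return tapes * tb_per_tape / copies


def glacier_monthly_usd(tb: float) -> float:
    """Monthly Glacier storage charge for `tb` terabytes."""
    return tb * GLACIER_USD_PER_TB_MONTH


def break_even_tb(current_monthly_usd: float) -> float:
    """Largest archive Glacier could hold for the current offsite bill."""
    return current_monthly_usd / GLACIER_USD_PER_TB_MONTH


if __name__ == "__main__":
    print(glacier_monthly_usd(archived_tb(1000)))            # every tape unique
    print(glacier_monthly_usd(archived_tb(1000, copies=4)))  # four copies of everything
    print(break_even_tb(2000))                               # the $2000/month bill
```

      For LTO-5, pass `tb_per_tape=1.5` to `archived_tb`.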

      I'm in a little different situation: I have my data replicated to a colocated storage array with less than 100TB of data. Amazon Glacier storage would cost about the same as I pay in maintenance on the array (ignoring colocation fees). Glacier is not a drop-in replacement for the array, since the storage array also runs my DR VMware cluster, but it may be more cost-effective to get rid of the colocated array cabinet and VMware cluster hardware and rent some VMs with a small amount of storage for the critical servers I need for disaster recovery, using Glacier to store the rest of my data.

    7. Re:still to expensive for me by hawguy · · Score: 3, Interesting

      > Centon DataStick Pro 64GB is about $35 each. I bet if you buy 50 of them, they are cheaper. Get a good fire safe, and store one on site, one off site.

      You forgot to include labor costs to pay someone to plug them into the backup server, swap them out, ship them offsite, and keep track of them.

      But even if you exclude labor costs:

      50 of those memory sticks cost $1750. If you split them between offsite and onsite and keep 2 copies of the data on each set, that gives you 768GB of storage (25 sticks per site, halved for the second copy, leaves 12 whole sticks × 64GB), which would cost about $8/month on Glacier, so you could store that data for more than 15 years for what it costs you to buy the memory sticks.
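      A sketch of this comparison, assuming the quoted $35 per 64GB stick, Glacier at $0.01 per GB-month, and that only whole sticks count toward usable capacity (which reproduces the 768GB figure, since 50 / 2 / 2 is 12.5). Labor is excluded, as in the comment; the names are illustrative.

```python
STICK_PRICE_USD = 35  # quoted price per 64GB stick
STICK_GB = 64
GLACIER_USD_PER_GB_MONTH = 0.01


def usable_gb(sticks: int, sites: int = 2, copies_per_site: int = 2) -> int:
    """Unique capacity when sticks are split across sites and each site holds every file twice.

    Only whole sticks count, so 50 sticks -> 25 per site -> 12 sticks of unique data.
    """
    sticks_per_site = sticks // sites
    return (sticks_per_site // copies_per_site) * STICK_GB


def glacier_years_for_same_money(sticks: int) -> float:
    """How long Glacier could store the same data for the up-front price of the sticks."""
    upfront = sticks * STICK_PRICE_USD
    monthly = usable_gb(sticks) * GLACIER_USD_PER_GB_MONTH
    return upfront / monthly / 12


if __name__ == "__main__":
    print(usable_gb(50))
    print(round(glacier_years_for_same_money(50), 1))
```

      The result, roughly 19 years, is consistent with the "more than 15 years" estimate.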

    8. Re:still to expensive for me by CastrTroy · · Score: 3, Informative

      Yeah, for any appreciable amount of data, transferring it is going to be quite time-consuming. It's not unheard of to run a website off a 10 Mbit line, but transferring 50 GB over a 10 Mbit line is going to take over 11 hours. So if you have to back up 50 GB a day, you'd be saturating the line for nearly half of every day. If you have a 100 Mbps line, you're still looking at over an hour of saturating your line just to transfer out the 50 GB of data. Unless your data center has some kind of peering agreement with Amazon that gives you a really fast unmetered line, I don't really see this working out all that well.
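      A back-of-the-envelope check for these transfer times, assuming decimal gigabytes (10^9 bytes) and megabits (10^6 bits), and ignoring protocol overhead unless `efficiency` is lowered. The function is illustrative, not part of any AWS tool.

```python
def transfer_hours(gigabytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours to push `gigabytes` (decimal GB) through a `link_mbps` line.

    efficiency < 1.0 models protocol overhead or sharing the line; 1.0 means fully saturated.
    """
    bits = gigabytes * 1e9 * 8
    bits_per_second = link_mbps * 1e6 * efficiency
    return bits / bits_per_second / 3600


if __name__ == "__main__":
    for mbps in (10, 20, 100):
        print(f"50 GB at {mbps} Mbit/s: {transfer_hours(50, mbps):.1f} h")
```

      At full saturation that's about 11.1 hours at 10 Mbit/s, 5.6 hours at 20 Mbit/s (the upstream link mentioned earlier in the thread), and 1.1 hours at 100 Mbit/s. Using binary gigabytes (GiB) would add roughly 7%.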

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
  2. And simple by smittyoneeach · · Score: 5, Funny

    Walkabout the glacier
    With stubble on the face. You're
    Returning to a place sure
    To need a smoother face, pure.
    Burma Shave

    --
    Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    1. Re:And simple by Arkham · · Score: 3, Informative
      --
      - Vincit qui patitur.
  3. Re:Welcome to teh FailBoat, Amazon. by alen · · Score: 4, Informative

    What about 5-year-old billing records for a customer/partner inquiry or lawsuit? I've had to compile those, and a 2-week wait was OK in almost every case.

  4. Re:Welcome to teh FailBoat, Amazon. by Trepidity · · Score: 3, Insightful

    I believe this is intended for archival data that is unlikely to be needed, especially not in full, not operational data that you might need to do a full restore from. The kind of data that, in the past, you might file into a tape archive stored in a basement somewhere, "just in case" it was ever needed.

  5. "Job" control? by mjackson14609 · · Score: 3, Funny

    Do you have to submit a properly-formatted JCL card to get your data back?

    --
    I decided that behaving ethically was the most nihilistic thing I could do. - Paul Pavel
  6. Everybody in the cloud! by marcosdumay · · Score: 3, Funny

    Where should I put sensitive documents that must be safely stored for a long time? In the cloud, of course!

    1. Re:Everybody in the cloud! by Kjella · · Score: 3, Insightful

      > Where should I put sensitive documents that must be safely stored for a long time? In the cloud, of course!

      Yeah, going to a specialized third-party provider for safe long-term storage is insane; you'd never put anything valuable in a bank vault, would you? Would I put them in any random cloud? No more than I'd store my valuables in a shed, but with the right agreements in place on redundancy, backups, access control procedures and so on... maybe. Perhaps I'd use two and have redundant providers too. At least as a company, you have to remember that either way it's going to be run by people; whether you outsource it or not, there could be bad apples. Maybe you think you can smell a bad one among your own employees better than they can, but most people lack good self-assessment skills.

      --
      Live today, because you never know what tomorrow brings
  7. Re:Welcome to teh FailBoat, Amazon. by Anonymous Coward · · Score: 5, Informative

    > Whenever I need to restore data from an archive backup, I need it RIGHT FUCKING NOW.

    > Amazon is smoking crack.

    You seem to be confusing backups necessary for day-to-day business continuity with archival records storage typically not required for day-to-day business continuity. If the data stored on Glacier can be encrypted, with the encryption/decryption keys under the control of the client and not accessible to Amazon under any circumstances, then Glacier might be a viable option for organizations. Regulatory compliance in many fields/industries could rule out using a service such as Glacier. That said, for the typical home user or student, a long-term archiving service used alongside a service such as Dropbox, Box, or even Amazon's own cloud storage and file-sharing offerings makes sense for important documents, but it becomes cost-prohibitive for music and video libraries, which are better suited to other storage options anyway.

  8. So ... by PPH · · Score: 5, Funny

    ... does this mean that deleting data from Amazon Simple Storage is called an ASS-wipe?

    --
    Have gnu, will travel.
    1. Re:So ... by tgd · · Score: 5, Funny

      > ... does this mean that deleting data from Amazon Simple Storage is called an ASS-wipe?

      Admit it, you've been waiting years to use that joke, haven't you?

  9. Re:Welcome to teh FailBoat, Amazon. by Chris+Mattern · · Score: 4, Informative

    In that case, it's obviously not for you.

    Some of us, however, are capable of planning ahead. I notice you said "restore from a backup." Note that this is not for backing up and restoring data you need to have available on a live basis. This is for truly *archive* data: data you don't need on a day-to-day basis but might need to retrieve in special cases. It will not, generally speaking, be a backup at all; it's your primary store of this data. Such data doesn't need to be retrieved on a moment's notice (if it did, you'd be storing it in a more expensive online store).

  10. Re:Welcome to teh FailBoat, Amazon. by retep · · Score: 3, Interesting

    > Whenever I need to restore data from an archive backup, I need it RIGHT FUCKING NOW.

    I don't. It'll be at least a few hours until FedEx arrives with the new server hardware in the best case, and a few weeks before we get a new building and our clothes stop smelling of smoke (and zombies) in the worst case.

    Interesting question though: if I submit a retrieval job, how soon do I have to actually download the associated data? Can I wait a few hours or days?

  11. Potentially a good service - needs a consumer tool by CFD339 · · Score: 4, Interesting

    I think this opens the possibility for a middleman company to provide long-term archival tools for end users. This firm would focus its energy on front-end tools for the end user and use Amazon's back-end long-term storage for the actual infrastructure.

    There are many amateur and even professional photographers, for example, with almost no alternatives for very long-term storage. Nearly all home-writable media is flawed when it comes to true long-term storage. I'm sure there are many use cases in this space.

    For mid-size and larger companies, I think a critical feature will be a simple interface that encrypts data on the client side before sending it, using a private key available only on the client side. I cannot imagine a responsible I.T. professional storing company-critical or customer data on a third-party site like that without such protections in place.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  12. Re:Welcome to teh FailBoat, Amazon. by tlhIngan · · Score: 4, Insightful

    > > Whenever I need to restore data from an archive backup, I need it RIGHT FUCKING NOW.

    > I don't. It'll be at least a few hours until FedEx arrives with the new server hardware in the best case, and a few weeks before we get a new building and our clothes stop smelling of smoke (and zombies) in the worst case.

    > Interesting question though: if I submit a retrieval job, how soon do I have to actually download the associated data? Can I wait a few hours or days?

    That's why people have onsite and offsite backups. If you need it right now, use the onsite backup, if it's not already available from online or nearline storage.

    But it's also good to have offline backups, in case your building gets hit by an airliner or something. In that case, having immediate access to that data may not be as high a priority as executing the disaster recovery bring-up plan. (If you have an offsite backup datacenter, well, why aren't you mirroring?)

    This service is for companies that may not be big enough to afford tape storage (a big investment) but may only have a few TB they store on backup hard drives and such. Rather than having to arrange offsite storage, they can use Amazon to do it cheaply and effectively. I also see it as a play for Amazon as a virtual business: Amazon handles all your IT and server needs between EC2/S3/etc., so a business doesn't actually have to exist anywhere; employees work from home, a token post office box is the street address, etc.

    Though it is a good question - once a job is submitted and the data is ready a few hours later, how long is it available for?

  13. Glacier storage? by rossdee · · Score: 3, Funny

    Apparently someone at Amazon didn't watch the long term weather forecast - climate change means all the glaciers will be gone in a few decades.

  14. Re:Welcome to teh FailBoat, Amazon. by mdfst13 · · Score: 3, Interesting

    This could be used either way. If you are using it as an archival medium, it is less of a hassle than finding three facilities of your own (the promise is that there are at least three copies of the data at all times). To get the equivalent from tape, you'd have to buy three tapes. Plus, you need places to store them.

    If you are using it as the offsite part of your backup procedure, then it only needs to match the latency of other offsite backups. If you are restoring from a tape that you have stored in a safe deposit box, that also takes three to five hours to restore (it takes time to get to the bank and retrieve the tape, then it takes more time to read from the tape). And truly, that time will rarely matter. If you really lost

    1. Your primary data store.
    2. Your backup data store.
    3. Your local archive copy.

    all at the same time, you likely lost your physical hardware as well. Or you are experiencing a security problem that you need to fix before restoring from backup. You could promote your archived data from Glacier to S3 while you were replacing that hardware or fixing your security.

    It also may be worth thinking about how this works if you are doing everything in AWS. In that case, Multi-AZ RDS provides your primary and backup data stores. It also provides the ability to rebuild your data store from real-time backups. Next, you use snapshots to take regular backups (the equivalent of a local archive copy). Weekly makes sense, as RDS can store up to eight days of real-time backups. You keep a few of the most recent snapshots, but you archive most of those older than a month to Glacier. You can still keep the one-month, three-month, and six-month snapshots in the quicker, more expensive storage.
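    The retention scheme described above can be sketched as a small planner that labels each snapshot as staying in snapshot storage or moving to Glacier. It's a hypothetical helper, not an AWS API: it doesn't touch RDS or Glacier, and it assumes weekly snapshots, 30-day months, and a one-week tolerance when picking the one-, three- and six-month snapshots.

```python
from datetime import date, timedelta

RECENT_WINDOW_DAYS = 30              # "older than a month" -> archive candidates
MILESTONE_AGES_DAYS = (30, 90, 180)  # one, three and six months, using 30-day months
MILESTONE_TOLERANCE_DAYS = 7         # weekly snapshots: accept the nearest within a week


def plan_tiers(snapshots: list[date], today: date) -> dict[date, str]:
    """Label each snapshot 'fast' (stays in snapshot storage) or 'archive' (goes to Glacier)."""
    plan = {}
    for snap in snapshots:
        age = (today - snap).days
        plan[snap] = "fast" if age <= RECENT_WINDOW_DAYS else "archive"

    # Also keep the snapshot nearest each milestone age in fast storage.
    if snapshots:
        for target in MILESTONE_AGES_DAYS:
            nearest = min(snapshots, key=lambda s: abs((today - s).days - target))
            if abs((today - nearest).days - target) <= MILESTONE_TOLERANCE_DAYS:
                plan[nearest] = "fast"
    return plan


if __name__ == "__main__":
    today = date(2012, 8, 21)  # arbitrary reference date
    weekly = [today - timedelta(weeks=w) for w in range(30)]
    plan = plan_tiers(weekly, today)
    kept = sorted((today - s).days for s, tier in plan.items() if tier == "fast")
    print("kept fast (days old):", kept)
    print("archived:", sum(tier == "archive" for tier in plan.values()))
```

    With 30 weekly snapshots, this keeps the five from the last month plus the ones 91 and 182 days old, and marks the other 23 for Glacier.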

    Now, you face a major data problem. Amazon loses two facilities. These happen to be the two facilities with your RDS stores. However, you still have the snapshots (which are stored in more than two facilities). You restore quickly. You only need to go to Glacier if you have data corruption that you don't notice for a month (so that the archive copy that you need has dropped out of the snapshots).

    If you are not using AWS for everything, then you are responsible for creating your own primary and backup data stores as well as local archive copies. Other than that, the same issues apply.

  15. damn whippersnappers by Medievalist · · Score: 4, Funny

    > No one capable of participating in an online forum is old enough to actually remember/witness Burma Shave ads.

    Wrong. Now GTFO my lawn.