Amazon Wants To Replace Tape With Slow But Cheap Off-Site "Glacier" Storage
Nerval's Lobster writes with a piece at SlashCloud that says "Amazon is expanding its reach into the low-cost, high-durability archival storage market with the newly announced Glacier. While Glacier allows companies to transfer their data-archiving duties to the cloud — a potentially money-saving boon for many a budget-squeezed organization—the service comes with some caveats. Its cost structure and slow speed of data retrieval make it best suited for data that needs to be accessed infrequently, such as years-old legal records and research data. If that sounds quite a bit like Amazon Simple Storage Service, otherwise known as Amazon S3, you'd be correct. Both Amazon S3 and Glacier have been designed to store and retrieve data from anywhere with a Web connection. However, Amazon S3 — 'designed to make Web-scale computing easier for developers,' according to the company — is meant for rapid data retrieval; contrast that with a Glacier data-retrieval request (referred to as a 'job'), where it can take between 3 and 5 hours before it's ready for downloading."
my company pays for offsite storage of our tapes and i did some quick math
$2000 a month to store over 1000 tapes for us. I think the minimum bill is like $1500 if you only have a few tapes
$.01/GB is $10 to $20 per LTO-4 tape per month. i know the specs are less but i've seen LTO-4 tapes hold close to 2TB of data.
i send out one tape per month for storage and keep a bunch more locally. so even on the cheap end that's $240 per month for the first year.
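A rough sketch of the arithmetic above, assuming the $0.01 per GB per month storage price quoted in the thread and 1,000 GB per tape (the low end implied by the "$10 to $20 per tape" range, not a measured figure). The function names are made up for illustration, and retrieval and transfer fees are ignored.

```python
GLACIER_USD_PER_GB_MONTH = 0.01  # storage price quoted in the thread


def glacier_monthly_cost(tapes: int, gb_per_tape: float) -> float:
    """Monthly Glacier storage bill for the data held on `tapes` tapes."""
    return tapes * gb_per_tape * GLACIER_USD_PER_GB_MONTH


def cheaper_option(tapes: int, gb_per_tape: float, vault_usd_per_month: float) -> str:
    """Name whichever option has the lower monthly bill (storage only)."""
    glacier = glacier_monthly_cost(tapes, gb_per_tape)
    return "glacier" if glacier < vault_usd_per_month else "tape vault"


if __name__ == "__main__":
    # (tape count, offsite vault bill in USD/month) taken from the comment above
    for tapes, vault_bill in [(1000, 2000.0), (12, 1500.0)]:
        cost = glacier_monthly_cost(tapes, 1000)
        print(f"{tapes} tapes: Glacier ${cost:,.2f}/mo, vault ${vault_bill:,.2f}/mo, "
              f"cheaper: {cheaper_option(tapes, 1000, vault_bill)}")
```

Under these assumptions the full 1000-tape archive comes to roughly $10,000 a month on Glacier against the $2000 vault bill, while a single year of monthly tapes (12 tapes) is about $120 against the $1500 minimum.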
Walkabout the glacier
With stubble on the face. You're
Returning to a place sure
To need a smoother face, pure.
Burma Shave
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
What about 5-year-old billing records for a customer/partner inquiry or lawsuit? I've had to compile those, and a 2-week wait was OK in almost every case.
Based on the waiting times, it sounds almost like they have some sort of robotic tape loading system, and you're basically just offloading your tape storage from the office to the nebulous cloud.
I believe this is intended for archival data that is unlikely to be needed, especially not in full, not operational data that you might need to do a full restore from. The kind of data that, in the past, you might file into a tape archive stored in a basement somewhere, "just in case" it was ever needed.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Do you have to submit a properly-formatted JCL card to get your data back?
I decided that behaving ethically was the most nihilistic thing I could do. - Paul Pavel
Where should I put sensitive documents that must be safely stored for a long time? In the cloud, of course!
Rethinking email
Whenever I need to restore data from an archive backup, I need it RIGHT FUCKING NOW.
Amazon is smoking crack.
You seem to be confusing backups necessary for day-to-day business continuity with archival records storage typically not required for day-to-day business continuity. If the data stored on Glacier can be encrypted and the encryption/decryption keys under the control of the client and not accessible under any circumstances to Amazon, then Glacier might be a viable option for organizations. Regulatory compliance in many fields / industries could potentially rule out the use of such a service as Glacier. For the typical home user or student, though, a long-term archiving service in conjunction with a service such as DropBox, Box, or even Amazon's own cloud storage and file sharing offerings makes sense for important documents, but becomes cost-prohibitive for music and video libraries, which are better suited to other storage options anyway.
In this case, "web" is a synonym for "internet". The context made it very clear.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Have gnu, will travel.
I looked at doing my own backups using S3... it's about $25 a month for 200 Gbytes. Gee, that'd be $2 a month (!) in Glacier... but for backups that I just want securely off-site, a five hour wait to get it to S3 is fine.
In that case, it's obviously not for you.
Some of us, however, are capable of planning ahead. I notice you said "restore from a backup." Note that this is not for backing up and restoring data you need to have available on a live basis. This is for truly *archive* data--data you don't need on a day-to-day basis but might need to retrieve in special cases. It will not, generally speaking, be a backup at all; it's your primary store of this data. Such data doesn't need to be retrieved on a moment's notice (if it was, you'd be storing it in a more expensive online store).
It's definitely a big cost savings compared to Amazon S3 (i.e. roughly 90% less expensive). For backups that one doesn't need to access in a time-critical manner, it seems like an excellent alternative to S3 (e.g. videos, photos, etc.).
If transferring the gigabytes of data nightly over the internet was feasible, we'd be using rsync to an offsite server for a fraction of the cost. Bandwidth / sync time is the issue here, not whether or not it's on tape. Why would I use Amazon if I can just run rsync to my remote server for (probably) a much lower cost? We use tape because there is not enough time to run these backups over the web. Maybe as some kind of secondary backup solution so Joe doesn't have to go get the tapes, but it probably wouldn't be a nightly solution. At least not for us.
neorush
Back when data might be spread across numerous tapes or magneto-optical disks. Glad Amazon has reinvented the '90s.
I look forward to seeing what services are built on top of this. Easy and cheap backup?
A robotic tape system would generally give you your data back in a few minutes at most, but Amazon is saying you can expect multiple hours of waiting. I'm assuming this system is literally based on people moving around boxes of tapes and inserting them into tape readers; inconvenient but reassuring in its own way. Perhaps they've managed to automate things even further, say by setting up carts of hundreds of tapes carried around by a forklift that get plugged into the robotic tape loading system.
Also sounds like an interesting operations challenge, though, in trying to co-ordinate all the read request jobs when your customers can store as little as 1 byte. You can see why they penalize any attempt to actually read your data, especially if you send in a read request job within a short time period of storing the data.
It usually takes us a couple days to put in the request, get the tapes from offsite, then restore the data, hoping we picked the right dates.
"Note that this is not for backing up and restoring data you need to have available on a live basis. This is for truly *archive* data--data you don't need on a day-to-day basis but might need to retrieve in special cases. It will not, generally speaking, be a backup at all; it's your primary store of this data."
But doesn't that seem like an inherent problem? I can see outsourced, online storage as one redundant element in a backup system; but trusting it as a primary store of data, not so much.
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
> Whenever I need to restore data from an archive backup, I need it RIGHT FUCKING NOW.
I don't. It'll be at least a few hours until FedEx arrives with the new server hardware in the best case, and a few weeks before we get a new building and our clothes stop smelling of smoke (and zombies) in the worst case.
Interesting question though: if I submit a retrieval job, how soon do I have to actually download the associated data? Can I wait a few hours or days?
Connections over port 80?
What they probably mean is that they provide a web interface for interacting with the system rather than a locally installed application, with data then downloaded straight through the browser.
will system meltdowns on Glacier be referred to as Jökulhlaup?
Freeze Ray. Tell your friends.
This sounds amazingly like someone put money into a data storage system that turned out to be far slower than they'd wanted. Now marketing is picking up the slack by calling it Glacier.
In other words, they're stuck trying to sell white salmon by claiming "Guaranteed to never turn pink in the can!"
Everything is better with chainsaws.
I think this opens the possibility for a middle-man company to provide long term archival tools for end users. This firm would spend its energy focused on front end tools for the end user and make use of Amazon's back end long term storage for the actual infrastructure.
There are many amateur and even professional photographers, for example, with almost no alternatives for very long term storage. Home writable media is nearly all flawed in terms of true long term storage. I'm sure there are many use cases in this space.
In terms of mid-size and larger companies, I think a critical feature will need to be a simple interface that encrypts at the client side prior to sending the data using a private key only available on the client side. I cannot think a responsible I.T. professional would store company critical or customer data on a third party site like that without such protections in place.
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
That's why people have onsite and offsite backups. If you need it right now, use the onsite backup, if it's not already available from online or nearline storage.
But it's also good to have offline backups, in case your building gets hit with an airliner or something. In which case, having absolute immediate access to that data may not be as high a priority as executing the disaster recovery bringup plan. (If you have an offsite backup datacenter, well, why aren't you mirroring?).
This service is for those companies who may not be big enough to afford to go tape storage (big investment), but may only have a few TB they store on backup hard drives and such. Rather than having to arrange for offsite storage, they can use Amazon to do it cheaply and effectively. I also see it as a play for Amazon as a virtual business - Amazon handling all your IT and server needs between EC2/S3/etc so a business doesn't actually have to exist anywhere - employees work from home, a token post office box is the street address, etc.
Though it is a good question - once a job is submitted and the data is ready a few hours later, how long is it available for?
Apparently someone at Amazon didn't watch the long term weather forecast - climate change means all the glaciers will be gone in a few decades.
What's really amazing and [un]special about you, is that you are The One case! You are the same as everyone, so no one needs things that you don't need, everyone has the same constraints (and lack of constraints) that you do, and your desires represent the desires of humanity.
Congratulations on being 100% of the market.
I have been looking for you, though previously dismissed you as mythical. So tell me: what is the next great product that everyone wants? You, of all people, know the answer to this.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
> This service is for those companies who may not be big enough to afford to go tape storage (big investment), but may only have a few TB they store on backup hard drives and such. Rather than having to arrange for offsite storage, they can use Amazon to do it cheaply and effectively. I also see it as a play for Amazon as a virtual business - Amazon handling all your IT and server needs between EC2/S3/etc so a business doesn't actually have to exist anywhere - employees work from home, a token post office box is the street address, etc.
I suspect the latter is going to be pretty common. If you're running something fully cloud hosted like imgur or reddit, existing Amazon services were pretty expensive for your long-term backups; a lot of wasted money on retrieval speed that you didn't need. This finally gives the last piece of the storage puzzle: long-term cheap backups and archiving. Previously your best bet was to either download the data yourself, or use their physical drive service where you ship media to them and have them load up the data for you.
Honestly, at this point what service doesn't Amazon offer when it comes to your computing setup? (modulo the more general objections to cloud computing of course)
Myself I'll probably start using them for my home computer backups. 500GB * $0.01 is just $5 a month. I'm really looking forward to seeing rdiff-backup-like tools with proper delta support.
> I think this opens the possibility for a middle-man company to provide [...] tools for end users.
You hit the nail on the head about AWS' goal: They are providing the APIs for others to develop consumer-level tools and products by utilizing their existing infrastructure. Everything, from EC2 to S3 to R53, is geared towards developers (which will then market to end users) by providing full functionality via an API. Glacier is no exception, and as you said, there will be great tools available for end users for those ready to create them.
Maybe someone reading this thread is already hard at work developing exactly what you say.
> Whenever I need to restore data from an archive backup, I need it RIGHT FUCKING NOW.
Then use Amazon S3. Reading the article (or even summary, in this case) has not yet been linked to cancer, so give it a try.
What if Amazon's assets are ever frozen? Wouldn't that freeze all its customer data as well, including your only copy of some data you placed on Glacier Storage? (heh, frozen assets, Glacier)
This is essentially what Amazon (and Google mail/docs for that matter) is doing - Aiming to become your company's new IT department. No CEO in their right mind is going to pay multiple salaries/benefits for a staffed IT department when they can get it from Google and Amazon way cheaper. Even if they pay $10k/month, that's cheaper than paying to staff a 4 person IT department.
And before you start in about how this helps small startups who can't afford an IT staff, well think again. They can't afford the cloud services either or they wouldn't have the development team running the website/DNS/etc.
Join the Slashcott! Feb 10 thru Feb 17!
They obviously could use some help.
It's $10/month per 1TB which imho is pretty fair. Maybe not doable if you have 1,000 1TB tapes like someone else posted but for most other businesses that's not bad.
If you wanna get rich, you know that payback is a bitch
> Interesting question though: if I submit a retrieval job, how soon do I have to actually download the associated data? Can I wait a few hours or days?
According to the AWS Blog, 24 hours:
Or is this going to be using tape to store data?
I've been waiting for something like this in my case. As a startup, it lets us get rid of all the servers we keep in the corner because we may "one day need that data on those old hard drives". This was the promise that nimbus.io gave us, but they are about 12 months behind and a dollar short. Sure, this is much slower retrieval, but the likelihood of us ever requesting a retrieval is quite minimal. It's at a cost of $10/TB, and I'm sure we pay more than that now in storage costs.
This could be used either way. If you are using it as an archival medium, it is less of a hassle than finding three facilities of your own (the promise is that there are at least three copies of the data at all times). To get the equivalent from tape, you'd have to buy three tapes. Plus, you need places to store them.
If you are using it as the offsite part of your backup procedure, then it only needs to match the latency of other offsite backups. If you are restoring from a tape that you have stored in a safe deposit box, that also takes three to five hours to restore (it takes time to get to the bank and retrieve the tape, then it takes more time to read from the tape). And truly, that time will rarely matter. If you really lost
1. Your primary data store.
2. Your backup data store.
3. Your local archive copy.
all at the same time, you likely lost your physical hardware as well. Or you are experiencing a security problem that you need to fix before restoring from backup. You could promote your archived data from Glacier to S3 while you were replacing that hardware or fixing your security.
It also may be worth thinking about how this works if you are doing everything AWS. In that case, Multi-AZ RDS provides your primary and backup data stores. It also provides the ability to rebuild your data store from real-time backups. Next, you use snapshots to take regular backups (the equivalent of a local archive copy). Weekly makes sense as RDS can store up to eight days of real-time backups. You keep a few of the most recent snapshots, but you archive most that are older than a month to Glacier. You can still keep the one month, three month, and six month snapshots in the quicker, more expensive storage.
Now, you face a major data problem. Amazon loses two facilities. These happen to be the two facilities with your RDS stores. However, you still have the snapshots (which are stored in more than two facilities). You restore quickly. You only need to go to Glacier if you have data corruption that you don't notice for a month (so that the archive copy that you need has dropped out of the snapshots).
If you are not using AWS for everything, then you are responsible for creating your own primary and backup data stores as well as local archive copies. Other than that, the same issues apply.
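The snapshot rotation described above (weekly snapshots, recent ones kept in fast storage, the one-, three- and six-month snapshots also kept there, everything else older than a month archived) can be sketched as a small classification function. This is only an illustration: the tier names, the day thresholds and `tier_for` itself are hypothetical, and no AWS API is called.

```python
from datetime import date, timedelta

KEEP_HOT_DAYS = 30              # "older than a month" threshold (assumed 30 days)
MILESTONE_DAYS = (30, 90, 180)  # one-, three- and six-month snapshots (assumed)
SNAPSHOT_INTERVAL_DAYS = 7      # weekly snapshots, as suggested in the comment


def tier_for(snapshot_day: date, today: date) -> str:
    """Return "hot" for fast storage or "glacier" for the cold archive."""
    age = (today - snapshot_day).days
    if age <= KEEP_HOT_DAYS:
        return "hot"
    # With weekly snapshots, exactly one snapshot falls in each
    # [milestone, milestone + 7 days) window; keep that one in fast storage.
    for milestone in MILESTONE_DAYS:
        if 0 <= age - milestone < SNAPSHOT_INTERVAL_DAYS:
            return "hot"
    return "glacier"


if __name__ == "__main__":
    today = date(2012, 8, 21)
    for weeks in range(0, 30):
        day = today - timedelta(weeks=weeks)
        print(day.isoformat(), tier_for(day, today))
```

A real policy would also need to decide when archived snapshots are finally deleted, which the comment does not specify.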
depends what you need the restore for.
sometimes it's an added benefit that the backups aren't on any network where they can be wiped from the network, for obvious reasons.
world was created 5 seconds before this post as it is.
> This service is for those companies who may not be big enough to afford to go tape storage (big investment)
This whole discussion brings up a (tangential) question - at what scale does tape backup make economic sense nowadays? We're small - an educational unit - and were running tape backups for years. But as our data store grew (we're at ~10TB now) and disk hardware became cheaper and smaller, we've found it less expensive to move to disk for both our immediate and offsite backups.
Or is it simply a space argument - you can take tapes offsite without needing a second set of hardware?
#DeleteChrome
Where are all the good end-user tools for S3 now?
You can find one or two, but it's curious that a Google search for "Amazon S3 client comparison" turns up links from 2009 and 2010.
More curious is the fact that Dropbox, SugarSync, the MS solution, Google's new solution etc seem to be thriving and providing exactly the kind of services that you'd expect third party S3 clients to provide.
I'm not saying these clients don't exist, but I don't seem to find them very easily compared to other cloud storage options, and you'd kind of expect people to come up with lots of crazy storage solutions.
Considering that glaciers are melting at an alarming rate around the world and losing their 'data' I think 'glacier' might not be the best term for this product. It doesn't really inspire confidence.
> Whenever I need to restore data from an archive backup, I need it RIGHT FUCKING NOW.
> Amazon is smoking crack.
When I need to restore data RIGHT FUCKING NOW, I restore it from a snapshot on the storage array. Glacier storage would be for when my storage array has gone up in flames, and since it'll take me a week or more to buy a new array and find somewhere to keep it, waiting a few hours for a restore job to be available is OK with me, especially since it'll take 2 weeks to restore the data to my array over my 1 Gbit internet connection.
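For a sense of scale on that last point, here is a back-of-the-envelope transfer-time sketch. It assumes decimal units (1 TB = 10^12 bytes), a link running at a constant rate, and a made-up `efficiency` factor for protocol overhead; none of these numbers come from Amazon.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Days needed to move `size_tb` terabytes over a link of the given speed."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / SECONDS_PER_DAY


def size_tb_for(days: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Terabytes that fit through the link in `days` days (inverse of the above)."""
    bits = days * SECONDS_PER_DAY * link_gbit_per_s * 1e9 * efficiency
    return bits / 8 / 1e12


if __name__ == "__main__":
    print(f"2 weeks at 1 Gbit/s moves about {size_tb_for(14, 1.0):.1f} TB")
    print(f"10 TB at 1 Gbit/s takes about {transfer_days(10, 1.0):.2f} days")
```

At full line rate, a two-week restore corresponds to roughly 150 TB, so a retrieval wait of a few hours is a small fraction of the total restore time.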
I work for a government organization with law enforcement divisions. When I get an order to restore from backups (and it is an _order_, not a request), I have a one-hour window to complete the restore from tape if the cartridge is still onsite, and a two-hour deadline if I have to send a staff member to fetch the tape from one of our three offsite storage locations. Our electronic data dates back as far as the 1980s; anything older than that is on microfiche, but that's in the process of being digitized into a new document imaging system. We use a barcode inventory/library system to keep track of the cartridges and what's stored on them.
The examples all use the Retrieval pricing:
http://aws.amazon.com/glacier/faqs/
Not having ever used AWS, I'm wondering what is the difference between a "Transfer Out" and a "Retrieval"?
Okay, I thought Google Play was a terrible name, but Amazon Glacier leaves me speechless.
Kriston
But it's in 'The Cloud'. How can 'The Cloud' be frozen?
for faster access, you know. http://downloadinternet.funnypart.com/
if this is supposed to be a new economy, how come they still want my old fashioned money?
Wrong. Now GTFO my lawn.
An internet connection, yes. But a web connection? :)
Well I would have got first post if I wasn't using Amazon Glacier for my swap file.
Does anyone here regularly deal with actually retrieving data stored long term on tape? In theory, it seems sound. I don't do it regularly, but the few times I have had to request a retrieval of old data from tape it's been a complete waste of time. Lots of excuses. No data.
> If the data stored on Glacier can be encrypted and the encryption/decryption keys under the control of the client and not accessible under any circumstances to Amazon, then Glacier might be a viable option for organizations
This is possible with any opaque data storage - a blob is a blob, why would they care if that particular sequence of bytes represents some encrypted data or not?
Dropbox and SugarSync both are applications using Amazon S3 for infrastructure (SugarSync says they use "two carrier-grade data centers, including Amazon's S3 facility.") So you've largely answered your own question about where the end-user tools for S3 are.
Actually, as you've just demonstrated, they are quite easy to find and widely used, but the popular ones have made the use of S3 largely invisible to the end user that isn't reading the service provider's infrastructure descriptions.
By the way, one thing Amazon said is that they'll eventually offer transparent archiving of data from S3 to Glacier. That will be really interesting, since you can then back up your data to S3 for instant access, and also have a cheap historical archive of that data going back however long you can afford. Add some transparent client sync solution that works on top of S3 (e.g. Jungle Disk), and this would be a very convenient set up even for home use.
You mean "Web Scale" is a term that people ACTUALLY use? I thought that that youtube video was just exaggerating for theatrical effect.
*face palm*
This sounds like an ideal medium for PACS - medical imaging. PACS generates large quantities of data, which may be required to be retained for a very long time to be available for medico-legal reasons. For clinical purposes, 97% of the data over three years old is never referenced, but trying to get anyone to agree to an ILM policy that isn't at least 30 years is a real problem. Given the average acute hospital is generating 20TB of image data per year, this service from Amazon might be quite popular. DICOM copes very well with offline data that takes many hours to retrieve - and medico-legal requests can take days to honour, so this could be very successful. The only outstanding requirement is that for EU citizens, the archive would need to be located in the European Economic Area ...
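To put rough numbers on the PACS case, here is a sketch of how the storage bill would grow if a hospital added 20 TB a year and never deleted anything for 30 years. It assumes the $0.01 per GB per month price quoted elsewhere in the thread stays fixed, uses 1 TB = 1000 GB, bills each year at its end-of-year size (a slight overestimate), and ignores retrieval and transfer fees.

```python
USD_PER_GB_MONTH = 0.01  # storage price quoted in the thread, assumed constant
TB_PER_YEAR = 20         # image data generated per year, from the comment above
GB_PER_TB = 1000         # decimal units assumed


def monthly_bill_after(years: int) -> float:
    """Monthly storage bill once `years` of image data have accumulated."""
    stored_gb = years * TB_PER_YEAR * GB_PER_TB
    return stored_gb * USD_PER_GB_MONTH


def cumulative_cost(years: int) -> float:
    """Total paid over `years`, billing every year at its end-of-year size."""
    return sum(12 * monthly_bill_after(year) for year in range(1, years + 1))


if __name__ == "__main__":
    for years in (1, 10, 30):
        print(f"after {years:2d} years: ${monthly_bill_after(years):,.0f}/month, "
              f"${cumulative_cost(years):,.0f} paid in total")
```

Under these assumptions the archive reaches 600 TB after 30 years, at about $6,000 a month.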
> This is possible with any opaque data storage - a blob is a blob, why would they care if that particular sequence of bytes represents some encrypted data or not?
This is correct, by the way. According to AWS's docs (I looked it up because I was curious), they automatically encrypt your data on the backend and keep the keys for you. If you want control over the keys, AWS advises you to encrypt your data before transferring it to Glacier.
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
> Interesting question though: if I submit a retrieval job, how soon do I have to actually download the associated data? Can I wait a few hours or days?
24 hours, according to the FAQ.
They also support Import/Export, so you can theoretically ship them a portable hard disk, and they'll ship it back to you with all your data on it. If you have a large amount of data and need it as quickly as possible, this is probably the way to go.
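The timing discussed in this thread (3 to 5 hours before a retrieval job is ready, then 24 hours to download) can be modelled with a few lines of date arithmetic. This is a timing sketch only, not a call into any AWS API; in particular, the assumption that the 24-hour clock starts when the job becomes ready is mine, and `safe_download_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta

READY_AFTER_MIN = timedelta(hours=3)   # earliest a job is expected to be ready
READY_AFTER_MAX = timedelta(hours=5)   # latest a job is expected to be ready
DOWNLOAD_WINDOW = timedelta(hours=24)  # how long the output stays available (assumed
                                       # to count from the moment the job is ready)


def safe_download_window(submitted: datetime) -> tuple[datetime, datetime]:
    """Interval in which the data is available whatever the actual ready time was."""
    start = submitted + READY_AFTER_MAX                   # ready even in the slow case
    end = submitted + READY_AFTER_MIN + DOWNLOAD_WINDOW   # not expired even in the fast case
    return start, end


if __name__ == "__main__":
    submitted = datetime(2012, 8, 21, 9, 0)
    start, end = safe_download_window(submitted)
    print(f"submitted {submitted}, download between {start} and {end}")
```

So waiting a few hours after the job completes is fine under this model; waiting a few days is not.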
I'm an ordinary home user who wants to back up my really important data in case of catastrophe. Besides lots of little stuff, by far my biggest data in this category is my pictures, which total about 75GB.
I've been mulling investing in a service like Crashplan, which according to their pricing would cost me $5 a month if I was month to month, or about $3 a month if I committed to 4 years (!).
Amazon Glacier could offer me backups for one cent a GB per month. So for my scenario, that'd come out to 75 cents a month.
Is it just me or is this an insanely good deal for my consumer scenario?
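One way to sanity-check the home-user comparison is to compute the archive size at which per-GB pricing catches up with a flat monthly fee. The sketch below uses only the prices quoted in this comment ($0.01 per GB per month, $5 month-to-month, about $3 on a 4-year commitment) and counts storage only; retrieval, request and transfer charges are left out, so treat it as a lower bound on the Glacier side.

```python
GLACIER_USD_PER_GB_MONTH = 0.01  # storage price quoted in the comment above


def glacier_storage_cost(gb: float) -> float:
    """Monthly storage-only cost for `gb` gigabytes."""
    return gb * GLACIER_USD_PER_GB_MONTH


def break_even_gb(flat_usd_per_month: float) -> float:
    """Archive size at which per-GB pricing equals a flat monthly fee."""
    return flat_usd_per_month / GLACIER_USD_PER_GB_MONTH


if __name__ == "__main__":
    print(f"75 GB costs ${glacier_storage_cost(75):.2f}/month to store")
    for flat in (5.0, 3.0):
        print(f"a ${flat:.2f}/month flat plan breaks even at {break_even_gb(flat):.0f} GB")
```

For a 75 GB photo archive the per-GB price wins comfortably; the flat plans only come out ahead once the archive grows past roughly 300 to 500 GB.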
Use DRBD Proxy:
http://www.linbit.com/products-services/drbd-proxy/
Yes, it's a shameless plug; I work for the company, but for this specific purpose it's a unique and great tool and it gives you a lot more flexibility than using a commercial provider.
I can get behind the idea of having off site stored archives for the unlikely event of on-site and stored media being destroyed. A backup to my backup. Just what I need !!