Mega-Uploads: The Cloud's Unspoken Hurdle
First time accepted submitter n7ytd writes "The Register has a piece today about overcoming one of the biggest challenges to migrating to cloud-based storage: how to get all that data onto the service provider's disks. With all of the enterprisey interweb solutions available, the oldest answer is still the right one: ship them your disks. Remember: 'Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.'"
Returning from a site with a tethered computer full of 80 MP 16-bit raw files from a day's worth of shooting, you would break most bandwidth budgets if you tried uploading all those images.
Getting around all the buzzwords
Pressed on whether disks are accepted, the company responded that “All common database products provide a capability to extract to a common file format like .csv.”
what a professional answer. and by that i mean it didn't answer the question at all.
insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
Yeah, the bandwidth is great, but the latency SUCKS.
Second biggest challenge: trusting that any of these places has the motivation to keep your data more secure than credit card companies do.
My last employer offered offsite backups to clients. For the initial seed, we always tried to get them to put it on an external HDD and ship it to us (or at least DVDs). The only major exceptions were clients that were also on FiOS - that was the only case where over-the-net transfer was faster than the backup-and-ship-it method for the initial seed.
station wagon has low bandwidth, the tapes have to be written and read.
I don't think that TFS's answer is necessarily the correct one. I'd really prefer to hold Ma Bell's feet to the fire over the fact that bandwidth (even in 'optimal' build-out areas, spare me the excuses about the boonies) has had a deeply underwhelming track record of improvement in cost and quantity compared to most other aspects of contemporary computing.
Intercontinental company I used to work for, once or twice a year they'd send an intern over the Atlantic in the SST with a case of tapes.
When it just positively had to be there asap...
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
Yes, never underestimate the bandwidth of a station wagon full of disks hurtling down the highway. The latency, on the other hand, leaves much to be desired, and I've heard the packet loss can be downright fatal.
--- Journals are boring; Go to my web page instead
You might get one hell of a download rate from torrents, but you don't get a better upload rate.
Don't know something? Look it up. Still don't know? Then ask.
I have never liked the station wagon analogy, because it misunderstands the thing we are trying to measure. In the example, we measure the bandwidth of the station wagon. But that's like measuring the bandwidth of a packet -- a nonsense concept. We measure the bandwidth of the channel, not the chunks of data which fly through it. To really get the right analogy, we should talk about the bandwidth of a freeway, not the station wagon which drives upon the freeway.
Bandwidth in the colloquial sense means "the amount of data which passes a given point, per second." So, imagine that you can load 25 TB in the form of tapes into a station wagon. For safety, these station wagons must drive 75 meters apart at a speed of 100 kilometers per hour. That means that one station wagon passes a given point every 2.7 seconds. That's about 9.3 TB per second. Adding a second lane to the highway would double the bandwidth.
The stupid calculation which is often performed, on the other hand goes like this. You have 25 TB in the wagon, and you drive it to a location 10 hours away... Already you've gone off the tracks, because you are mentioning the TIME it takes to get to the destination, i.e. the LATENCY. And as anybody knows, the latency (or equivalently the distance between the points) has NOTHING to do with bandwidth.
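For anyone who wants to check the freeway arithmetic, here is a minimal Python sketch. The function and parameter names are mine, not from any library, and it assumes evenly spaced wagons at constant speed. Note that the trip distance only shows up in the latency function, never in the bandwidth one.

```python
def freeway_bandwidth_tb_per_s(payload_tb, spacing_m, speed_kmh, lanes=1):
    """Data passing a fixed point on the road per second, in TB/s."""
    speed_m_per_s = speed_kmh * 1000 / 3600
    seconds_between_wagons = spacing_m / speed_m_per_s
    return lanes * payload_tb / seconds_between_wagons


def latency_hours(distance_km, speed_kmh):
    """Time for the first wagon to arrive; independent of payload size."""
    return distance_km / speed_kmh


one_lane = freeway_bandwidth_tb_per_s(25, 75, 100)
two_lanes = freeway_bandwidth_tb_per_s(25, 75, 100, lanes=2)
print(f"one lane:  {one_lane:.2f} TB/s")
print(f"two lanes: {two_lanes:.2f} TB/s")
print(f"latency for a 1000 km trip: {latency_hours(1000, 100):.0f} hours")
```

With the numbers from the post, one lane comes out a little over 9 TB/s, and a second lane doubles it.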
I got my first linux distribution (I don't remember if they were called distributions back then) shipped on tape to the campus computer lab where a group of us brought our computers to copy the files.
I'll send you a growl notification when it finishes...
"Flyin' in just a sweet place,
Never been known to fail..."
No, the second biggest challenge is to come up with a viable exit strategy. Once you have several TB at this service provider, how will you move it out of there when the next provider has a better deal? That was one of the major selling points of the cloud in the first place: the freedom to move your compute requirements to a better, cheaper, faster (pick two) provider.
Even if you moved it in with a station wagon full of tapes or disks and your provider let you import it, I'm sure your provider will not be so helpful when you need to move it back out.
Blatant plug: Perhaps Actifio (www.actifio.com) can fix this for you, by replicating your data in, and also back out of production systems in deduped and compressed format.
To Terminate, or not to Terminate, that's the question - SCSIROB
... is getting it all back OFF again when you want to switch service providers.
The one thing you want never to happen is that you get locked in to a single cloud service. They might go bust, they might become uncompetitive. They may become politically "unfriendly" or tainted with customers you have no desire to be associated with - or any of a number of other reasons to say "adios".
Just like with disaster planning, all the processes and procedures, agreements and SLAs are worthless until you've actually PERFORMED the operation and done so without a major service interruption. How many cloud users have gone that far - and how many are locked in but don't know it?
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
> You might get one hell of a download rate from
> torrents, but you don't get a better upload rate.
You, sir, are unfamiliar with how things work here, in Soviet Russia!
"Flyin' in just a sweet place,
Never been known to fail..."
Here in the US we have a $270/month cable connection that's 10x2Mb so yeah, our daily backup uploads would take longer than 24 hours, or as they call it on the street, a day. We don't have the best compression or de-dupe but still. I can't even imagine quickbooks off the web let alone our actual databases. As far as I'm concerned, cloud stuff isn't a magical futuristic awesome technology to migrate to, it's a crappy, slow, unstable, budget solution that small business might consider. I just love the ads for "build your own local, onsite cloud solution!" or as I like to call it, "not cloud."
Also, the networking book I read estimated that FedEx guaranteed overnight delivery between any two US addresses gave better bandwidth for the cost than a station wagon on the freeway. Barracuda backup solutions agrees and uses that as the initial backup method.
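A rough way to see where overnight shipping starts to win is the Python sketch below. It assumes "10x2Mb" means 10 Mbit/s down and 2 Mbit/s up, treats the link as fully usable around the clock, and ignores the time spent copying data onto and off the shipped media. All names are made up for illustration.

```python
def bytes_movable(link_bits_per_s, hours):
    """Bytes a link can carry in the given time at its full nominal rate."""
    return link_bits_per_s / 8 * hours * 3600


def shipping_wins(data_bytes, link_bits_per_s, door_to_door_hours):
    """True if uploading would take longer than the courier does."""
    upload_hours = data_bytes * 8 / link_bits_per_s / 3600
    return upload_hours > door_to_door_hours


UPLINK = 2_000_000  # assumed 2 Mbit/s upstream

break_even = bytes_movable(UPLINK, 24)
print(f"break-even for a 24 h courier: {break_even / 1e9:.1f} GB")
print("100 GB seed, ship it?", shipping_wins(100e9, UPLINK, 24))
print("10 GB delta, ship it?", shipping_wins(10e9, UPLINK, 24))
```

Under those assumptions, anything much over about 20 GB is faster by courier, which fits the complaint that a daily backup will not finish in a day.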
I do know at least how things work on /.
Natalie_Portman_and_hot_grits.torrent
Don't know something? Look it up. Still don't know? Then ask.
I work at a cloud storage gateway company...in engineering...if that helps.
For customers in the 1-40TB range:
o 1Gbps links into the cloud vendor are available, even if only for a short initial duration.
o A good cloud storage gateway can maintain that 1Gbps.
o A good cloud storage gateway will compress, de-duplicate, and otherwise squeeze every bit of redundancy out of the data to reduce the transmitted and resting data size.
o A good cloud storage gateway encrypts the data before it goes to the cloud.
o At 1Gbps, 1TB is about 2.5 hours. So in a regular weekend, 21TB can be uploaded. If that was compressed/de-duplicated you might be talking about 30-100TB of client data sent up in a weekend.
o Many Fortune 1000 companies have 1-5 Gbps into their cloud storage.
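The upload-time figures above can be sanity-checked with a short Python sketch. The `efficiency` parameter is my own assumption, standing in for protocol overhead and anything else that keeps the link below its nominal rate; the names are hypothetical.

```python
def upload_hours(data_tb, link_gbps, efficiency=1.0):
    """Hours to push data_tb terabytes (10**12 bytes) through the link."""
    bits = data_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600


def tb_per_window(window_hours, link_gbps, efficiency=1.0):
    """Terabytes that fit through the link in a fixed time window."""
    return window_hours * 3600 * link_gbps * 1e9 * efficiency / 8 / 1e12


print(f"1 TB at a perfect 1 Gbps: {upload_hours(1, 1):.2f} h")
print(f"1 TB at 8/9 efficiency:   {upload_hours(1, 1, 8 / 9):.2f} h")
print(f"52.5 h window at 8/9:     {tb_per_window(52.5, 1, 8 / 9):.1f} TB")
```

A perfectly used 1 Gbps link moves 1 TB in about 2.2 hours, so the 2.5 hour figure corresponds to roughly 89% efficiency. At that rate, 21 TB needs a window of about 52.5 hours, which is a plausible "weekend".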
For the larger customers that think in PB:
o Some Cloud Storage vendors have shipped storage nodes to the customer.
o Local storage nodes become the target for a cloud storage gateway.
o Write all the data at 10Gbps or more with multiple gateways.
o Ship the storage nodes back to the cloud vendor.
o Note that not all vendors will do this...but it is worth asking if you speak in PBs.
So, yeah, a sneaker net is still the answer for the large data sets, but 1Gbps isn't too shabby for most everyone else.
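To give a feel for what "compress and de-duplicate" buys you, here is a toy Python sketch. It is only an illustration under simple assumptions (fixed-size chunks, SHA-256 digests, zlib compression), not a description of how any real gateway product works; the function names are invented.

```python
import hashlib
import zlib


def dedupe_and_compress(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks; store each unique chunk once, compressed."""
    store = {}   # digest -> compressed chunk (what actually gets uploaded)
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(zlib.decompress(store[d]) for d in recipe)


# Highly redundant sample data: two distinct 4 KiB chunks repeated 50 times.
sample = (b"A" * 4096 + b"B" * 4096) * 50
store, recipe = dedupe_and_compress(sample)
sent = sum(len(c) for c in store.values())
print(f"original: {len(sample)} bytes, unique chunks: {len(store)}, sent: {sent} bytes")
```

Real data is nowhere near as redundant as this sample, so treat the ratio as a best case. The point is only that the bytes on the wire can be far fewer than the bytes on the client's disks.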
Guessing that [a CD distributor's] business would have shrunk quite a lot since the days when everyone was on dialup and you'd have had to be on crack to consider downloading even a CD's worth, let alone a DVD.
Shrunk? Yes, I'll grant. Still useful in places that can't get FTTH, DOCSIS, or DSL? Yes. Satellite and cellular are still capped to about one DVD a month, with single or dual layer depending on which plan you choose.
Ah yes, the TCP/USPS revolution has finally arrived!
or else!
You could always use a UDP solution such as Aspera. Fast transfer speeds, bandwidth management and they have a specific AWS implementation.
Other options to look at include Smartjog, whose new Bolt product looks quite interesting, Riverbed's Steelhead product, Filecatalyst and Signiant.
There are many solutions around now to deal with large file transfers for both small and large businesses. Most of them use UDP instead of TCP, with checksums to ensure all data is reliably delivered. Even with just 1Mbps upload speeds, one of the products named above will be advantageous. I've worked in the media industry for a number of years, and this type of thing is being used in Film and Television all the time. Of course, there are still tapes being shipped around, but in emerging markets, such as Russia for instance, the file transfer really beats a tape being stuck in customs for weeks or months.
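The general idea of "unreliable datagrams plus checksums" can be sketched in a few lines of Python. This is a toy in-process simulation with an invented lossy channel, not the protocol used by Aspera or any other product named above: each datagram carries a sequence number and a CRC32, the receiver discards anything that fails the check, and the sender resends whatever is still missing.

```python
import random
import zlib


def make_packets(data: bytes, size: int):
    """Split data into (sequence number, CRC32, payload) datagrams."""
    return [(seq, zlib.crc32(data[i:i + size]), data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]


def lossy_channel(packets, rng, loss=0.2, corrupt=0.1):
    """Stand-in for the network: drop some datagrams, flip a byte in others."""
    for seq, crc, payload in packets:
        roll = rng.random()
        if roll < loss:
            continue  # datagram lost in transit
        if roll < loss + corrupt and payload:
            payload = bytes([payload[0] ^ 0xFF]) + payload[1:]
        yield seq, crc, payload


def transfer(data: bytes, size: int = 1024, seed: int = 0):
    """Resend missing or damaged datagrams until the receiver has everything."""
    rng = random.Random(seed)
    packets = make_packets(data, size)
    received = {}
    rounds = 0
    missing = packets
    while missing:
        rounds += 1
        for seq, crc, payload in lossy_channel(missing, rng):
            if zlib.crc32(payload) == crc:  # receiver-side integrity check
                received[seq] = payload
        # The receiver reports which sequence numbers it still lacks.
        missing = [p for p in packets if p[0] not in received]
    return b"".join(received[s] for s in range(len(packets))), rounds


data = bytes(range(256)) * 40
result, rounds = transfer(data)
print(f"intact: {result == data}, rounds needed: {rounds}")
```

Real products add rate control, encryption and much smarter retransmission on top, but the reliability comes from the same place: the application checks and re-requests data instead of leaving that to TCP.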
Brought to you by the author of such childrens' classics as "Some Kittens can Fly!" and "All Dogs go to Hell."
How did you manage to fix armed FBI storming your servers located in another country problem?
I am dealing with this as well, albeit on a different scale. About a year ago, the powers that be decided that they were going to develop a private cloud for the company. Nobody really considered how to migrate 500+TB of data from three separate sites into the new cloud. We are doing a mixture of over the wire replication (for sites with 100Mb+ of bandwidth), physical replication (using NAS devices and tape), and synchronization using DoubleTake for the SQL data and Vice Versa Pro for file system data. It is a massive undertaking, made even more difficult by the fact that we are working with production systems with locked-in SLAs that need to be maintained.
For the average person, and even most enterprises, I honestly believe the best way to get into "the cloud" is by following a well planned out, phased approach. The first phase should be using the cloud as a DR target. Only when both sides of the equation are balanced and able to operate independently of each other can you consider doing away with one and moving to the other.
Is around a gigabyte per second.
(100 packs of 16*64GB microSDs, in appropriate packaging, swallowed at intervals over the course of a day)
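The arithmetic holds up. A quick Python check, assuming decimal gigabytes and a full 24-hour day (the function name is made up):

```python
def courier_rate_gb_per_s(packs, cards_per_pack, card_gb, hours):
    """Average throughput in GB/s for a batch of cards delivered over `hours`."""
    total_gb = packs * cards_per_pack * card_gb
    return total_gb, total_gb / (hours * 3600)


total_gb, rate = courier_rate_gb_per_s(100, 16, 64, 24)
print(f"{total_gb} GB in a day is about {rate:.2f} GB/s")
```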
__ and expect us to like it and pay for the bandwidth.
For what Amazon would charge us for 20 terabytes of clowd storage feeding 12 websites running on their clowd, we built our own low energy datacenter with regional redundancy. We saved about 60 percent in annual costs, so this pays for itself in 3 years.
Add a flux capacitor to the station wagons.
What is your transmission speed when the transmission is completed in -1 hour?
Fight Spammers!
Not that there isn't a right time and place for MicroSD but the above suggestion is pretty much semantically identical to Garbage In Garbage Out.
Visit CryptoGnome in his home.
The 'Cloud' option needs to be a part of your system design in the first place. So you begin to accumulate all that data in The Cloud from the word go.
Have gnu, will travel.
In Canada, unless you need low latency, the internet is about the most expensive method you could possibly use to transfer data. source
http://aws.amazon.com/importexport/
http://awsimportexport.s3.amazonaws.com/aws-import-export-calculator.html
It's not rocket science. Yes, shipping drives is the cheapest, fastest option for a lot of people.
YMMV, speaking for myself, not my employer, etc. etc.
-Isaac
I am not a lawyer, and this is not legal advice. For Entertainment Purposes Only.
African or European?
Wait until your "cloud" operator goes bust or their data center burns down or something else happens that makes it impossible to get to your data.
Speaking from a disaster recovery perspective, "The Cloud" (man I hate that term) can only be either the primary location with data backup elsewhere or the backup location with the primary elsewhere, the minute companies start using the cloud exclusively they are putting their business at risk.
When "the Cloud" becomes a secure system which is automatically replicated to multiple locations around the planet and it is "owned" by an entity which cannot go bust or hold my data to ransom then I will consider it as being a good place to put my exceptionally valuable data, until then it's a no no.
Namely, the increased risk that your data will become collateral damage in the War On Piracy.
If you want to transfer a petabit of data by torrent, you're still going to need a petabit's worth of data allowance from your ISP to get the data into the interwebs in the first place. Torrents aren't a magic way of bypassing the fact that data needs to be moved from one place (your computer) to another (someone else's computer). They just save you from transferring the same data over and over again.
Just as in television, where large amounts of data are transferred. The cloud service provider needs to have a satellite link, and a local rep can arrive at the customer location and set up the link.
I've got great download speed (~28Mbs) at home, but my upload speed is throttled down to the 0.8 range, meaning it would take a month round the clock to get all my music up, and more like 3 or 4 months to get a complete hard drive backup there. Like you, I have all these cloud accounts (Amazon, Google, Live, Dropbox, etc.) and I only use them for tiny point solutions - like sending a small number of family photos or maybe one family video out. There's no way I'll be loading up my Amazon cloud player any time soon. And something like Carbonite? Not gonna happen.
Yeah, peak load for 20 days / 2 years.
Yeah, buy 20,000 HP servers with 48cores each.
Or prebuy 80,000 VMs at ec2 for 20 days.
Liberty freedom are no1, not dicks in suits.
My company just got this appliance (www.storsimple.com). It looks good, but we are still in the process of implementing it. If it works, it sounds like a great solution.
So, not only do you get to pay for the privilege of less control over your own data, you get to pay for the privilege of having them send your media back too! Brilliant!
Please don't post a serious response to my humorous twaddle.
You don't want to appear like a sufferer from ASCIIbergers syndrome...
"Flyin' in just a sweet place,
Never been known to fail..."
There must be a lot of us, judging by your painful 0 (Overrated) score. Next time, more humour and less twaddle, my good man!
I understand that people want the data real time from their cloud client computers, but why don't the service providers just use apps like KGB Archiver (http://sourceforge.net/projects/kgbarchiver/) to super compress the data? ...then stream the selected files automatically? Much easier and more innovative.