Slashdot Mirror


Mega-Uploads: The Cloud's Unspoken Hurdle

First time accepted submitter n7ytd writes "The Register has a piece today about overcoming one of the biggest challenges to migrating to cloud-based storage: how to get all that data onto the service provider's disks. With all of the enterprisey interweb solutions available, the oldest answer is still the right one: ship them your disks. Remember: 'Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.'"

31 of 134 comments

  1. Pro photography is a huge problem by Anonymous Coward · · Score: 5, Interesting

    Returning from a site with a tethered computer full of 80 MP 16-bit raw files from a day's worth of shooting would break most bandwidth bills if you tried uploading all these images.

    1. Re:Pro photography is a huge problem by HapSlappy_2222 · · Score: 4, Interesting

      Just because you take pro pictures at 80MP doesn't mean your business has an extra $10,000 per year lying around for a business-grade gigabit pipe; as the sole employee of my company, that'd mean I'm paying myself $20,000 a year instead of $30,000 (TBH, I'm lucky to be in the black, period, with only two years in). I store my images at my studio, back them up daily to a removable disk, and bring in a 3rd removable to copy them over once per week. All told? $500 for the 2 drives and a striped array on my studio PC. The self-storage backup technique works well for me.

      Realistically, though, if I want to I can just upload them all to home or to cloud storage in batches overnight, the same way I download 10 gigabyte files at home. It's just plain easier to cart 'em around, though.

    2. Re:Pro photography is a huge problem by rthille · · Score: 4, Informative

      The tiny town of Sebastopol, CA, population ~7800, has gigabit fiber to (some) doorsteps for $69/month.

      --
      Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
    3. Re:Pro photography is a huge problem by rthille · · Score: 2

      Um, no. Sonic.net is the business doing the rollout. Basically, they pay for it by getting 100% adoption (by eating the other services' lunches).

      --
      Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
  2. why dodge this question? by Eponymous+Hero · · Score: 2

    Pressed if disks are accepted, the company responded that “All common database products provide a capability to extract to a common file format like .csv.”

    what a professional answer. and by that i mean it didn't answer the question at all.

    --
    insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
    1. Re:why dodge this question? by Bogtha · · Score: 3, Insightful

      You don't know the exact dialogue between the journalist and the rep. I've been quoted in print in similarly stupid ways when what I said made absolute sense in context to what was asked. "Pressed if disks are accepted" could have been something like the rep telling them about a new CSV import tool they had built, the journalist saying "So if I mailed you a 5TB database on a disk, could you import that?", and the rep replying "Sure, but you'd need to export the data first...".

      --
      Bogtha Bogtha Bogtha
  3. Station Wagon Full of Tapes by viking099 · · Score: 5, Funny

    Remember: 'Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.'

    Yeah, the bandwidth is great, but the latency SUCKS.

    1. Re:Station Wagon Full of Tapes by ffejie · · Score: 2

      I love this thinking. There was a thread about this some time back that I found most enjoyable, despite my shoddy math.

      A dumptruck full of harddrives.

      --
      Disagreeing with me does not mean you get to mod me troll.
    2. Re:Station Wagon Full of Tapes by Archangel+Michael · · Score: 2

      Or at least a JATO unit

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
  4. Second biggest challenge by WOOFYGOOFY · · Score: 4, Insightful

    Second biggest challenge: trusting that any of these places has the motivation to keep your data more secure than credit card companies do.

    1. Re:Second biggest challenge by gestalt_n_pepper · · Score: 5, Insightful

      No, that's third. The second biggest challenge is believing that those fine hosting companies with servers hosted in lower Slobbovia won't have a few entrepreneurial employees who will *actively* be searching your data for all that is monetizable.

      --
      Please do not read this sig. Thank you.
    2. Re:Second biggest challenge by betterunixthanunix · · Score: 2

      I thought the second biggest challenge is ensuring that the Empire does not raid the hosting company and render all your files inaccessible...

      --
      Palm trees and 8
  5. Re:The real hurdle by icebike · · Score: 4, Insightful

    Getting around all the buzzwords

    Well that's one hurdle.
    The next is RECOVERY when ICE or FBI or some other three-letter agency walks in and takes your data because one tiny customer used the service for some allegedly nefarious purpose.

    The key here is to use a service so big that even god himself would not dare take it down, although the Ayatollah might try. Small cloud services, even if multi-homed, are a risky proposition. Even if you do manage to get all your data into them, they are not large enough to push back against any subpoena or search warrant that any misguided judge in some backwater jurisdiction may issue.

    --
    Sig Battery depleted. Reverting to safe mode.
  6. Comment removed by account_deleted · · Score: 3, Funny

    Comment removed based on user account deletion

  7. Backups by SJHillman · · Score: 5, Informative

    My last employer offered offsite backups to clients. For the initial seed, we always tried to get them to put it on an external HDD and ship it to us (or at least DVDs). The only major exceptions were clients that were also on FiOS - that was the only case where over-the-net transfer was faster than the backup-and-ship-it method for the initial seed.
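    The ship-or-upload decision for an initial seed comes down to comparing two durations. Here is a minimal sketch of that comparison; the function names and every number in the usage lines (uplink speeds, 48-hour courier time) are illustrative assumptions, not figures from the comment above.

```python
def upload_hours(data_tb: float, uplink_mbps: float) -> float:
    """Hours needed to push data_tb terabytes (decimal, 10**12 bytes)
    over an uplink of uplink_mbps megabits per second, fully saturated."""
    bits = data_tb * 1e12 * 8
    return bits / (uplink_mbps * 1e6) / 3600


def shipping_is_faster(data_tb: float, uplink_mbps: float,
                       ship_hours: float, copy_hours: float = 0.0) -> bool:
    """True if copying to a disk and shipping it beats uploading."""
    return ship_hours + copy_hours < upload_hours(data_tb, uplink_mbps)


# Hypothetical numbers: 1 TB over a 5 Mbit/s uplink vs. a 48-hour courier.
print(round(upload_hours(1, 5), 1))      # about 444.4 hours
print(shipping_is_faster(1, 5, 48))      # True: ship the disk
# Hypothetical faster uplink and smaller seed: the wire wins.
print(shipping_is_faster(0.5, 35, 48))   # False
```

    The model ignores protocol overhead and assumes the uplink is dedicated to the transfer, so real uploads would take longer than this estimate.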

  8. I'm not so sure... by fuzzyfuzzyfungus · · Score: 2

    I don't think that TFS's answer is necessarily the correct one. I'd really prefer to hold Ma Bell's feet to the fire concerning the fact that bandwidth (even in 'optimal' build-out areas, spare me the excuses about the boonies) has enjoyed a deeply underwhelming track record in terms of improvements in cost and quantity compared to most other aspects of contemporary computing.

  9. Bandwidth of a Station Wagon by Surazal · · Score: 5, Funny

    Yes, never underestimate the bandwidth of a station wagon full of disks hurtling down the highway. The latency, on the other hand, leaves much to be desired, and I've heard the packet loss can be downright fatal.

    --
    --- Journals are boring; Go to my web page instead
    1. Re:Bandwidth of a station wagon by BradleyUffner · · Score: 3, Informative

      I have never liked the station wagon analogy, because it misunderstands the thing we are trying to measure. In the example, we measure the bandwidth of the station wagon. But that's like measuring the bandwidth of a packet -- a nonsense concept. We measure the bandwidth of the channel, not the chunks of data which fly through it. To really get the right analogy, we should talk about the bandwidth of a freeway, not the station wagon which drives upon the freeway.

      Bandwidth in the colloquial sense means "the amount of data which passes a given point, per second." So, imagine that you can load 25 TB in the form of tapes into a station wagon. For safety, these station wagons must drive 75 meters apart at a speed of 100 kilometers per hour. That means that one station wagon passes a given point every 2.7 seconds. That's 9.2 TB per second. Adding a second lane to the highway would double the bandwidth.

      The stupid calculation which is often performed, on the other hand goes like this. You have 25 TB in the wagon, and you drive it to a location 10 hours away... Already you've gone off the tracks, because you are mentioning the TIME it takes to get to the destination, i.e. the LATENCY. And as anybody knows, the latency (or equivalently the distance between the points) has NOTHING to do with bandwidth.

      How can you say Time has nothing to do with bandwidth when, in your own example, you measured it in TB per SECOND?

      Following your example again of 9.2TB/sec, that can be changed to 9.2TB * 60 /min, or 9.2TB * 60 * 60 /hour, or 9.2TB * 60 * 60 * 10 / 10 hours, which is the exact measurement that you seem to have a problem with earlier in your post (data in a 10 hour period).
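      The quoted freeway figures are easy to check. This is a rough sketch using only the numbers from the quoted post (25 TB per wagon, 75 m spacing, 100 km/h); the function name is made up for illustration.

```python
def freeway_bandwidth_tb_per_s(payload_tb: float, spacing_m: float,
                               speed_kmh: float, lanes: int = 1) -> float:
    """Data passing a fixed point per second when wagons carrying
    payload_tb each drive spacing_m apart at speed_kmh."""
    speed_m_per_s = speed_kmh * 1000 / 3600
    seconds_between_wagons = spacing_m / speed_m_per_s
    return lanes * payload_tb / seconds_between_wagons


rate = freeway_bandwidth_tb_per_s(25, 75, 100)
print(round(rate, 2))                                         # TB per second, one lane
print(round(freeway_bandwidth_tb_per_s(25, 75, 100, lanes=2), 2))
print(round(rate * 60 * 60 * 10))                             # TB passing the point in 10 hours
```

      One lane works out to roughly 9.26 TB/s, which the quoted post truncates to 9.2. The last line is the same rate expressed over a 10 hour window, which is the point of the reply: a rate always has time in it.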

    2. Re:Bandwidth of a station wagon by Firethorn · · Score: 4, Interesting

      Indeed. He also ignored the core reason for having said bandwidth - you have X amount of data to move in Y time (at under Z cost); what's the best way to do so?

      As such, a 'packet' on the freeway system is rather expensive, so you don't want to be putting multiple station wagons on the system if you don't have to. Figure the driver costs $20/hour, the vehicle itself $.50/mile (gas, maintenance, insurance, tolls, etc...), and you're looking at 300 miles in 10 hours. That comes to $350 for that single 'packet'. If a single station wagon doesn't do it, perhaps a cargo van would, which doubles the capacity of the packet while only raising the cost $50, to $400. Still not good enough? Upgrade to a 'package van' like UPS/Fedex trucks. Next step would be a Semi.
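      The per-trip cost above is just driver time plus mileage. A small sketch, with the $20/hour and $.50/mile rates from the comment as defaults and a hypothetical function name:

```python
def trip_cost(hours: float, miles: float,
              driver_per_hour: float = 20.0,
              vehicle_per_mile: float = 0.50) -> float:
    """Cost in dollars of one vehicle 'packet': driver wages plus
    per-mile vehicle cost (gas, maintenance, insurance, tolls)."""
    return hours * driver_per_hour + miles * vehicle_per_mile


print(trip_cost(10, 300))   # 350.0 for the 300 mile, 10 hour run
```

      The $50 premium for the cargo van is the comment's own estimate and is not modeled here.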

      In any case, I'd say that you could fit 25TB into a motorcycle today - 3 TB drives are fairly common now, and I can fit 10 into my saddlebags easily. Heck, I can get 1.5TB native tapes, about the same size as a HD. Padding its dimensions up, it's 11 x 11 x 3 cm = 363 cm^3, or 2,755 per cubic meter.

      A 2008-11 Dodge Grand Caravan Cargo van - 143.8 cubic feet = 4.07 cubic meters, giving me room for 11k 1.5TB tapes. 16.5k TB, in 10 hours, if I have a single cargo van. Ouch. Disregarding media cost, that's ~$400.

      Do this daily, we're looking at 1.5 terabits per second. Don't know of any connections that fast.
      Monthly, we're down to ~50 gigabit (rounding down). I can guarantee that a 50 gigabit connection will cost more than $400.
      Annually, it's 'only' 4 gigabit, and I pay more than $100/month for my megabit class connection, which ISN'T utilized 100%, unlike my calc.
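      For anyone who wants to redo the van arithmetic, here is a sketch. It uses the comment's inputs (143.8 cubic feet of cargo space, 363 cm^3 per padded tape, 1.5 TB per tape) and assumes decimal terabytes, a 30 day month and a 365 day year; the helper names are invented for the example.

```python
CM3_PER_M3 = 1_000_000
M3_PER_FT3 = 0.0283168   # cubic metres in one cubic foot


def tapes_in_volume(volume_ft3: float, tape_cm3: float) -> int:
    """How many tapes of tape_cm3 fit in volume_ft3, ignoring packing losses."""
    return int(volume_ft3 * M3_PER_FT3 * CM3_PER_M3 // tape_cm3)


def sustained_gbps(data_tb: float, days: float) -> float:
    """Constant bit rate (Gbit/s) that moves data_tb terabytes in the given days."""
    return data_tb * 1e12 * 8 / (days * 86400) / 1e9


tapes = tapes_in_volume(143.8, 11 * 11 * 3)
print(tapes)                        # a bit over 11,000, matching the ~11k estimate

van_tb = 11_000 * 1.5               # the comment's rounded 16.5k TB
for label, days in (("daily", 1), ("monthly", 30), ("annually", 365)):
    print(label, round(sustained_gbps(van_tb, days), 1), "Gbit/s")
```

      One van load per day comes out near 1,528 Gbit/s (about 1.5 terabits per second), per month near 51 Gbit/s, and per year near 4.2 Gbit/s, in line with the figures above.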

      You don't normally need to figure out the bandwidth of the freeway because:
      1. Generally 1 vehicle 'packet' is sufficient, and due to the high marginal cost per said vehicle, you normally only want to send one.
      2. The roads are used for more than data shipment, which would be like trying to figure out how much bandwidth you have available for VOIP by looking at total circuit bandwidth.

      Don't need to ship that much? You should be able to ship about 30 of them for $60, second day air. That's 45TB, or about 140 Mbit of 100% saturated traffic for a month. BTW, during my calcs for paying fedex to ship them, I think that weight might actually be enough of an issue to increase gasoline consumption - but I think I've established that even $800 would be cheap if you need to ship that ridiculous of an amount of data.
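      The courier case can be checked the same way. A sketch using the comment's numbers (30 tapes at 1.5 TB, $60 shipping, a 30 day month, $400 for the 16.5k TB van run); function names are hypothetical, and terabytes are decimal.

```python
def equivalent_mbps(data_tb: float, days: float) -> float:
    """Fully saturated link speed (Mbit/s) needed to move data_tb in the given days."""
    return data_tb * 1e12 * 8 / (days * 86400) / 1e6


def cost_per_tb(total_cost: float, data_tb: float) -> float:
    """Dollars per terabyte delivered."""
    return total_cost / data_tb


box_tb = 30 * 1.5
print(box_tb)                                   # 45.0 TB in the box
print(round(equivalent_mbps(box_tb, 30), 1))    # about 138.9 Mbit/s for a month
print(round(cost_per_tb(60, box_tb), 2))        # about 1.33 dollars per TB
print(round(cost_per_tb(400, 16_500), 3))       # the full van: about 0.024 dollars per TB
```

      Media cost is left out on both lines, as it is in the comment.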

      --
      I don't read AC A human right
  10. Exit strategy by scsirob · · Score: 3, Insightful

    No, the second biggest challenge is to come up with a viable exit strategy. Once you have several TB at this service provider, how will you move it out of there when the next provider has a better deal? That was one of the major selling points for having a cloud in the first place, to have the freedom to move your compute requirements to a better, cheaper, faster (pick two) provider.

    Even if you moved it in with a station wagon full of tapes or disks and your provider let you import it, I'm sure your provider will not be so helpful when you need to move it back out.

    Blatant plug: Perhaps Actifio (www.actifio.com) can fix this for you, by replicating your data in, and also back out of production systems in deduped and compressed format.

    --
    To Terminate, or not to Terminate, that's the question - SCSIROB
  11. A problem bigger than getting your data on ... by petes_PoV · · Score: 4, Insightful

    ... is getting it all back OFF again when you want to switch service providers.

    The one thing you want never to happen is that you get locked in to a single cloud service. They might go bust, they might become uncompetitive. They may become politically "unfriendly" or tainted with customers you have no desire to be associated with - or any of a number of other reasons to say "adios".

    Just like with disaster planning, all the processes and procedures, agreements and SLAs are worthless until you've actually PERFORMED the operation and done so without a major service interruption. How many cloud users have gone that far - and how many are locked in but don't know it?

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
    1. Re:A problem bigger than getting your data on ... by jtownatpunk.net · · Score: 2

      We've already seen the unsinkable cloud get sunk. Amazon's never-down cloud has rained out at least once in some regions. Another problem is the cloud provider making changes to their services that impact the way your company operates. My last company was in the process of Googleizing when I left a year ago and Google's already made changes that must have required training and documentation updates. And they can remove apps and services at any time so some unpopular product that happens to be very useful to your company could disappear because it's not profitable for the hosting company to continue to support it. At best, you might be able to keep the service until your contract renewal date.

      Then what do you do? Take everything to another cloud? Good luck with that.

  12. Aspera and Friends by PhillC · · Score: 3, Interesting

    You could always use a UDP solution such as Aspera. Fast transfer speeds, bandwidth management and they have a specific AWS implementation.

    Other options to look at include Smartjog, whose new Bolt product looks quite interesting, Riverbed's Steelhead product, Filecatalyst and Signiant.

    There are many solutions around now to deal with large file transfers for both small and large business. Most of them use UDP instead of TCP, with checksums to ensure all data is reliably delivered. Even with just 1Mbps upload speeds, something like one of the above named products will be advantageous. I've worked in the media industry for a number of years, and this type of thing is being used in Film and Television all the time. Of course, there are still tapes being shipped around, but in emerging markets, such as Russia for instance, the file transfer really beats a tape being stuck in customs for weeks or months.
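    To illustrate the checksum idea in general terms: the sender hashes each chunk, the receiver hashes what arrived, and only mismatched or missing chunks are sent again. This is a generic sketch with made-up function names and SHA-256 chosen as an assumption; it is not how Aspera or any of the products named above actually work.

```python
import hashlib


def chunk_digests(data: bytes, chunk_size: int) -> list:
    """SHA-256 hex digest of each chunk_size slice of data, in order."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def chunks_to_resend(sent_digests: list, received: bytes, chunk_size: int) -> list:
    """Indexes of chunks that arrived corrupted or did not arrive at all."""
    got = chunk_digests(received, chunk_size)
    return [i for i, digest in enumerate(sent_digests)
            if i >= len(got) or got[i] != digest]


original = bytes(range(256)) * 4                 # 1024 bytes, four 256-byte chunks
digests = chunk_digests(original, 256)

damaged = bytearray(original)
damaged[300] ^= 0xFF                             # flip one byte inside chunk 1
print(chunks_to_resend(digests, bytes(damaged), 256))   # [1]
print(chunks_to_resend(digests, original[:512], 256))   # [2, 3]: the tail never arrived
```

    Per-chunk digests mean a single damaged datagram costs one chunk of retransmission rather than the whole file, which matters on the slow uplinks mentioned above.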

    --
    Brought to you by the author of such childrens' classics as "Some Kittens can Fly!" and "All Dogs go to Hell."
  13. But then... by TemperedAlchemist · · Score: 4, Insightful

    How did you manage to fix the "armed FBI storming your servers located in another country" problem?

  14. Not just a "public" challenge by dave562 · · Score: 2

    I am dealing with this as well, albeit on a different scale. About a year ago, the powers that be decided that they were going to develop a private cloud for the company. Nobody really considered how to migrate 500+TB of data from three separate sites into the new cloud. We are doing a mixture of over the wire replication (for sites with 100Mb+ of bandwidth), physical replication (using NAS devices and tape), and synchronization using DoubleTake for the SQL data and Vice Versa Pro for file system data. It is a massive undertaking, made even more difficult by the fact that we are working with production systems with locked in SLAs that need to be maintained.

    For the average person, and even most enterprises, I honestly believe the best way to get into "the cloud" is by following a well planned out, phased approach. The first phase should be using the cloud as a DR target. Only when both sides of the equation are balanced and able to operate independently of each other can you consider doing away with one and moving to the other.

    1. Re:Not just a "public" challenge by ediron2 · · Score: 2

      Actually, enterprise issues regarding data synchronization quickly get problematic.

      Have just watched a migration from private mail servers to cloud-based email. Months in, it quickly became apparent that a few short days or even a few weeks of pain associated with migrating users cold turkey (and then importing requested data from Notes once it had become static) would have been astronomically less cost and pain compared to wiring the connector and having two frameworks alive (and borking the sync in weird mediocre ways) simultaneously for months. Keep in mind, 'just migrating email' ends up meaning email plus antispam plus accounts plus calendars plus messages plus archives plus attachment storage plus service accounts (and changing hardcoded email addresses in code) plus security plus distro lists... etc, etc etc.

      It can be straightforward (but not trivial) to export static data from SQL or another known framework and migrate it into similar frameworks. And it's not much harder to translate (like SQL into non-relational (NoSQL or similar) frameworks): Understand the data, design to the new constraints and advantages, plan migration, do a trial of the migration, then cycle thru again with more or all (depending on how well the first migration went).

      Doing so on data that remains alive and breathing gets damned hard fast. Even if there are migration tools, are they ok with transactional data changes? If one side of the migration fails, do both frameworks get messages to cancel the transactions? What about record deletion? How is that synchronized? Given that the new cloud vendor may have few (insert tiny #) customers, does a well-tested connector exist between your old and new data stores? What about all in-house apps using the data? Are there mobile apps or custom code or ERP connections to your data server? What about data structure changes between the two: nothing maps perfectly.

      Ugly. Just Ugly.

      And that's a planned migration. Now think of going the other way. As TFA hints at, we all can write the headline and article now for going the other direction the day some cloud provider implodes: Company Widgetcorp declared bankruptcy today. Their tragic fall from stalwart Russell 2000 midcap manufacturer to receivership happened unexpectedly: Their cloud-based XXX provider shut off servers without warning less than 60 days ago, and Widgetcorp was never able to recover critical processes and data.

  15. Re:The real hurdle by CAIMLAS · · Score: 5, Informative

    That is just one of many of the hurdles.

    Really, these problems are problems because most 'cloud' shit is done wrong.

    It's a bit of a worn out record here on Slashdot, but anyone or any company which is fully dependent upon The Cloud for business continuity is a fool.

    * First off, there is no such thing as 'utility computing', and probably never will be due to the volatile nature of storage and its ongoing cost of maintenance.
    * Second, if you do not maintain primary physical control of something, to the best of your ability, you do not control it.
    * For primary IT infrastructure, it will cost more to do "Cloud" than local. If you can afford 2-3 servers a year, but not much more, and a nominal IT operations budget, chances are you should have an in-house "cloud" with off-site replication.
    * Bandwidth costs both ways will kill you, and latency will, in many cases, kill Cloud functionality.

    At this point, I still strongly recommend against public Clouding your systems unless they are:

    a) (very!) low volume with use-based billing. This only makes sense for a low-volume public-facing site where you don't already have IT infrastructure (on a cost basis)
    b) off-site 'hot' replication. You've got your inside 'private Cloud' which replicates to off-site systems. (Cloud is basically just colocated virtualization, after all.)
    c) Other geographic/distribution requirements (eg. multisite organization with none serving as a good central hub). In this case, colocation of your own equipment makes more sense in many regards.

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
  16. Canada by DarwinSurvivor · · Score: 3, Interesting

    In Canada, unless you need low latency, the internet is about the most expensive method you could possibly use to transfer data. source

  17. AWS Import/Export service... just ship the disks. by isaac · · Score: 2

    http://aws.amazon.com/importexport/

    http://awsimportexport.s3.amazonaws.com/aws-import-export-calculator.html

    It's not rocket science. Yes, shipping drives is the cheapest, fastest option for a lot of people.

    YMMV, speaking for myself, not my employer, etc. etc.

    -Isaac

    --
    I am not a lawyer, and this is not legal advice. For Entertainment Purposes Only.
  18. Re:The bandwidth of a fully laden alimentary canal by FunkSoulBrother · · Score: 3, Funny

    African or European?

  19. MegaUploads: The other unspoken hurdle by Arancaytar · · Score: 2

    Namely, the increased risk that your data will become collateral damage in the War On Piracy.