Slashdot Mirror


Ask Slashdot: Cloud Service On a Budget?

First time accepted submitter MadC0der writes "We just signed a project with a very large company. We are a computer vision based company and our project gathers images from a facility in PA. Our company is located in TN. The company we're gathering images from is on a very high speed fiber optic network. However, being a small company of 11 developers and 1 systems engineer, we're on a business class 100mb cable connection, which works well for us but not in this situation. The information gathered from the client in PA is a 1½mb .bmp image, along with a 3mb depth map file, making each snapshot a little under 5 megs. This may sound small, but images are taken every 3-5 seconds. This can lead to a very large amount of data captured and transferred each day. Our facility is incapable of handling such large transfers without affecting internal network performance. We've come to the conclusion that a cloud service would be the best solution for our problem. We're now thinking the customer's workstation will sync the data with the cloud, and we can automate pulling the data during off hours for analysis so we won't encounter congestion. Can anyone help suggest a stable, fairly priced cloud solution that will sync large amounts of offsite data for retrieval at our convenience (a nightly rsync script should handle this process)?"

20 of 121 comments

  1. BYOS by connor4312 · · Score: 2

    Bring your own server. Depending on the time frame/duration of the project, it might be more cost effective to rent a quarter or half rack in a datacenter and build/buy your own servers. High initial up front cost, but does save money in the long run.

    1. Re:BYOS by BobC · · Score: 3, Funny

      Is the data being generated 24/7? If so, that's 432 GB/day, pretty much exactly 12 hours' worth of your 100 Mbps bandwidth. So some spooling is needed, but why in the cloud? The main goal would seem to be avoiding paying twice to move the data, so you'd want to avoid routing it through a 3rd party if at all possible.

      1. The simplest solution would appear to be to put a laptop with a 500+ GB HD at their facility. A laptop because it essentially has a built-in UPS, and the CPU can sleep much of the time.

      2. Develop a relationship with whoever provides their bandwidth. Find the nearest peering point. Put a laptop there.

      3. Get the NSA as a client, do some sysadmin work for them. Your data will be RIGHT THERE!
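A rough Python check of the volume arithmetic in this comment. The 5 MB snapshot size and the 3-5 second interval come from the submission; the 10 MB/s usable throughput (100 Mbps less protocol overhead) is an assumption. Under those assumptions, 432 GB/day matches one snapshot per second, while the stated 3-5 second interval works out to roughly 86-144 GB/day.

```python
SNAPSHOT_MB = 5                  # from the submission: "a little under 5 megs"
SECONDS_PER_DAY = 24 * 60 * 60
USABLE_MB_PER_S = 10             # assumption: 100 Mbps link minus protocol overhead


def daily_volume_gb(interval_s: float) -> float:
    """Data generated per day (decimal GB) with one snapshot every interval_s seconds."""
    return SNAPSHOT_MB * (SECONDS_PER_DAY / interval_s) / 1000


def transfer_hours(volume_gb: float) -> float:
    """Hours needed to move volume_gb at the assumed usable throughput."""
    return volume_gb * 1000 / USABLE_MB_PER_S / 3600


if __name__ == "__main__":
    for interval in (1, 3, 5):
        gb = daily_volume_gb(interval)
        print(f"every {interval}s: {gb:.1f} GB/day, "
              f"{transfer_hours(gb):.1f} h on the link")
```

Either way some spooling at the source is needed if the transfer is to be confined to off hours; the size of the spool disk depends on which interval actually applies.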

    2. Re:BYOS by Chrisq · · Score: 4, Funny

      [I would use SSDs in a metal padded case knowing Fedex].

      Fedex is like UDP, an unreliable delivery service. In fact there is only one fault of UDP it does not duplicate. Things can arrive broken, out of order, delayed, or not at all but I have never heard of Fedex delivering multiple copies!

    3. Re:BYOS by Anonymous Coward · · Score: 4, Funny

      I have. Sometimes Amazon messes up. This is how I have a copy of XCOM. :)

  2. I'll be the one to say it... by atari2600a · · Score: 4, Insightful

    ...WHY are you using BMP in the first place? Does whatever you're generating these on not have the processing capability to compress to PNG before transferring? I mean it SOUNDS like it'd save 10-20% off the total transfer...Anyways, what I'd do is I'd simply plop a server rack at the source that takes all the images for a given hour or whatever, tar.gz.bz2.whatevers them & send them over. Otherwise, I mean, Amazon wouldn't be TERRIBLE?

    1. Re:I'll be the one to say it... by FishOuttaWater · · Score: 4, Interesting

      Yes, your first line of defense is to examine what they need as far as these images, and that will tell you how far you can go in reducing their size for transmission and storage. Can they be scaled down? Can they be lossy? Can you take some time to run a more effective lossless algorithm on them? Is there redundancy between images? Secondly, do you have to move the whole image? Can you do your work on a lower quality image to define the series of steps required and then apply those steps remotely at their location? Just think real hard about what the requirements are, and don't rush yourself. You may come up with your best ideas in the shower on this when you have time to think outside the box.

  3. Sounds like... by cultiv8 · · Score: 3, Interesting

    the sales guy oversold your capabilities. Instead of asking about cloud options, why don't you just pick a server host with a good reputation (Amazon and Rackspace come to mind) and pass the costs onto the client?

    --
    sysadmins and parents of newborns get the same amount of sleep.

    1. Re:Sounds like... by Dishwasha · · Score: 2

      Or that the business and product owners under-priced the monthly contract with the client.

      And what the heck does your internal network have to do with the performance of your product? Separate your general business network from your server network, if not for performance or HIPAA, then for the day when one of your developers or unpatched machines does something to DoS your business.

      Also, you might want to read up on MTU. Large file transfers might be better served with an MTU larger than 1500.

  4. Snail Mail and a hardrive by duckgod · · Score: 4, Informative

    Assuming you don't need real-time analysis (it doesn't look like it from the problem description), send a couple of 500 GB hard drives and have someone mail you the daily load of images each day with overnight shipping.

    1. Re:Snail Mail and a hardrive by Resol · · Score: 2

      Yep, in school (long ago) there was an old adage -- "never underestimate the bandwidth of a semi full of mag tapes". Sure the latency is high, but in many cases, not an issue!

    2. Re:Snail Mail and a hardrive by bemymonkey · · Score: 2

      If a bigger pipe is too expensive, overnight shipping of a hard drive every day is going to be WAY too expensive.

  5. Is Amazon S3 an option? by Anonymous Coward · · Score: 2, Informative

    Assuming 5MB of data every 5 seconds, you're dealing with ~90GB of data a day. So, looking at Amazon's pricing model (http://aws.amazon.com/s3/pricing/), assuming you delete the data after you pull it, the storage total should be in the range of $0.095 * 90GB = $8.55/mo. Transfers into S3 are free. You'll be transferring ~2.7TB/mo out (90GB*30), at $0.120/GB, that's $324.00/mo in transfer fees.

    Now, if that data isn't being accumulated 24/7 (i.e. if it's only 8/5 for example), that lowers your monthly fees to the $80 range. Sure, you can shop around for someone who will charge you less for transfers (though if they're not charging at all, they may start complaining at the volume of data you're transferring around), but $350/mo in fees to help keep a project that's making you money from killing your network? Would sound doable to me.
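The estimate above can be reproduced with a short Python sketch. The per-GB prices and the ~90 GB/day figure are the ones quoted in this comment, not current Amazon pricing; `monthly_cost` and `duty_cycle` are hypothetical names introduced for illustration, and the model assumes at most one day's data is held in storage before being pulled and deleted.

```python
STORAGE_USD_PER_GB_MONTH = 0.095   # as quoted in the comment
TRANSFER_OUT_USD_PER_GB = 0.120    # as quoted in the comment
DAILY_GB = 90                      # the comment's rounded figure (5 MB every 5 s)


def monthly_cost(daily_gb: float, duty_cycle: float = 1.0, days: int = 30) -> float:
    """One day's backlog in storage plus outbound transfer for the whole month."""
    held_gb = daily_gb * duty_cycle                      # deleted after each pull
    storage = held_gb * STORAGE_USD_PER_GB_MONTH
    transfer = held_gb * days * TRANSFER_OUT_USD_PER_GB  # inbound is free
    return storage + transfer


if __name__ == "__main__":
    print(f"24/7: ${monthly_cost(DAILY_GB):.2f}/mo")
    # 8 hours a day, 5 days a week
    print(f"8/5:  ${monthly_cost(DAILY_GB, (8 / 24) * (5 / 7)):.2f}/mo")
```

With these inputs the 24/7 case comes to about $333 a month and the 8/5 case to roughly $80, in line with the figures in the comment.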

    1. Re:Is Amazon S3 an option? by Anonymous Coward · · Score: 2, Insightful

      Dropbox is just a VAR for Amazon S3, so it couldn't possibly be cheaper. Most people don't know that half of Silicon Valley is running off Amazon AWS.

    2. Re:Is Amazon S3 an option? by 0100010001010011 · · Score: 2

      Bittorrent Sync is exactly what you're looking for.

      I just set up this same thing to back up all my photos. I was bouncing between rsync, samba and other random different programs. I wanted something to sync between numerous different computers and off site.

      Bittorrent sync solved all of this. It's almost as if they planned for people using it the way I am. In addition to Mac and Windows clients, they also have:

      • Linux ARM
      • Linux PowerPC
      • Linux i386
      • Linux x64
      • Linux PPC QorIQ
      • Linux_i386 (glibc 2.3)
      • Linux_x64 (glibc 2.3)
      • FreeBSD i386
      • FreeBSD X64

      You can either set it up from the command line with a JSON config file or through a web interface on headless machines. I have it set up on one of my VPSs with a large disk. All of my family photos are now 'in the cloud', backed up off site. I added another VPS just to see what it'd do. It synced at around 2-3 MB/s between them and a bit from my home connection. (It does use the bittorrent protocol.) So now my home photos are on 2 different VPSs on two different continents. If I want to give someone access to them I can generate a read-only key or a time-limited read-only key.

      One of the coolest features is that I have a webserver where I have people upload family photos. I HAD an rsync cron job set up to sync the photos to my computer every night. Now the upload folder is a BitTorrent Sync folder. Within seconds of someone uploading photos, they get sync'd to my desktop, my laptop, my server, and my VPS on another continent.

      If you want more redundancy add more servers. The more nodes you add the faster new nodes get 'up to date'.

      Any sufficiently advanced technology is indistinguishable from magic.

  6. Redesign by hanyu.chuang · · Score: 2

    That huge bandwidth is a major load requirement of the project, and it is going to cost you or your client too much money. I think you should simply look into separating the functionality so you can do the analysis on the customer's site, and only "get" (pulling from a db, a web service, or an RSS feed) the analysis results from there, while the rest of your application sits where it is now. From the sounds of it the images are first saved somewhere on the customer's network, so perhaps it is not much of a stretch to install your analysis app right there?

  7. why bother? by brad-x · · Score: 2

    if you're going to sync nightly anyway, why bother with a cloud service? just sync at night.

    --
    // -- http://www.BRAD-X.com/ -- //

  8. Get another line. by bill_mcgonigle · · Score: 2

    You're probably going to pay less for a second cable modem line than you will to store that much data in the cloud. Cloud processing is fairly cheap - cloud storage is expensive.

    And then you won't have to re-tool anything else in your processes, except maybe adding another route or two. If you're doing that much data processing, the $200/mo for the line shouldn't really be a huge expense on the contract.

    If you're looking to scale out this service to lots of companies, then the calculus might be different.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

  9. Don't bother by Billly+Gates · · Score: 2

    It is more expensive than a cloud unless you are really big. Many startups that used to use Amazon's service decided with virtualization it was cheaper to use their own after they needed fiber connections and others to host massive bandwidth for all the boxens on the cloud.

    With 1/2 down your speed will be adversely affected. VSphere is about $7,000, including a CentOS or Windows Server license, and Windows Server 2012 with Hyper-V is the same price. You can host VMs and have data backed up elsewhere for redundancy. Yes, this will eat up data and raise costs with your T3, but it will consume less data than clouding everything.

    Repeat: the cloud does not save you money once you count all the hidden costs.

  10. 5mbytes every 3seconds is only 13.333 mbits/s. by millisa · · Score: 4, Informative

    we're on a business class 100mb cable connection
    100mbps = 12mbyte/s (give up 15-20% for the packet overhead, 10megabytes/sec).

    Distilling that summary into the data that mattered:
    1.5mb image, 3mb file each under 5 megs.
    and
    images every 3-5 seconds

    The files are 5megabytes total.
    In a perfect world, they'd transfer in 0.5 seconds.

    Leaving 2.5 - 4.5 seconds for the porn.

    Let's assume they are the bigger size, 5megabytes, and they transfer in the more frequent number, every 3 seconds.
    5MBytes/3s = 1.66667 Mbytes/s = 13.33333 mbits/s.

    Why is a facility with a 100mb/s line incapable of handling this?
    How did a problem where a 100mb/s line can't handle 13.3333mb/s come to a conclusion of "Fix it with the cloud?"

    In any case, if you want to do a cloud setup, just about all of them will handle small 13.3mb/s constant rates and you'll pay for it more than if you figured out why your line isn't keeping up.
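The rate calculation in this comment, as a small Python sketch. The 5 MB snapshot and 3-5 second interval are from the submission; the function names are hypothetical, and the 100 Mbps figure is the nominal line rate, with no allowance for overhead or other office traffic.

```python
LINK_MBPS = 100  # nominal line rate from the submission


def required_mbps(snapshot_mbytes: float, interval_s: float) -> float:
    """Sustained rate in megabits/s needed to keep up with the snapshots."""
    return snapshot_mbytes * 8 / interval_s


def link_share(snapshot_mbytes: float, interval_s: float) -> float:
    """Fraction of the nominal link consumed by the snapshot stream."""
    return required_mbps(snapshot_mbytes, interval_s) / LINK_MBPS


if __name__ == "__main__":
    for interval in (3, 5):
        print(f"every {interval}s: {required_mbps(5, interval):.2f} Mbit/s, "
              f"{link_share(5, interval):.0%} of the link")
```

On these numbers the stream needs between 8 and about 13.3 Mbit/s sustained, a small share of the nominal line rate.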

  11. Solving the wrong problem? by n7ytd · · Score: 2

    I don't understand what the issue is here. What the OP seems to be really asking is how to move the bandwidth requirement to overnight, when no one is using their connection for other business purposes.

    If time-shifting the syncing to off-hours is acceptable, why do you not install a server with a beefy hard drive at the client location to do just that?

    Have you explored the idea of compressing the data at the client side before sending it your way? Bitmaps often compress very well, especially if you can batch very similar ones together. A script to make a gzipped tar file every 5 minutes might do wonders for your data requirements.

    If you're ready to shell out the money for a cloud provider, why not instead shell out the money for a second connection to dedicate to this client?

    What does moving the data through a third party in "the cloud" offer over any of (or a combination of) these three approaches?
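The gzipped-tar batching suggested in this comment could look something like the following Python sketch, using only the standard library. The directory layout, the `batch_snapshots` name and the `*.bmp` pattern are assumptions made for illustration; a real deployment would also need to avoid picking up files that are still being written.

```python
import tarfile
import time
from pathlib import Path
from typing import Optional


def batch_snapshots(src_dir: Path, out_dir: Path, pattern: str = "*") -> Optional[Path]:
    """Pack every file in src_dir matching pattern into one .tar.gz, then delete the originals.

    Returns the archive path, or None if there was nothing to pack.
    """
    files = sorted(p for p in src_dir.glob(pattern) if p.is_file())
    if not files:
        return None

    out_dir.mkdir(parents=True, exist_ok=True)
    # Timestamped name; assumes batches are at least a second apart.
    archive = out_dir / f"batch-{time.strftime('%Y%m%dT%H%M%S')}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=f.name)   # store without the directory prefix

    # Only remove the sources once the archive has been written and closed.
    for f in files:
        f.unlink()
    return archive


if __name__ == "__main__":
    # Hypothetical paths; run this from cron or a scheduler every 5 minutes.
    result = batch_snapshots(Path("snapshots"), Path("outbox"), "*.bmp")
    print(result if result else "nothing to batch")
```

The resulting archives in the output directory can then be shipped by whatever nightly transfer is already planned. How much gzip saves depends entirely on the image content, so it is worth measuring on real snapshots first.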