Ask Slashdot: Cloud Service On a Budget?
First time accepted submitter MadC0der writes "We just signed a project with a very large company. We are a computer vision based company and our project gathers images from a facility in PA. Our company is located in TN. The company we're gather images from is on a very high speed fiber optic network. However, being a small company of 11 developers and 1 systems engineer, we're on a business class 100mb cable connection, which works well for us but not in this situation. The information gathered from the client in PA is a 1½mb .bmp image, along with a 3mb depth map file, making each snapshot a little under 5 megs. This may sound small, but images are taken every 3-5 seconds. This can lead to a very large amount of data captured and transferred each day. Our facility is incapable of handling such large transfers without effecting internal network performance. We've come to the conclusion that a cloud service would be the best solution for our problem. We're now thinking the customer's workstation will sync the data with the cloud, and we can automate pulling the data during off hours so we won't encounter congestion for analysis. Can anyone help suggest a stable, fairly priced cloud solution that will sync large amounts of offsite data for retrieval at our convenience (a nightly rsync script should handle this process)?"
Bring your own server. Depending on the time frame/duration of the project, it might be more cost effective to rent a quarter or half rack in a datacenter and build/buy your own servers. High initial up-front cost, but it does save money in the long run.
...WHY are you using BMP in the first place? Does whatever you're generating these on not have the processing capability to compress to PNG before transferring? I mean it SOUNDS like it'd save 10-20% off the total transfer...Anyways, what I'd do is I'd simply plop a server rack at the source that takes all the images for a given hour or whatever, tar.gz.bz2.whatevers them & send them over. Otherwise, I mean, Amazon wouldn't be TERRIBLE?
The sales guy oversold your capabilities. Instead of asking about cloud options, why don't you just pick a server host with a good reputation (Amazon and Rackspace come to mind) and pass the costs onto the client?
Assuming you don't need real-time analysis (doesn't look like it from the problem description), send a couple of 500GB hard drives and have someone mail you each day's images by overnight shipping.
Assuming 5MB of data every 5 seconds, you're dealing with ~90GB of data a day. So, looking at Amazon's pricing model (http://aws.amazon.com/s3/pricing/), assuming you delete the data after you pull it, the storage total should be in the range of $0.095 * 90GB = $8.55/mo. Transfers into S3 are free. You'll be transferring ~2.7TB/mo out (90GB*30); at $0.120/GB, that's $324.00/mo in transfer fees.
Now, if that data isn't being accumulated 24/7 (i.e., if it's only 8 hours a day, 5 days a week, for example), that lowers your monthly fees to the $80 range. Sure, you can shop around for someone who will charge you less for transfers (though if they're not charging at all, they may start complaining about the volume of data you're transferring), but $350/mo in fees to keep a project that's making you money from killing your network? Sounds doable to me.
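The arithmetic above can be sketched as a small Python estimate. It reuses the per-GB prices quoted in the comment (which may not match current AWS pricing), takes ~90 GB/day at 24/7 capture as input, and, like the comment, charges storage on one day's volume since data is deleted after it is pulled. `monthly_cost` and its parameters are illustrative names, not part of any AWS tool.

```python
# Illustrative S3 cost estimate using the prices quoted above
# (storage $0.095/GB-month, transfer out $0.120/GB). These are assumptions
# taken from the comment, not current AWS list prices.
STORAGE_PER_GB_MONTH = 0.095
TRANSFER_OUT_PER_GB = 0.120


def monthly_cost(daily_gb_24x7: float, duty_cycle: float = 1.0, days: int = 30) -> dict:
    """Return estimated monthly storage, egress, and total cost in dollars.

    duty_cycle scales capture time: 1.0 for 24/7, 40 / 168 for 8 hours x 5 days.
    Storage is charged on one day's volume, assuming data is deleted after it
    is pulled. Transfer into S3 is treated as free, as stated above.
    """
    daily_gb = daily_gb_24x7 * duty_cycle
    storage = STORAGE_PER_GB_MONTH * daily_gb
    egress = TRANSFER_OUT_PER_GB * daily_gb * days
    return {"storage": storage, "egress": egress, "total": storage + egress}


if __name__ == "__main__":
    for label, duty in (("24/7", 1.0), ("8/5", 40 / 168)):
        c = monthly_cost(90, duty)
        print(f"{label}: storage ${c['storage']:.2f}, "
              f"egress ${c['egress']:.2f}, total ${c['total']:.2f}")
```

With these inputs the 24/7 case comes to roughly $333/mo and the 8/5 case to roughly $79/mo, in line with the figures above; egress dominates either way.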
"Who" is paying for the stream of pics of such quality and via a "very high speed fiber optic network"
E.g., if you are counting wildlife, ask the gov/state for more hardware.
Cash might be very tight but gov data storage options should be usable.
Is it OCR on cars? Changes in activity around buildings?
If the "facility" has the need and cash to pay for images to be taken, optical and your work - ask for more cheap, fast storage.
As for the "cloud" and the nature of your work, be aware that the US and a few other governments can have a look anytime.
http://www.smh.com.au/technology/technology-news/whistleblower-reveals-australias-spy-agency-has-access-to-internet-codes-20130906-2tand.html
Best to air gap the 'results' part of your work from the bulk input and keep it all internal.
There's always Egnyte (https://www.egnyte.com/)
They're not very expensive and they offer what they call an "ELC" (enterprise local cloud) or "OLC" (office local cloud). The way it works is you store the files in their datacenter and you can use their elc/olc clients effectively as a caching mechanism that is sync'd with cloud contents. This happens in such a way that anyone in your office/datacenter can access files from a common interface/api without having to saturate your 100meg pipe by fetching the same file multiple times.
That huge bandwidth is a major load requirement of the project, and it is going to cost you or your client too much money. I think you should simply look into separating the functionality so you can do the analysis on the customer's site and only "get" (by pulling from a DB, web service, or RSS feed) the analysis results from there, while the rest of your application stays where it is now. From the sounds of it, the images are first saved somewhere on the customer's network, so perhaps it is not much of a stretch to install your analysis app right there?
If you're going to sync nightly anyway, why bother with a cloud service? Just sync at night.
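A quick sanity check on whether a nightly sync actually fits in the night: the sketch below estimates transfer time over the office's 100 Mbps link. The 80% efficiency factor is an assumption (roughly the 15-20% overhead allowance used in another comment), and the daily volumes tried here (~90 GB at one snapshot every 5 s, ~144 GB at one every 3 s) follow from the submitter's numbers. The function name is illustrative.

```python
def sync_hours(daily_gb: float, link_mbps: float = 100.0, efficiency: float = 0.8) -> float:
    """Hours needed to pull daily_gb (decimal gigabytes) over a link_mbps link.

    efficiency is an assumed fraction of the nominal rate left after protocol
    overhead; 0.8 is a guess, not a measurement.
    """
    megabits = daily_gb * 1000 * 8          # GB -> MB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for gb in (90, 144):
        print(f"{gb} GB/day -> {sync_hours(gb):.1f} h at 100 Mbps")
```

Under these assumptions, 90 GB takes about 2.5 hours and 144 GB about 4 hours, so either fits in an overnight window as long as nothing else needs the line at the same time.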
You mention rsync etc.... is this really necessary? From your description it sounds like the biggest potential cost for you is going to be network (followed by storage), but depending on where you are using the data, the charges can vary wildly. For instance, incoming traffic from the internet TO Amazon is actually free, but outgoing is not free. If you really want to save money, you are probably better off actually doing your processing in the cloud as well. Otherwise those bandwidth charges are going to eat you alive.
The NSA will steal your photos. Unless your 'vision based company' is doing some shifty security work for the NSA. In that case, you're fine and have a ridiculous budget so this post doesn't apply to you.
Since you are just spooling, that should be more than adequate.
(Not sure how someone else calculated 432GB/day, and I am horrified by the suggestion to overnight mail hard drives - way too expensive.)
You're probably going to pay less for a second cable modem line than you will to store that much data in the cloud. Cloud processing is fairly cheap - cloud storage is expensive.
And then you won't have to re-tool anything else in your processes, except maybe adding another route or two. If you're doing that much data processing, the $200/mo for the line shouldn't really be a huge expense on the contract.
If you're looking to scale out this service to lots of companies, then the calculus might be different.
Amazon has a low-cost version of S3 called Glacier, the downside of which is slow data retrieval time.
Also, on the extremely unlikely chance you're using Apple, there's a solid tool called Arq which will front-end for Glacier, and add encryption and automation to boot.
It is more expensive than a cloud unless you are really big. Many startups that used Amazon's service decided, with virtualization, that it was cheaper to run their own once they needed fiber connections and the like to host massive bandwidth for all the boxens in the cloud.
With 1/2 down your speed will be adversely affected. vSphere is about $7,000 including a CentOS or Windows Server license, and Windows Server 2012 with Hyper-V is the same price. You can host VMs and have data backed up elsewhere for redundancy. Yes, this will eat up data and raise costs with your T3, but it will consume less data than clouding everything.
Repeat: the cloud does not save you money once you count all the hidden costs.
we're on a business class 100mb cable connection
100 Mbps = 12.5 MB/s (give up 15-20% for packet overhead: ~10 MB/s).
Distilling that summary into the data that mattered:
1.5mb image, 3mb file each under 5 megs.
and
images every 3-5 seconds
The files are 5 megabytes total.
In a perfect world, they'd transfer in 0.5 seconds.
Leaving 2.5 - 4.5 seconds for the porn.
Let's assume they are the bigger size, 5 MB, and they arrive at the more frequent interval, every 3 seconds.
5 MB / 3 s = 1.66667 MB/s = 13.33333 Mbit/s.
Why is a facility with a 100mb/s line incapable of handling this?
How did a problem where a 100mb/s line can't handle 13.3333mb/s come to a conclusion of "Fix it with the cloud?"
In any case, if you want to do a cloud setup, just about all of them will handle a small 13.3 Mb/s constant rate, and you'll pay more for it than you would to figure out why your line isn't keeping up.
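The same arithmetic as a short Python sketch, using the worst case above (5 MB every 3 s) alongside the milder 5-second interval. Decimal units are assumed (1 MB = 8 megabits, 1 GB = 1000 MB), and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def sustained_mbps(snapshot_mb: float, interval_s: float) -> float:
    """Average bit rate (megabits/s) of one snapshot every interval_s seconds."""
    return snapshot_mb * 8 / interval_s


def daily_gb(snapshot_mb: float, interval_s: float) -> float:
    """Decimal gigabytes produced per day at a 24/7 capture rate."""
    return snapshot_mb * SECONDS_PER_DAY / interval_s / 1000


if __name__ == "__main__":
    link = 100.0  # nominal office link, Mbit/s
    for interval in (3, 5):
        rate = sustained_mbps(5, interval)
        print(f"every {interval}s: {rate:.2f} Mbit/s ({rate / link:.0%} of link), "
              f"{daily_gb(5, interval):.1f} GB/day")
```

This reproduces the 13.33 Mbit/s figure (about 13% of the link) and also gives the daily volume, 144 GB/day at 3 s and 86.4 GB/day at 5 s, which matches the ~140 GB and ~90 GB estimates in other comments.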
Storing 140 gigabytes a day is going to be expensive with any cloud service: you will essentially be using 4 terabytes per month in bandwidth, as well as a lot of disk storage --- and cloud providers charge dearly for this.
You might be better off getting your local network's connection upgraded. Obviously, this has benefits beyond merely offloading storage.
We're now thinking the customer's workstation will sync the data with the cloud, and we can automate pulling the data during off hours so we won't encounter congestion for analysis.
If you are pulling data off-hours anyway, perhaps the best thing to do would be to have a local server as close as possible to the point where the data is being gathered, receive the data there, and have it handle sending the data to you --- e.g. "collect the data on-site" right by the place it's being gathered.
This should help with the end-to-end congestion thing.
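A minimal sketch of the "collect on-site, send off-hours" idea: the on-site server keeps spooling and only forwards during a transfer window. The 20:00-06:00 window and the function name are hypothetical; the only subtlety is a window that wraps past midnight.

```python
from datetime import datetime, time


def in_transfer_window(now: time, start: time = time(20, 0), end: time = time(6, 0)) -> bool:
    """True if `now` falls in [start, end); handles windows that cross midnight.

    The default 20:00-06:00 window is a hypothetical example, not a recommendation.
    """
    if start <= end:                   # same-day window, e.g. 01:00-05:00
        return start <= now < end
    return now >= start or now < end   # window wraps past midnight


if __name__ == "__main__":
    if in_transfer_window(datetime.now().time()):
        print("inside window: forward the spool (e.g. via the nightly rsync)")
    else:
        print("outside window: keep spooling locally")
```

Treating the end of the window as exclusive means a transfer that starts at 05:59 is allowed, but nothing new starts at 06:00.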
I'd install a dedicated link and just add the cost of the link to the project's expense list.
Sooner or later you're downloading the data, and most customers I've dealt with would have an issue with spooling their data to the cloud in the first place -- it's why they would have contracted a small firm to do the processing in the first place.
Let's face it -- network capacity is just not that expensive nowadays, especially seeing as you sound like you're primarily interested in download speed, which means you can opt for asymmetric solutions that have greater download capacity than upload (which are usually cheaper).
And for God's sake -- compress your data!!!
If 11 developers and 1 sysadmin can't figure this one out, you should all be fired.
Hire a network consultant to fix your broken internet. After that's done, have them figure out how you guys can scale. It's probably not a great idea to send all this stuff to your office. I am assuming you're using GPUs; those can be rented and/or bought. You probably want a system that can be distributed fairly well.
The cloud is a buzzword, not a product. A colo'd 1RU server can hold about 40TB of bulk storage. Most colos will let you use nearly unlimited inbound traffic (the normal ratio is 1 to 10 inbound to outbound, and you pay for the higher of the two), so it's effectively a free resource. Past that, whatever you need to process that data can be shifted into the colo. Two or more sites with cross-failover and load balancing is probably your best long-term position.
There are several companies out there that do nothing but handle image processing "in the cloud". They could be used for simple bulk file transfers, or they might help solve the real problem -- dealing with large, uncompressed images.
I know of two off the top of my head:
In either case, your clients can upload the files directly to their servers, and the 3rd party company can begin converting immediately (if you choose).
All of these questions like this usually are.
So 12mb/s (max) of transfers will bog down your 100mb/s connection so badly that you just cannot do it??? Uhm, are you sure about that???
Well, OK then. Get another one.
The company we're gather images from ... ...without effecting internal network performance.
I mean really... If you can't manage to write a coherent, error-free paragraph written in fairly simple SVO sentences or can't be bothered to proofread an article submission before posting, what makes you think that you could effectively manage a cloud-based infrastructure (or any other kind, for that matter)?
Hell, with your skills just burn the files onto DVDs and toss them in the rubbish bin. It'll work just as well...
That is all.
If all you want is simple folder synchronization (a computer in PA writes a file to a folder, and a computer in TN downloads it 10-20 seconds later), then you might want to look at EMC Syncplicity. (I'm the desktop lead.)
I don't understand what the issue is here. What the OP seems to be really asking is how to move the bandwidth requirement to overnight, when no one is using their connection for other business purposes.
If time-shifting the syncing to off-hours is acceptable, why not install a server with a beefy hard drive at the client location to do just that?
Have you explored the idea of compressing the data at the client side before sending it your way? Bitmaps often compress very well, especially if you can batch very similar ones together. A script to make a gzipped tar file every 5 minutes might do wonders for your data requirements.
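The periodic-bundle script suggested above might look like this in Python, using the standard library's `tarfile` module with gzip compression. The spool directory layout and the `*.bmp` pattern are assumptions; the depth-map files' extension isn't given, so add their pattern as needed. How much this saves depends on the images: near-identical frames compress far better than noisy ones.

```python
import tarfile
from pathlib import Path


def bundle_snapshots(spool: Path, archive: Path, patterns=("*.bmp",)) -> list:
    """Pack every spool file matching `patterns` into one gzipped tar archive.

    Returns the archived file names. Deleting the originals after a verified
    transfer is left to the caller.
    """
    files = sorted({p for pat in patterns for p in spool.glob(pat) if p.is_file()})
    with tarfile.open(archive, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=f.name)  # store flat names, not full paths
    return [f.name for f in files]


if __name__ == "__main__":
    # Hypothetical paths: run this every 5 minutes (e.g. from cron) on the client side.
    from datetime import datetime
    spool = Path("spool")
    out = Path(f"batch-{datetime.now():%Y%m%d-%H%M}.tar.gz")
    print(bundle_snapshots(spool, out))
```

Writing the archive outside the spool directory keeps a later run from picking up its own output.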
If you're ready to shell out the money for a cloud provider, why not instead shell out the money for a second connection to dedicate to this client?
What does moving the data through a third party in "the cloud" offer over any of (or a combination of) these three approaches?
You might be able to utilize something like AWS S3 storage, which is low cost for the storage, but AWS will also charge you for I/O to/from S3. This can become very costly if you transfer a lot of data into/out of AWS S3.
Remember with a Cloud provider you have to pay to transfer the data IN and to transfer the data OUT.
Have you priced what a faster internet connection would cost you?
Or a 2nd Internet connection just for this video traffic?
Look beyond the cable MSOs, too: what is a FiOS-based service's top speed?
You mention your BMP images being ~5MBytes (I assume mb = megabytes and not megabits). Your current Internet connection is 100Mbps, so one image is 40Mbits (5MB x 8 bits), or 40% of a full second of your link's capacity.
It takes an image every 3 to 5 seconds.
It seems to me that the burstiness of this traffic may be causing your problems more than the amount of data. Your internal "work" network is being hit every 3-5 seconds. Assuming your internal LANs are 1Gbps Ethernet, this still shouldn't be a problem, unless it's your co-workers complaining that their "internet" access is too slow when 40% of a second's worth of bandwidth goes to an image transfer every 3-5 seconds.
Lastly, you might want to make sure that your network routers are not dropping packets during those bursts, because dropped packets just get retransmitted, which only exacerbates your problem.
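To put numbers on the burstiness point: if a 5 MB snapshot is pulled at the full nominal 100 Mbps, the link is saturated for about 0.4 s and then sits idle until the next snapshot, so average utilization stays low while users still feel the spikes. This sketch assumes decimal units and a transfer at full line rate with no competing traffic; the function name is illustrative.

```python
def burst_profile(snapshot_mb: float, interval_s: float, link_mbps: float = 100.0):
    """Return (seconds the link is saturated per snapshot, average utilization)."""
    megabits = snapshot_mb * 8            # 5 MB -> 40 Mbit
    burst_s = megabits / link_mbps        # time at full line rate
    avg_util = burst_s / interval_s       # fraction of each interval spent busy
    return burst_s, avg_util


if __name__ == "__main__":
    for interval in (3, 5):
        burst, util = burst_profile(5, interval)
        print(f"every {interval}s: {burst:.1f}s burst, {util:.0%} average utilization")
```

Spreading each transfer out (for example with rsync's `--bwlimit` option) trades a longer transfer for smaller spikes.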