Ask Slashdot: Cloud Service On a Budget?
First time accepted submitter MadC0der writes "We just signed a project with a very large company. We are a computer-vision-based company, and our project gathers images from a facility in PA. Our company is located in TN. The company we're gathering images from is on a very high-speed fiber optic network. However, being a small company of 11 developers and 1 systems engineer, we're on a business-class 100mb cable connection, which works well for us but not in this situation. The information gathered from the client in PA is a 1½mb .bmp image along with a 3mb depth map file, making each snapshot a little under 5 megs. This may sound small, but images are taken every 3-5 seconds, which adds up to a very large amount of data captured and transferred each day. Our facility is incapable of handling such large transfers without affecting internal network performance. We've come to the conclusion that a cloud service would be the best solution for our problem. We're now thinking the customer's workstation will sync the data with the cloud, and we can automate pulling the data during off hours so we won't encounter congestion for analysis. Can anyone help suggest a stable, fairly priced cloud solution that will sync large amounts of offsite data for retrieval at our convenience (a nightly rsync script should handle this process)?
Bring your own server. Depending on the time frame/duration of the project, it might be more cost-effective to rent a quarter or half rack in a datacenter and build/buy your own servers. High up-front cost, but it does save money in the long run.
Editor to the submission. Any available editor with a decent grasp of English vocabulary and grammar, please respond immediately.
...WHY are you using BMP in the first place? Does whatever you're generating these on not have the processing capability to compress to PNG before transferring? I mean it SOUNDS like it'd save 10-20% off the total transfer...Anyways, what I'd do is I'd simply plop a server rack at the source that takes all the images for a given hour or whatever, tar.gz.bz2.whatevers them & send them over. Otherwise, I mean, Amazon wouldn't be TERRIBLE?
the sales guy oversold your capabilities. Instead of asking about cloud options, why don't you just pick a server host with a good reputation (Amazon and Rackspace come to mind) and pass the costs onto the client?
Assuming you don't need real-time analysis (it doesn't look like it from the problem description), send a couple of 500GB hard drives and have someone mail you each day's load of images by overnight shipping.
Why not just get another connection that does nothing but the data transfers? Also, compressing (zipping) the images first may reduce bandwidth needs.
My solution will be to use Mac Minis for storage and processing (accessed through ssh/screen - individual user account per server process) and Raspberry Pis (with a distro I'll be calling Dr P Linux) for handling connections. More connections = more pis, more processing/services = more insecure gruntboxes.
Assuming 5MB of data every 5 seconds, you're dealing with ~90GB of data a day. So, looking at Amazon's pricing model (http://aws.amazon.com/s3/pricing/), and assuming you delete the data after you pull it, the storage total should be in the range of $0.095 * 90GB = $8.55/mo. Transfers into S3 are free. You'll be transferring ~2.7TB/mo out (90GB*30); at $0.120/GB, that's $324.00/mo in transfer fees.
Now, if that data isn't being accumulated 24/7 (i.e., if it's only being captured 8 hours a day, 5 days a week, for example), that lowers your monthly fees to the $80 range. Sure, you can shop around for someone who will charge you less for transfers (though if they're not charging at all, they may start complaining about the volume of data you're transferring), but $350/mo in fees to help keep a project that's making you money from killing your network? Sounds doable to me.
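The arithmetic in the two paragraphs above can be sketched as a small calculator. This is a rough sketch, not a quote: the $0.095/GB-month storage and $0.120/GB outbound prices are the 2013 figures from the comment, used as assumed defaults, and the function names are hypothetical. It also assumes each day's data is deleted after the nightly pull, so only about one day's worth is ever stored.

```python
# Rough S3 cost model for the image feed.
# Assumed defaults: the 2013 prices quoted in the comment; check current pricing.
SECONDS_PER_HOUR = 3600


def daily_volume_gb(snapshot_mb: float, interval_s: float, hours_per_day: float = 24.0) -> float:
    """GB captured per day (decimal units: 1 GB = 1000 MB)."""
    snapshots = hours_per_day * SECONDS_PER_HOUR / interval_s
    return snapshots * snapshot_mb / 1000


def monthly_s3_cost(daily_gb: float, billed_days: int = 30,
                    storage_per_gb_month: float = 0.095,
                    egress_per_gb: float = 0.120) -> dict:
    """Approximate monthly bill when each day's data is pulled and then deleted.

    Only about one day's worth sits in S3 at a time, so storage is small;
    outbound transfer dominates. Inbound transfer is treated as free.
    """
    storage = daily_gb * storage_per_gb_month
    egress = daily_gb * billed_days * egress_per_gb
    return {
        "storage": round(storage, 2),
        "egress": round(egress, 2),
        "total": round(storage + egress, 2),
    }
```

With the 24/7 feed, `monthly_s3_cost(90)` reproduces the comment's $8.55 storage and $324 transfer figures. For the 8/5 case, `monthly_s3_cost(daily_volume_gb(5, 5, hours_per_day=8), billed_days=22)` (22 working days is an assumption) lands in the same ~$80 range.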
Presumably you have looked at vendors already; otherwise, how would you know the capabilities exist in "the cloud"? Keep those cloud conclusions coming until you find the right vendor.
"Who" is paying for a stream of pics of such quality, delivered via a "very high speed fiber optic network"?
E.g., if you are counting wildlife, ask the government/state for more hardware.
Cash might be very tight, but government data storage options should be usable.
Is it OCR on cars? Changes in activity around buildings?
If the "facility" has the need and the cash to pay for the images to be taken, the optics, and your work, ask for more cheap, fast storage.
As for the "cloud" and the nature of your work, be aware that the US and a few other governments can have a look anytime.
http://www.smh.com.au/technology/technology-news/whistleblower-reveals-australias-spy-agency-has-access-to-internet-codes-20130906-2tand.html Best to air gap the 'results' part of your work from the bulk input and keep it all internal.
There's always Egnyte (https://www.egnyte.com/)
They're not very expensive and they offer what they call an "ELC" (enterprise local cloud) or "OLC" (office local cloud). The way it works is you store the files in their datacenter and you can use their elc/olc clients effectively as a caching mechanism that is sync'd with cloud contents. This happens in such a way that anyone in your office/datacenter can access files from a common interface/api without having to saturate your 100meg pipe by fetching the same file multiple times.
Lateral problem-solving here. Assuming you can't vary the BMP requirement, have you considered some sort of WAN compression? Riverbed is the market leader, but there are a host of alternatives out there now. These sorts of boxes use all sorts of magic tricks, but the main ones are on-the-fly compression and caching of data. The caching may still be of use if there is repetition in the images: it works at the bitstream level rather than recognising entire files, meaning that if there is a similar block of data inside the BMP, it can still benefit from the caching. The benefit will vary greatly based on your source data, but I think you can get a unit on eval.
Riverbed in particular is not cheap (seems to be priced just below the cost of a WAN upgrade) but it should improve the situation and can also limit the throughput so you can still utilise the link for other things. The only downside is you need an appliance (physical or virtual) at both sides so you may need to get the other side to play ball.
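The block-level caching idea can be illustrated with a toy version. Riverbed's actual algorithms are proprietary; this is a simplified, hypothetical sketch of the general technique (split the stream into fixed-size blocks, hash each block, and send only a short reference for blocks the far side already holds). Real WAN optimizers typically use smarter, often variable-size chunking, so an insertion that shifts bytes does not invalidate every following block.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size


def encode(data: bytes, cache: set[str]) -> list[tuple[str, bytes | None]]:
    """Sender side: emit (digest, block) for new blocks, (digest, None) for cached ones."""
    out: list[tuple[str, bytes | None]] = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest in cache:
            out.append((digest, None))   # far side already has it: send a reference only
        else:
            cache.add(digest)
            out.append((digest, block))  # first sighting: send the literal bytes
    return out


def decode(stream: list[tuple[str, bytes | None]], store: dict[str, bytes]) -> bytes:
    """Receiver side: remember literal blocks and expand references from the store."""
    parts = []
    for digest, block in stream:
        if block is not None:
            store[digest] = block
        parts.append(store[digest])
    return b"".join(parts)
```

If two consecutive snapshots share most of their blocks, the second one costs roughly only the changed blocks plus a 64-character digest per unchanged block. How much that helps depends entirely on how repetitive the real images are.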
Very easy to prototype, and then you can up the ante to see if it scales. With over 4 languages supported, it's rather flexible.
That huge bandwidth is a major load requirement of the project. That bandwidth is going to cost you or your client too much money. I think you should simply look into separating the functionality so you can do the analysis on customer site, and you only "get"(pulling from db, webservice, or a rss feed) the analysis results right there on customer's site, and the rest of your application sit where it is now. From the sounds of it the images are first saved somewhere on customer's network, so perhaps it is not much of a stretch to install your analysis app right there?
I don't see why a cloud solution is so ideal for your system.
Note that everything from traffic to used diskspace gets charged.
If you have a lot of traffic, you may be better off looking at an unmetered dedicated server, especially if you can calculate your needs in advance.
Some hosting companies offer reasonable prices for such packages compared to all additional costs that Amazon will charge you.
Another option is using spot instances from Amazon (I don't know if other providers have them).
It is basically an auction based usage scheme but you have to model your processing for it.
Depending on the type of machine you usually pay 1/3 ~ 1/4 of the normal fees.
If someone offers a better price, your virtual server will be assigned to them, so you have to take into account that your server can be terminated at any point.
Your external storage will just be disconnected, so when you boot up another spot instance your data will still be available.
It's a bit tricky to set up properly, but it is a very cheap way to process a lot of data.
Of course, with Amazon you can combine normal permanent instances (e.g. a data collector) and permanent database services with multiple spot instances (e.g. analysis workers).
The inbound and outbound traffic costs for a spot instance are, however, the same as for a permanent EC2 instance.
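Because a spot instance can be terminated at any point, one common way to "model your processing for it" is to make each worker idempotent: record a completion marker per image, written atomically, so a replacement instance simply skips finished work. A minimal sketch, assuming images arrive as `*.bmp` files in an inbox directory on the persistent volume; `process_pending`, the directory layout and the `analyze` callback are hypothetical stand-ins for the real analysis code.

```python
import os
from pathlib import Path


def process_pending(inbox: Path, done_dir: Path, analyze) -> list[str]:
    """Run analyze() on every image that has no completion marker yet.

    Safe to rerun after a spot termination: finished images are skipped,
    and a half-written result is never mistaken for a finished one.
    """
    done_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for image in sorted(inbox.glob("*.bmp")):
        marker = done_dir / (image.name + ".result")
        if marker.exists():
            continue  # completed before the last interruption
        tmp = marker.with_suffix(".tmp")
        tmp.write_text(analyze(image))
        # os.replace is atomic when both paths are on the same filesystem,
        # so the marker only ever appears with complete contents.
        os.replace(tmp, marker)
        processed.append(image.name)
    return processed
```

Keeping both `inbox` and `done_dir` on the external storage that survives termination is what lets the next spot instance pick up where the last one stopped.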
if you're going to sync nightly anyway, why bother with a cloud service? just sync at night.
You mention rsync etc.... is this really necessary? From your description, it sounds like the biggest potential cost for you is going to be network (followed by storage), but depending on where you are using the data, the charges can vary wildly. For instance, incoming traffic from the internet TO Amazon is actually free, but outgoing is not. If you really want to save money, you are probably better off doing your processing in the cloud as well. Otherwise those bandwidth charges are going to eat you alive.
Assuming you don't care about the images themselves, just the result of processing them, you can save bandwidth by putting your own server at your customer's site.
Instead of transmitting the images to you to analyse, they would copy the files to your server via the LAN. You would connect to the server from your offices and execute your commands remotely. Again, assuming the results of the analysis are small you can send them back to yourselves, and send a copy locally to the customer.
If you *do* need the actual images in your office then my assumption doesn't hold, so use lossless compression, or find out what level of lossy compression is acceptable. And never underestimate the bandwidth of a truck full of hard drives.
Amazon Web Services does not charge for data transfer into an AWS instance currently: http://calculator.s3.amazonaws.com/calc5.html
We are using AWS server instances to download and process multiple large files that are in the 10 to 150 GB range each.
If you shut down your analysis instances when they are not being used, you will only get charged for the storage that they consume, not the compute time.
You can further reduce your costs by using a reserved instance to be available 24/7 to receive image files and launch spot instances to run analysis.
Once you are finished with the images, dump them into AWS S3 and then into AWS Glacier for longer term storage.
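Rather than moving objects to Glacier by hand, S3 lifecycle rules can transition them automatically. A minimal sketch that builds such a rule as a plain dictionary in the shape S3's lifecycle API expects; the `raw/` prefix and the day counts are hypothetical. It could then be applied with boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`, which is not called here.

```python
from __future__ import annotations


def glacier_lifecycle(prefix: str, days_to_glacier: int,
                      days_to_expire: int | None = None) -> dict:
    """Lifecycle configuration that archives objects under prefix to Glacier.

    Optionally deletes them days_to_expire days after creation.
    """
    if days_to_expire is not None and days_to_expire <= days_to_glacier:
        # Expiring before (or at) the transition makes no sense; fail early.
        raise ValueError("expiration must come after the Glacier transition")
    rule = {
        "ID": "archive-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days_to_glacier, "StorageClass": "GLACIER"}],
    }
    if days_to_expire is not None:
        rule["Expiration"] = {"Days": days_to_expire}
    return {"Rules": [rule]}
```

Glacier retrieval is slow, so this suits raw images kept for record-keeping rather than ones the analysis instances still need.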
Screw the summary. Is this guy really asking us how to do his job?
The NSA will steal your photos. Unless your 'vision based company' is doing some shifty security work for the NSA. In that case, you're fine and have a ridiculous budget so this post doesn't apply to you.
Buy network equipment that can set sufficient QoS to cap the transfers to ~80% so that your other traffic won't get bugged. I'm guessing that is what "internal network can't handle it" means.
Buy a network attached storage device. If you are worried about it, buy two. Keep it on a LAN. Don't attach it to a wan. Store your data redundantly on it. Automated backups are best. Have someone maintain the archives. Keep it off the 'net, keep it away from the NSA. Remember: NAS good, NSA bad.
Since you are just spooling, that should be more than adequate.
(Not sure how someone else calculated 432GB/day, and I am horrified by the suggestion to overnight mail hard drives - way too expensive.)
We house about 15TB of video data, growing about 5TB a year, at my shop; we also work in CV. I'd say an in-house server is your best bet for local processing... the cloud's storage costs may eat you alive in time. Get a second internet connection if you don't want to be affected. I suggest snail-mailing drives for the large collections. 3TB drives are about 100 bucks right now, and SATA copies run at about 120MB/s, so the copy jobs don't take too long, assuming your server is decent. As for compression... I imagine anything other than lossless is out. The only easy option I can think of is compressing the BMPs, and for that I'd use tiled Adobe Deflate (i.e. zip) TIFFs; you'll have to find the sweet spot for tile size. That will cut the size down to about 2/3. There's also JPEG 2000's lossless mode, which will be SLOW to work with and is supported by less software than tiled TIFFs, but it will have much better compression ratios on average. You're on your own for depth maps, unless they are raster, in which case you can probably do the same thing with them. Since most sensors use some odd number of bits and there's spatial locality (tiles exploit this), you should see significant bandwidth and storage savings, though likely no better than half the original size.
You're probably going to pay less for a second cable modem line than you will to store that much data in the cloud. Cloud processing is fairly cheap - cloud storage is expensive.
And then you won't have to re-tool anything else in your processes, except maybe adding another route or two. If you're doing that much data processing, the $200/mo for the line shouldn't really be a huge expense on the contract.
If you're looking to scale out this service to lots of companies, then the calculus might be different.
Amazon has a low-cost version of S3 called Glacier, the downside of which is slow data retrieval time.
Also, on the extremely unlikely chance you're using Apple, there's a solid tool called Arc that acts as a front end for Glacier and adds encryption and automation to boot.
100Mbit is a standard residential line; why not just pay the extra $50/mo and get another line for the image feed?
You say you are getting around 5 Mbytes every 3 seconds; that is about 13.3 Mbit/s. That much downstream capacity is easy to get.
You can also do some network wizardry and limit your receiving PC to 20 Mbit/s so as not to kill the 100Mbit/s network. Rate limiting is not a big deal. Then there will be no bursts killing the connection.
Blocking Youtube / Netflix etc might help too.
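Rate limiting of this kind is usually implemented as a token bucket: capacity refills at the sustained rate, and a transfer may only spend what has accumulated, which smooths each 5MB snapshot into a steady trickle instead of a burst. A minimal sketch, assuming the shaping is done in the sending/upload script rather than on a router; the class and its parameters are hypothetical, and off-the-shelf options such as router QoS or rsync's `--bwlimit` do the same job.

```python
import time


class TokenBucket:
    """Token-bucket shaper: sustained rate_bps, bursts of up to capacity_bits.

    Tokens (bits) refill continuously; sending spends them. If a send
    overdraws the bucket, delay_for() returns how long to wait until the
    debt is repaid.
    """

    def __init__(self, rate_bps: float, capacity_bits: float, clock=time.monotonic):
        self.rate = rate_bps
        self.capacity = capacity_bits
        self.tokens = capacity_bits  # start with a full bucket
        self.clock = clock           # injectable so the sketch can be tested without sleeping
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def delay_for(self, nbytes: int) -> float:
        """Charge nbytes against the bucket; return seconds to sleep before sending."""
        self._refill()
        self.tokens -= nbytes * 8
        return max(0.0, -self.tokens / self.rate)
```

In a sender loop this would look like `time.sleep(bucket.delay_for(len(chunk)))` before each hypothetical `sock.sendall(chunk)`. At 20 Mbit/s, a 5MB (40 Mbit) snapshot takes about 2 seconds to go out, which still fits inside the 3-5 second capture interval.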
You're getting paid, right? Do your own fucking job.
your fucking solution already... given that you proposed something and then "signed a project with a very large company"
slashdot is not letothersdoyourwork.com
Why can't they compress the .bmp files using 7-Zip or similar high-compression software before uploading?
Cloud is only going to slow down and complicate this.
It is more expensive than a cloud unless you are really big. Many startups that used to use Amazon's service decided with virtualization it was cheaper to use their own after they needed fiber connections and others to host massive bandwidth for all the boxens on the cloud.
With 1/2 down, your speed will be adversely affected. vSphere is about $7,000 including a CentOS or Windows Server license, and Windows Server 2012 with Hyper-V is the same price. You can host VMs and have data backed up elsewhere for redundancy. Yes, this will eat up data and raise costs with your T3, but it will consume less data than clouding everything.
Repeat: the cloud does not save you money once you count all the hidden costs.
we're on a business class 100mb cable connection
100mbps = 12.5 mbyte/s (give up 15-20% for packet overhead, and you get about 10 megabytes/sec).
Distilling that summary into the data that mattered:
1.5mb image, 3mb file each under 5 megs.
and
images every 3-5 seconds
The files are 5megabytes total.
In a perfect world, they'd transfer in 0.5 seconds.
Leaving 2.5 - 4.5 seconds for the porn.
Let's assume they are the bigger size, 5megabytes, and they transfer in the more frequent number, every 3 seconds.
5MBytes/3s = 1.66667 Mbytes/s = 13.33333 mbits/s.
Why is a facility with a 100mb/s line incapable of handling this?
How did a problem where a 100mb/s line can't handle 13.3333mb/s come to a conclusion of "Fix it with the cloud?"
In any case, if you want to do a cloud setup, just about all of them will handle small 13.3mb/s constant rates and you'll pay for it more than if you figured out why your line isn't keeping up.
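The back-of-the-envelope numbers in this comment can be checked with a few one-line functions. This is a sketch with hypothetical names; it assumes decimal units (1 MB = 8 Mbit) and the 20% end of the comment's 15-20% overhead estimate.

```python
def sustained_mbps(snapshot_mb: float, interval_s: float) -> float:
    """Average rate in Mbit/s for one snapshot_mb-megabyte snapshot every interval_s seconds."""
    return snapshot_mb * 8 / interval_s


def usable_mbps(link_mbps: float = 100.0, overhead: float = 0.20) -> float:
    """Link rate left after protocol overhead."""
    return link_mbps * (1 - overhead)


def burst_seconds(snapshot_mb: float, link_mbps: float = 100.0, overhead: float = 0.20) -> float:
    """How long one snapshot occupies the whole link if nothing else is using it."""
    return snapshot_mb * 8 / usable_mbps(link_mbps, overhead)


def utilization(snapshot_mb: float, interval_s: float,
                link_mbps: float = 100.0, overhead: float = 0.20) -> float:
    """Fraction of the usable link the feed consumes on average."""
    return sustained_mbps(snapshot_mb, interval_s) / usable_mbps(link_mbps, overhead)
```

For the worst case (5MB every 3 seconds) this gives about 13.3 Mbit/s sustained, a 0.5 second burst per snapshot, and roughly one sixth of the usable 100 Mbit/s line, which is the comment's point.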
If you can't deliver the service, you should not have signed the contract. Unless you can think up some hack quickly, the only correct solution here is to go out of business as a direct result of your gross incompetence.
Storing 140 gigabytes a day is going to be expensive with any cloud service: you will essentially be using 4 terabytes per month in bandwidth, as well as a lot of disk storage, and cloud providers charge dearly for this.
You might be better off getting your local network's connection upgraded. Obviously, this has benefits beyond merely offloading storage.
We're now thinking the customer's workstation will sync the data with the cloud, and we can automate pulling the data during off hours so we won't encounter congestion for analysis.
If you are pulling data off-hours anyway, perhaps the best thing to do would be to have a local server as close as possible to the point where the data is being gathered, to receive the data and handle sending it to you, i.e. "collect the data on-site," right by the place it's being gathered.
This should help with the end-to-end congestion thing.
I'd install a dedicated link and just add the cost of the link to the project's expense list.
Sooner or later you're downloading the data, and most customers I've dealt with would have an issue with spooling their data to the cloud in the first place -- it's why they would have contracted a small firm to do the processing in the first place.
Let's face it -- network capacity is just not that expensive nowadays, especially since you sound like you're primarily interested in download speed, which means you can opt for asymmetric solutions that have greater download capacity than upload (which are usually cheaper).
And for God's sake -- compress your data!!!
Just curious - why do you think you need anything with "cloud" in the title?
A simple rsync cron job will take care of everything. Build a local server cluster and co-locate in a local datacenter. Far, far, far less expensive than something with "cloud" in the title and you'll have total control over everything. Expect this solution to pay for itself in about 6 - 10 months versus going cloud.
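The "simple rsync cron job" could be as small as one command assembled by a script that cron runs nightly, e.g. a crontab entry like `0 1 * * * /usr/local/bin/pull-images` (the schedule, script path, host and directory names here are all hypothetical). A sketch that only builds the argument list, so it can be inspected before handing it to `subprocess.run(cmd, check=True)`; the flags used (`-a`, `--partial`, `--remove-source-files`, `--bwlimit`) are standard rsync options.

```python
from __future__ import annotations


def nightly_rsync_cmd(src: str, dest: str, bwlimit_kib_s: int | None = None,
                      remove_source: bool = False) -> list[str]:
    """Argument list for one nightly pull of the capture directory."""
    # Archive mode; keep partially transferred files so a rerun doesn't start them over.
    cmd = ["rsync", "-a", "--partial"]
    if remove_source:
        # Delete originals on the sending side only after they have arrived.
        cmd.append("--remove-source-files")
    if bwlimit_kib_s:
        # Cap throughput (KiB/s) if the pull might overlap with working hours.
        cmd.append(f"--bwlimit={bwlimit_kib_s}")
    # A trailing slash on src copies the directory's contents, not the directory itself.
    cmd += [src, dest]
    return cmd
```

Whether to pass `remove_source=True` is a policy question for the client: it stops the remote disk from filling up, but it also means the copy in PA is gone once the pull succeeds.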
You mention that your 100 megabit connection isn't fast enough, so you're thinking of going to the cloud. Going to the cloud won't improve your local Internet connection.
If 11 developers and 1 sysadmin can't figure this one out, you should all be fired.
Hire a network consultant to fix your broken internet. After that's done, have them figure out how you guys can scale. It's probably not a great idea to have to send all this stuff to your office. I am assuming you're using GPUs; those can be rented and/or bought. You probably want a system that can be distributed fairly well.
The cloud is a buzzword, not a product. A colo'd 1RU server can hold about 40TB of bulk storage. Most colos will let you use nearly unlimited inbound traffic (the normal ratio is 1 to 10 inbound to outbound, and billing is for the higher of the two), so it's effectively a free resource. Beyond that, whatever you need to process that data can be shifted into the colo. Two or more sites with cross-failover and load balancing is probably your best long-term position.
Your company is small but seems to be growing. Move it to Chattanooga, TN and grab that gigabit fiber; then you can just do transfers as the photos are taken/processed. It's not so radical: no different than startups moving to California in search of money. It's out of the box, but it's the best solution. Plus, once there, you can sell that service all over the country because you are on a gig fiber network.
I concur, the transfer rate is in the range of 14Mbps and the 100Mbps circuit should be able to very easily handle it. This person either doesn't have a clue what they are doing or talking about, or bandwidth is a red herring and not the real issue.
144GB of storage per day (5MB every 3 sec.) might be the issue. At that rate, he'd need about 4.3TB of new storage every month, a bit more than one ~3TB drive. This also doesn't seem like a major problem.
It's really looking like this guy doesn't have a clue!
There are several companies out there that do nothing but handle image processing "in the cloud". They could be used for simple bulk file transfers, or they might help solve the real problem: dealing with large, uncompressed images.
I know of two off the top of my head:
In either case, your clients can upload the files directly to their servers, and the 3rd party company can begin converting immediately (if you choose).
assuming fastest snapshot every three seconds....
86400 s in a day / 3 sec snapshot * 5 MB per image = ~ 144 GB per day
A batch download at night will take 3 hours, manageable assuming you must stay with .bmp
Switch to jpg and play with the compression to suit your needs, and you can easily slash that size by a factor of 10 or more.
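The daily volume and the nightly batch time work out as follows, assuming decimal units (1 GB = 10^9 bytes) and that the full 100 Mbit/s is available overnight:

```latex
\[
\frac{86\,400\ \mathrm{s/day}}{3\ \mathrm{s/snapshot}} \times 5\ \mathrm{MB}
  = 28\,800 \times 5\ \mathrm{MB} = 144\ \mathrm{GB/day}
\]
\[
t_{\text{pull}} = \frac{144 \times 10^{9}\ \mathrm{B} \times 8\ \mathrm{bit/B}}{100 \times 10^{6}\ \mathrm{bit/s}}
  = 11\,520\ \mathrm{s} \approx 3.2\ \mathrm{h}
\]
```

Allowing 15-20% for protocol overhead stretches this to roughly 3.8-4 hours, which still fits comfortably in an overnight window.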
We have a problem with our %dailyActivity%, can the %popularBuzzword% fix it?
So 12mb/s (max) of transfers will bog down your 100mb/s connection so badly that you just cannot do it??? Uhm, are you sure about that???
Well, OK then. Get another one.
9.5 cents / GB Month, plus a couple of cents per 1000 requests to the system. Seem pretty reasonable to me.
http://aws.amazon.com/s3/pricing/
http://www.sitepoint.com/5-useful-amazon-s3-backup-tools/
I am in looking for the server cloud. Please to be kindly corresponding with solutions for my project. I am most anxiously and Ranjali is most missing me from the home time.
Please do the needful.
I would recommend setting up blob storage and a queue on a cloud provider like Azure. If you stored all the images in blob storage while putting a message in a queue saying that they are there to be processed, you would have a buffer between your client and your office.
At some point you will have to ensure that you are keeping up with the amount of information that they deliver.
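The blob-plus-queue buffer can be sketched with Python's standard `queue` module standing in for the cloud services. This is not the Azure SDK: `Buffer`, `upload`, `drain` and `backlog` are hypothetical names, and a real deployment would use the provider's blob and queue clients. The ordering is the important part: the image is stored before its name is queued, so a consumer never sees a message for data that isn't there yet.

```python
from __future__ import annotations

import queue
from dataclasses import dataclass, field


@dataclass
class Buffer:
    """In-memory stand-in for cloud blob storage plus a message queue."""
    blobs: dict[str, bytes] = field(default_factory=dict)
    messages: queue.Queue = field(default_factory=queue.Queue)

    def upload(self, name: str, data: bytes) -> None:
        """Client side: store the image first, then announce it."""
        self.blobs[name] = data
        self.messages.put(name)

    def drain(self, handle) -> int:
        """Office side: process everything queued so far (e.g. nightly); return the count."""
        count = 0
        while True:
            try:
                name = self.messages.get_nowait()
            except queue.Empty:
                return count
            handle(name, self.blobs.pop(name))  # remove the blob once handled
            count += 1

    def backlog(self) -> int:
        """Messages still waiting: the number to watch to be sure you keep up."""
        return self.messages.qsize()
```

Watching `backlog()` over time is one simple way to check the point above about keeping up with the amount of information the client delivers.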
If you are interested in talking about this shoot me an email: jef@agilebusinesscloud.com
Why not opt for a secondary internet connection that is only used for client data transfer? Would be way cheaper in the long run, especially if you have your own computing resources to use for the work once you have the data.
Another poster commented on this, but I would get another broadband connection, be it cable, fiber, or whatever is available, from a provider separate from your current one. Route the data through that, but also use that connection as a fallback in case your primary link fails. If the primary link did fail, though, I would QoS the traffic so you could continue meeting the customer's expectations for processing their images.
You could then throw another nic into the server and have that on your local LAN for immediate image analysis.
The company we're gather images from ... ...without effecting internal network performance.
I mean really... If you can't manage to write a coherent, error-free paragraph written in fairly simple SVO sentences or can't be bothered to proofread an article submission before posting, what makes you think that you could effectively manage a cloud-based infrastructure (or any other kind, for that matter)?
Hell, with your skills just burn the files onto DVD's and toss them in the rubbish bin. It'll work just as well...
That is all.
http://www.cbts.cinbell.com/Solutions/cloud-computing
If all you want is simple folder synchronization (the computer in PA writes a file to a folder, and the computer in TN downloads it 10-20 seconds later), then you might want to look at EMC Syncplicity. (I'm the desktop lead.)
Find a local data center and setup a file server with RAID. Some data centers already provide file hosting so building a server may not be necessary.
Full disclosure: I'm an employee at Mover.io. We are already handling very similar problems for other companies, and have infrastructure designed to solve this exact type of problem.
Just sync to amazon glacier. But seriously, is this the best solution? Perhaps you would be better off:
1. Negotiating for some space for your staff to work at their site with your equipment.
2. Putting a server at their site running your processing software on it and connecting to it remotely from your office.
3. Upgrading your link
The real question is, why did you propose to do something you couldn't actually do and why did you do this before knowing what you could do. Seems shortsighted or dishonest to me.
I don't understand what the issue is here. What the OP seems to be really asking is how to move the bandwidth requirement to overnight, when no one is using their connection for other business purposes.
If time-shifting the syncing to off-hours is acceptable, why do you not install a server with a beefy hard drive at the client location to do just that?
Have you explored the idea of compressing the data at the client side before sending it your way? Bitmaps often compress very well, especially if you can batch very similar ones together. A script to make a gzipped tar file every 5 minutes might do wonders for your data requirements.
If you're ready to shell out the money for a cloud provider, why not instead shell out the money for a second connection to dedicate to this client?
What does moving the data through a third party in "the cloud" offer over any of (or a combination of) these three approaches?
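The "gzipped tar file every 5 minutes" suggestion could look roughly like this, run from cron on the client-side machine (e.g. `*/5 * * * *`). A sketch under assumptions: the capture and outbox directory names are hypothetical, files modified in the last few seconds are skipped in case the camera software is still writing them, and originals are deleted only after the archive has been closed.

```python
from __future__ import annotations

import tarfile
import time
from pathlib import Path


def bundle(src: Path, out_dir: Path, min_age_s: float = 10.0) -> Path | None:
    """Pack finished snapshots from src into one .tar.gz in out_dir, then delete them.

    Returns the archive path, or None if there was nothing old enough to pack.
    """
    cutoff = time.time() - min_age_s
    files = sorted(p for p in src.iterdir()
                   if p.is_file() and p.stat().st_mtime <= cutoff)
    if not files:
        return None
    out_dir.mkdir(parents=True, exist_ok=True)
    # A nanosecond timestamp makes name collisions between runs unlikely.
    archive = out_dir / f"batch-{time.time_ns()}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for p in files:
            tar.add(p, arcname=p.name)
    # Only remove the originals once the archive has been written and closed.
    for p in files:
        p.unlink()
    return archive
```

How much gzip saves depends on the actual BMP and depth-map content, so it is worth measuring on a sample day before counting on it.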
Split your network into two and add some routers.
Create a high-speed LAN (and storage) for high speed fiber optic network,
and a slower one for regular use.
Add router(s) as needed.
You might be able to utilize something like AWS S3 storage, which is low cost for the storage, but AWS will also charge you for I/O to/from S3. This can become very costly if you transfer a lot of data into/out of AWS S3.
Remember with a Cloud provider you have to pay to transfer the data IN and to transfer the data OUT.
Have you priced what a faster internet connection would cost you?
Or a 2nd Internet connection just for this video traffic?
Look beyond the cable MSOs, too: what is a FiOS-based service's top speed?
You mention your BMP images being ~5MBytes (I assume mb = megabytes and not megabits). Your current Internet connection is 100Mbps, so one of your images (5MB x 8 bits = 40Mbits) takes up 40% of your entire connection for a full second while it is being transferred.
It takes an image every 3 to 5 seconds.
It seems to me that the burstiness of this traffic may be causing you more problems than the amount of data. Your internal "work" network is being hit every 3-5 seconds. Assuming your internal LANs are 1Gbps Ethernet, this still shouldn't be a problem, unless it's your co-workers complaining that their "internet" access is too slow when 40% of the bandwidth goes away every 3-5 seconds while an image transfers.
Lastly, you might want to make sure that your routers are not dropping packets during those bursts, because dropped packets will just be retransmitted, which only exacerbates your problem.