Ask Slashdot: Scientific Computing Workflow For the Cloud?

Posted by Soulskill on Friday November 29, 2013 @07:08AM from the science-as-a-service dept.

diab0lic writes "I have recently come into a situation where I need to run cloud computing on demand for my research. Amazon's EC2 Spot Instances are an ideal platform for this, as I can requisition an appropriate instance for the given experiment {high CPU, high memory, GPU instance} depending on its needs. However, I currently spin up the instance manually, set it up, run the experiment, and then terminate it manually. Monitoring experiments for completion gets tedious, and I incur unnecessary costs if a job finishes while I'm sleeping, for example. The whole thing really should be automated. I'm looking for a workflow somewhat similar to this:

Manually create Amazon machine image (AMI) for experiment.
Issue command to start AMI on specified spot instance type.
Automatically connect EBS to instance for result storage.
Automatically run specified experiment, bonus if this can be parameterized.
Have AMI automatically terminate itself upon experiment completion.

Something like docker that spun up on-demand spot instances of a specified type for each run and terminated said instance at run completion would be absolutely perfect. I also know HTCondor can back onto EC2 spot instances but I haven't really been able to find any concise information on how to set up a personal cloud — I also think this is slight overkill. Do any other Slashdot users have similar problems? How did you solve it? What is your workflow? Thanks!"
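
As a rough illustration of step 2 of the workflow above, the sketch below only assembles the parameters for a one-time spot request; it does not contact AWS. The dictionary keys are assumed to match the keyword arguments of boto3's `request_spot_instances` call (a later SDK than the boto library discussed in the comments), so check them against whichever SDK you use. The AMI ID, instance type, and price are made-up placeholders.

```python
import base64


def build_spot_request(ami_id, instance_type, max_price, user_data):
    """Return keyword arguments for a one-time spot request.

    Assumption: the layout mirrors what boto3's EC2 client accepts in
    request_spot_instances(); verify it against the SDK version in use.
    """
    # The launch specification carries the boot script base64-encoded.
    encoded = base64.b64encode(user_data.encode("utf-8")).decode("ascii")
    return {
        "SpotPrice": str(max_price),   # maximum bid in USD per hour, as a string
        "InstanceCount": 1,
        "Type": "one-time",            # do not relaunch after termination
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            "UserData": encoded,
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    script = "#!/bin/bash\n/opt/experiment/run.sh\nshutdown -h now\n"
    request = build_spot_request("ami-00000000", "c3.large", 0.25, script)
    print(request["LaunchSpecification"]["InstanceType"])
```

With an SDK client in hand, the result would presumably be passed along as `client.request_spot_instances(**request)`; attaching an EBS volume for results (step 3) is not covered here.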

80 comments

Config Management by Anonymous Coward · 2013-11-29 07:12 · Score: 0

Use Puppet, Chef or the like? Should allow you to provision instances on demand, if you predefine some of the requirements...
EC2 is scriptable by Anonymous Coward · 2013-11-29 07:16 · Score: 5, Informative

EC2 is inherently scriptable. There's nothing stopping you from using the command-line tools to fire up an instance, and let it run, and store its results to S3, and then decommission the instance. You can even set the instances to terminate on shutdown, which deletes the instance EBS stores (if you're using EBS) and deletes the instance. Sounds like you just need to spend 30 minutes reading the docs.
1. Re:EC2 is scriptable by Charliemopps · 2013-11-29 08:23 · Score: 2
  
  I was about to say, if you can't figure this out, no wonder you need a super computer to run your code. :-D
2. Re:EC2 is scriptable by diab0lic · 2013-11-29 09:22 · Score: 5, Insightful
  
  I'm aware that EC2 is inherently scriptable, though the documentation is incredibly poor in some areas and heavily favours those interested in long-running instances. This post is about asking others what their workflow for short-term spot instances is, and generating some collaboration and sharing of ideas on the subject. Looking through the other comments, there is a PhD who wrote some of his own scripts using boto (complains about its docs -- trend here?) and someone working on a product to do this (wonder why he sees a business case for this?). The comments in this thread are evidence enough that there is hardly any consensus on how to do this easily and elegantly. To all those shouting RTFM, you've clearly never read the EC2 docs or tried to use them for this use case. They are hardly adequate; just take a look at their scientific computing page (http://aws.amazon.com/ec2/spot-and-science/). Not a single person here has said something along the lines of "RTFM -- I did and it allowed me to easily do something similar." Just saying RTFM because you can doesn't help, nor does it mean anything if the docs are inadequate for the use case in question.
3. Re:EC2 is scriptable by diab0lic · 2013-11-29 09:28 · Score: 0
  
  Right.
4. Re:EC2 is scriptable by dotancohen · 2013-11-29 11:10 · Score: 4, Insightful
  
  EC2 is inherently scriptable. There's nothing stopping you from using the command-line tools to fire up an instance, and let it run, and store its results to S3, and then decommission the instance.
  You are correct that what you propose is easy and well documented. However, that is not what the OP needs.
  The OP needs lower-priced spot instances, which are intermittently available and designed exactly for this workflow. When the entire AWS datacenter has some spare capacity, these spot instances turn on for those who requested them to run (usually to crunch data that is not time-sensitive). The use and configuration of these instances is not so well documented, probably because you cannot run a webserver on them and that seems to be the focus of much AWS documentation. However, it is exactly these 'spot instances' which are in my opinion the genius of the cloud: they let the heavy, non-time-critical work (i.e. scientific computing) be done when the webservers and mailservers aren't so busy, thus flattening out the daily CPU demand curve.
  The OP should start here:
  http://aws.amazon.com/ec2/spot-tutorials/
  And end here:
  http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/tutorial-spot-adv-java.html
  
  --
  It is dangerous to be right when the government is wrong.
5. Re:EC2 is scriptable by hawguy · 2013-11-29 12:30 · Score: 1
  
  EC2 is inherently scriptable. There's nothing stopping you from using the command-line tools to fire up an instance, and let it run, and store its results to S3, and then decommission the instance.
  You are correct that what you propose is easy and well documented. However, that is not what the OP needs.
  The OP needs lower-priced spot instances, which are intermittently available and designed exactly for this workflow. When the entire AWS datacenter has some spare capacity, these spot instances turn on for those who requested them to run (usually to crunch data that is not time-sensitive). The use and configuration of these instances is not so well documented, probably because you cannot run a webserver on them and that seems to be the focus of much AWS documentation. However, it is exactly these 'spot instances' which are in my opinion the genius of the cloud: they let the heavy, non-time-critical work (i.e. scientific computing) be done when the webservers and mailservers aren't so busy, thus flattening out the daily CPU demand curve
  Why can't you run a webserver on a spot instance? I'm not aware of any restrictions on what you can and cannot run on a spot instance. If the dynamic IP is the problem, then either register the dynamic IP at a dynamic DNS provider, register it as a route-53 IP, or use the EC2 command line tools to attach it to a static Elastic IP address.
  The EC2 API is not complicated (and can run at the command line, and has bindings for common scripting languages), and you can do pretty much anything you want with an instance.
6. Re:EC2 is scriptable by Salis · 2013-11-29 14:15 · Score: 3, Interesting
  
  The OP needs lower-priced spot instances, which are intermittently available and designed exactly for this workflow.
  
  Here's how to utilize lower-priced spot instances for scientific computing:
  1. Set up one long-running, low-cost instance (a small is fine) that creates a distributed queue using Amazon's SQS, and adds jobs to the queue corresponding to each "unit" of the relevant computational problem of interest. New jobs can be added using a command line interface, or through a web interface.
  2. Create a user start-up Bash script for the spot instances that runs your main program -- I prefer using Python and boto for simplicity. The main program should connect to the SQS queue, and begin an "infinite" while loop. Inside the loop, the next job off the queue is pulled, containing the input parameters that define the "unit" of the computational problem of interest. These input parameters are fed to the main algorithm, and the resulting output is uploaded to Amazon S3. The loop continues.
  3. Any time the queue is empty or the spot instance remains idle for ~5 minutes, the spot instance then auto-terminates using EC2's command line interface.
  4. Finally, just write a simple Python script to pull all the results off S3, combine & analyze them, and export to another useful format.
  You'll also need to set up your spot instance price threshold, and make sure the queue has jobs to run. That's it, it's fairly simple.
  
  --
  Favorite /. tagline: "On the eighth day, God created FORTRAN." And it was good.
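
The worker loop from steps 2 and 3 of the comment above might look roughly like the sketch below. It is not Salis's code: the queue, the experiment, the result store, and the termination call are passed in as plain functions (all hypothetical names), so the same loop could sit on top of SQS and S3 or be exercised locally with stand-ins.

```python
import time


def run_worker(receive, process, store, terminate,
               idle_limit=300.0, poll_interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Pull jobs until the queue has been empty for idle_limit seconds.

    receive()   -> next job's parameters, or None when the queue is empty
    process(p)  -> result of running the experiment on parameters p
    store(p, r) -> persist result r (for example, upload it to S3)
    terminate() -> shut the instance down
    Returns the number of jobs completed.
    """
    completed = 0
    last_work = clock()
    while True:
        job = receive()
        if job is None:
            # Queue is empty: give up once we have been idle long enough.
            if clock() - last_work >= idle_limit:
                break
            sleep(poll_interval)
            continue
        store(job, process(job))
        completed += 1
        last_work = clock()
    terminate()
    return completed
```

Note that in this sketch an exception inside `process` escapes the loop before `terminate` is called; a real worker would probably want to catch and log failures so a bad job cannot leave the instance running.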
7. Re:EC2 is scriptable by EETech1 · 2013-11-29 16:16 · Score: 2
  
  Cycle Computing has the Jupiter Job Scheduler that was used in a /. article a couple of weeks ago:
  tech.slashdot.org/story/13/11/13/1754225/121-petaflops-rpeak-supercomputer-created-with-ec2
  Jupiter, or one of their other products, may be exactly what you are looking for. It takes care of startup and shutdown of the VMs and can even bid on the spot instances for you. IIRC they even had different packages available depending on the number of instances and the level of service required.
  Good Luck!
  Cheers:)
8. Re:EC2 is scriptable by dotancohen · 2013-11-29 19:54 · Score: 1
  
  Technically one could run a webserver on a spot instance, but the availability of said server will be inversely proportional to datacenter load instead of proportional to website demand. Do you not see why that is a bad idea?
  
  --
  It is dangerous to be right when the government is wrong.
9. Re:EC2 is scriptable by Anonymous Coward · 2013-11-29 23:39 · Score: 0
  
  RTFM -- I did and it allowed me to easily do something similar.
  Starting from *no* EC2 experience in two weeks I built a scalable OAuth platform which handled the subscription and auth for the play along games for a prime time tv game show. Most of the time was for getting the OAuth code we were using into shape, investigating why Amazon's load balancers weren't handling spikes, setting up database instances, etc..
  The CLI commands wrapped in a bit of shell/Perl will do all you need.
10. Re:EC2 is scriptable by hawguy · 2013-11-30 04:34 · Score: 1
  
  Technically one could run a webserver on a spot instance, but the availability of said server will be inversely proportional to datacenter load instead of proportional to website demand. Do you not see why that is a bad idea?
  Depends on why you want to run the webserver. You can register it to a load balancer after startup. When you run out of spot instances for your web server, you can start up full paid instances to pick up the slack.
11. Re:EC2 is scriptable by rourin_bushi · 2013-12-01 15:24 · Score: 2
  
  Hate to break the news to ya, but it's not too hard; I set up such a thing in an afternoon to generate traffic to load test an app I am developing. The commandline tools are pretty well documented for this standard workflow.
  I do the first part manually, using the web console:
  1) launch an instance, install your code on it. Bonus points: write a script to parse the UserData so you can tell it where to pull the source data from (I keep such things in S3 if needed)
  2) use that instance to create an AMI.
  3) Use the ec2-run-instances command line tool to launch some instances of your AMI
  4) Either configure the instance to auto-run your tools on boot, or use ssh to remotely launch your script. ssh -i key_pair.pem ec2-user@public-hostname.amazonaws.com '/path/to/my/script.sh'
  5) Make the last command in your script 'sudo shutdown -h now'
  Bam. Automated computing, complete with shutdown. Just make sure it stores your output somewhere. If you really can't figure it out, respond here and I can provide some code snippets later.
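
The "parse the UserData" bonus in step 1 could be as small as the sketch below. The `key=value` layout is an assumption made for illustration (the comment does not say what format was used), and the key names in the usage note are hypothetical. On the instance, the raw text is normally read from the instance metadata service at http://169.254.169.254/latest/user-data.

```python
def parse_user_data(text):
    """Parse 'key=value' lines into a dict, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value, got: %r" % line)
        settings[key.strip()] = value.strip()
    return settings
```

For example, user data of `input=s3://my-bucket/data.tar.gz` followed by `runs=10` would come back as a dict with the keys `input` and `runs`, both holding strings.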
12. Re:EC2 is scriptable by rourin_bushi · 2013-12-02 11:38 · Score: 1
  
  For the record, the part of the scripting that I use to handle launching/using/killing instances:
  (admittedly not spot instances, but still might be useful)
  https://github.com/Klaital/toolbox/blob/master/bin/ec2_workers.rb
  I find it a reasonably fun problem to work on, so I'd probably be willing to help more if you need.
What is a 'personal cloud'? by gstoddart · 2013-11-29 07:17 · Score: 1

but I haven't really been able to find any concise information on how to set up a personal cloud
You mean a computer? A server farm? A beowulf cluster?
To me, 'personal cloud' is a totally meaningless term and doesn't correspond to what the cloud is. If it's a couple of servers you own and control, to me that doesn't sound like 'cloud computing' -- it sounds like a marketing term.

--
Lost at C:>. Found at C.
1. Re:What is a 'personal cloud'? by ihtoit · 2013-11-29 07:33 · Score: 1
  
  to me, a cloud on a local level is:
  data cluster
  process cluster (could be the same cluster, or a process cluster with a SAN, storage array or just a honkin' huge hard drive)
  and an interface (could be VMWare or Virtualbox or as simple as Remote Desktop (Windows))
  - which anybody on the sub/network with the correct credentials can access at any time.
  This is different from multiple accounts on a personal computer which is subject to the power state of the system: a cloud's accessibility would by definition be 24/7 with better than 99.9 uptime.
  
  --
  Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
2. Re:What is a 'personal cloud'? by ihtoit · 2013-11-29 07:39 · Score: 1
  
  Actually, this explains it a lot better.
  
  --
  Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
3. Re:What is a 'personal cloud'? by Richard_at_work · 2013-11-29 07:52 · Score: 2
  
  You certainly can have a personal cloud, or an internal cloud, or a private cloud.
  The term cloud is one of those that people on Slashdot seem to go out of their way to misconstrue or misunderstand, when in fact it's simple: it's a resource that you want to do X, but you don't necessarily want to know the in-depth details of how it goes about it. I want a website hosted, I want it redundant and I want it scalable, but I don't necessarily want to give a toss about manually balancing resources across several servers. Creating a single resource out of all of my local servers and letting the management software work out the details of where and how my redundant, scalable website is spread across those individual servers...
  Old timers might call it something else, or people coming from a particular industry might call it something else, but for the rest of us it's just the cloud. It's separating the physical resource from the task you need to do, hiding the complexity of the underlying resource provision so you just get the task done. In an IT department for a large company, the hardware department might create a private cloud in order to remove the task of managing the hardware from the various departments that might want to utilise it, so they just allow the web department to deploy their sites, and the analytics department to run their tasks etc etc, all without worrying about hardware failing or getting gummed up by a single process, because the underlying management layer spreads the load across multiple physical resources automatically.
SC13 by jsimon12 · 2013-11-29 07:17 · Score: 1, Informative

A bunch of papers were presented at SC13 this year. Suggest you look them up.
http://sc13.supercomputing.org/content/papers
Symantec Workflow by onyxruby · 2013-11-29 07:18 · Score: 2

Does exactly what you need and is designed explicitly for integration with third party tools. Spins up everything from disks to automating webforms and jobs and imports and exports of jobs. There really isn't anything else out there that comes close to what Workflow will do. Used to be called Altiris Workflow. Works with everything from CMDB, change management, service desk to multiple languages.
http://www.symantec.com/connect/articles/learn-about-symantec-workflow
1. Re:Symantec Workflow by afidel · 2013-11-29 07:23 · Score: 1
  
  And since it's from Symantec expect exactly zero support beyond "is it plugged in" level scriptbots. We dumped Netbackup after over a decade of use due to the fact that even with a $200k purchase on the line and a regional VP involved we couldn't get effective support.
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
2. Re:Symantec Workflow by Anonymous Coward · 2013-11-29 14:18 · Score: 0
  
  Hate to say it but you might be right. I had a horrible experience trying to get Ghost support. We clearly defined the problem, gave them everything they asked for, and Symantec never produced even one suggestion or fix. Might as well have been talking to a scripting AI.
  "Ghost needs X". OK, how can I measure X?
  "Ghost needs X". Can you give me a command to discover our X implementation?
  "Ghost needs X". Are other customers having difficulty with X?
  "Ghost needs X". Blue grapes sneeze violently in school. Hello? Is there anyone in there? Hello?
  Time was I had a pretty good opinion of Symantec. That was a long time ago now.
This is precisely what the AWS API is for. by gdek · 2013-11-29 07:20 · Score: 5, Informative

Because your workflow is likely to be customized to your tasks, it should be straightforward to write these kinds of tools yourself, with any number of available toolkits, based on what language you're most comfortable using.
There's the straight CLI: http://aws.amazon.com/cli/
And lots of sample code for the various SDKs: http://aws.amazon.com/code
Best to just dive in. If you have any development experience at all, even just scripting, you should be able to figure it out pretty quickly.
1. Re:This is precisely what the AWS API is for. by multimediavt · 2013-11-29 07:23 · Score: 2
  
  Wish I had mod points. Was going to say, "Have you looked at what Amazon has available?" I know computational chemists that designed workflows like what the OP is talking about, so I know it is well documented somewhere. [cough, cough]
2. Re:This is precisely what the AWS API is for. by TubeSteak · 2013-11-29 08:01 · Score: 1
  
  https://aws.amazon.com/datapipeline/
  https://aws.amazon.com/swf/
  Between these two sets of functionality, I think the submitter should be able to fully automate his workflow.
  If you want someone to actually set it up for you... that's what starving grad students are for
  
  --
  [Fuck Beta]
  o0t!
3. Re:This is precisely what the AWS API is for. by who_stole_my_kidneys · 2013-11-29 11:01 · Score: 1
  
  This post should have been put into the "Let me Google that for you" section, but gdek did that already :-P
Boto for Python is a good option by rovitotv · 2013-11-29 07:25 · Score: 2

Since my scientific workflow always includes Python it is natural for me to use boto.
https://github.com/boto/boto
http://boto.readthedocs.org/en/latest/
http://aws.amazon.com/sdkforpython/
Home user/power user by ihtoit · 2013-11-29 07:27 · Score: 0

One word: Virtualbox.
Of course, it helps to read a primer or two first. I learned how to build a cluster before I learned about machine images. Once you have the two married, you're laughing.
I've been messing with clusters now for nearly ten years, and Virtualbox for about 3 or 4. It's fun.

--
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
1. Re:Home user/power user by Anonymous Coward · 2013-11-29 08:10 · Score: 0, Funny
  
  You're kinda clueless aren't you? He wants to parallelize his fucking calculations, not shake and bake his house (it costs more in power to run anything close to EC2 at home than it does to rent).
2. Re:Home user/power user by diab0lic · 2013-11-29 09:07 · Score: 1
  
  Virtualbox will not in any way help me. I don't own, and don't want to purchase or manage the hardware myself -- time tends to be short for researchers and an automated, easy, pay per use solution is very ideal.
GlideinWMS by mrcheesyfart · 2013-11-29 07:29 · Score: 1

You could use GlideinWMS, which was made to manage a pool of dynamic grid resources for scientific computing, such as the Open Science Grid. It can also manage personal Condor pools too. I believe it can also connect to Amazon EC2, but I don't see a lot of information on their web-page about that. You may have to contact them for more information, but I know that the team is very responsive and interested in finding more scientific users. You can find more information here: http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/doc.prd/index.html
Read the manual maybe? by Morpf · 2013-11-29 07:29 · Score: 1

To the OP: Please refer to the provided documentation or use a search engine to find tutorials, if you dare. There is an official API for this. We won't recite manuals here.
To ./ community: Why is a question that can be answered with a "rtfm" landing on the front page?
1. Re:Read the manual maybe? by Anonymous Coward · 2013-11-29 19:32 · Score: 0
  
  What is a dot slash community?
On the simple side of things by Anonymous Coward · 2013-11-29 07:31 · Score: 0

You could just run "sudo shutdown -h now" in the instance as part of a script that launches the experiment. It would give you a simple way of turning off an instance and stop most of the charges related to it.
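
A minimal sketch of that idea as a Python wrapper, assuming the experiment is an ordinary command line. The `try`/`finally` means the shutdown command runs even when the experiment exits with an error or cannot be started, which matters because a crashed job would otherwise leave the instance running and billing. The default shutdown command is the one quoted above; everything else is illustrative.

```python
import subprocess
import sys


def run_then_shutdown(experiment_cmd, shutdown_cmd=("sudo", "shutdown", "-h", "now")):
    """Run the experiment, then run the shutdown command no matter what.

    Returns the experiment's exit code. If the experiment cannot be
    started at all, the shutdown still runs and the error propagates.
    """
    try:
        return subprocess.call(list(experiment_cmd))
    finally:
        # Executed on success, on a non-zero exit code, and on exceptions.
        subprocess.call(list(shutdown_cmd))
```

Passing a harmless `shutdown_cmd` (for instance `[sys.executable, "-c", "pass"]`) makes it possible to try the wrapper on a workstation without powering it off.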
1. Re: On the simple side of things by Anonymous Coward · 2013-11-29 08:45 · Score: 0
  
  Why not just have the instance delete itself when it is complete?
Jenkins by Cili · 2013-11-29 07:31 · Score: 2

Jenkins would probably be useful in this case, with this plugin:
https://wiki.jenkins-ci.org/display/JENKINS/Amazon+EC2+Plugin
boto+fabric+eucalyptus=automated personal cloud by Anonymous Coward · 2013-11-29 07:32 · Score: 1

You can create your own personal cloud, call it a private cloud, and then automate all your tasks. I have been doing the same: I used fabric for automation and boto (euca2ools) for controlling the cloud (creating instances, volumes, etc). Eucalyptus helps you create your own private cloud, so you get your IaaS implementation easily. OpenStack has a growing following; you may prefer to adopt it over Eucalyptus. There are lots of other tools available, however.
StarCluster by Anonymous Coward · 2013-11-29 07:35 · Score: 0

http://star.mit.edu/cluster/
Re:This is a paid slashvertisement for Amazon by Jailbrekr · 2013-11-29 07:37 · Score: 1

I agree. This problem is easily scriptable using Python, so I'm honestly surprised a legitimate researcher is asking Slashdot instead of jumping into writing a Python script.

--
Feed the need: Digitaladdiction.net
AWS API by Anonymous Coward · 2013-11-29 07:40 · Score: 0

I use the AWS API.
If you are not able to RTFM and do it, then look to cut a check to someone that can.
Cloudformation by Fubar420 · 2013-11-29 07:53 · Score: 2

Amazon's CloudFormation (http://aws.amazon.com/cloudformation/) can get you 95% of the way there (add a few small scripts via boto, or some integration with http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-cfn-customresource.html).
A little elbow grease will get you the rest of the way without additional costs.

--
-- (appended to the end of comments you post, 120 chars)
Powershell by LordLimecat · 2013-11-29 07:57 · Score: 0

This is sort of a scripting issue, and Powershell has modules for everything under the sun-- including Amazon:
http://aws.amazon.com/powershell/
Not sure whether your instances themselves are running windows, but if so that would be even easier to integrate.
shell by Anonymous Coward · 2013-11-29 07:58 · Score: 0

$ ./experiment && ./mail_results.sh && halt
XSEDE or a university HPC site by Anonymous Coward · 2013-11-29 08:01 · Score: 0

Don't bother doing it yourself. Get a grant of cputime through XSEDE. Alternatively talk to your local university about using their HPC resources. If they don't have any, a state-run university may offer you the ability to use their resources if you attend another university in the state. If you're doing commercial research, you may still be in luck since many states offer commercial HPC resources.
Really, don't bother doing it yourself using virtualized resources. Pay someone to do it right and use a real batch job scheduler. The fact that you don't know how to automatically shut down a VM when your job ends means you should pay someone else to manage it for you. And like I said earlier, there are plenty of free resources out there to use.
Re:This is a paid slashvertisement for Amazon by CodeReign · 2013-11-29 08:04 · Score: 1

Every time I've looked into scripting my manual tasks with AWS, I've found their documentation to be overwhelming and not concise or clear.
RTFD by Neuroelectronic · 2013-11-29 08:07 · Score: 1

Have you tried Google or the AWS documentation? What you are asking for is the bare-bones, most basic use case. They even have services set up to make this kind of thing easier, like the Simple Workflow Service, Messaging Service and Simple Queue Service.
high-level introduction to workflow service:
http://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-intro-to-swf.html
recipes using workflow service:
http://aws.amazon.com/code/2535278400103493
1. Re:RTFD by diab0lic · 2013-11-29 09:35 · Score: 1
  
  SWF appears to be much more interested in helping me manage clusters of instances than in streamlining the lifecycle of a single customized spot instance from inception to termination.
One AMI image, ec2, and shell scripts by Lally+Singh · 2013-11-29 08:12 · Score: 3, Informative

Here's how I ran my PhD simulations on EC2:
- The AMI downloads a manifest file at startup.
- The manifest has one record per line, two fields per record: the s3 URL of a .tar.gz to download, and the path to download it
- The AMI then runs a shell script (/etc/run.sh) that's been put there by a manifest entry
Shell scripts upload new files to s3 (e.g., /etc/run.sh) and have ec2 run new VMs. When the VMs are loaded, they're running everything I need, ready to go.
Other shell scripts stopped/started experiments on these VMs.
Other shell scripts shut down the VMs when I'm done.
The scripts did little more than scan the appropriate machine list from the ec2 tools and ssh into them with a specific command.
At the end, I had some of the experiment-specific scripts quickly have git clone/pull in files I was changing quickly per experiment.
All of it worked really well for me. Nothing fancier than the ec2 command-line tools, bash, ssh, & git necessary.

--
Care about electronic freedom? Consider donating to the EFF!
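
The manifest described above (one record per line: an S3 URL and a destination path) is simple enough to parse in a few lines. The sketch below is not the commenter's code; it assumes the two fields are separated by whitespace, which the comment does not actually specify, and it leaves the download and unpack step to whatever S3 tool is installed.

```python
from collections import namedtuple

# One manifest record: where the archive lives and where to unpack it.
ManifestEntry = namedtuple("ManifestEntry", ["url", "dest"])


def parse_manifest(text):
    """Parse one record per line: '<s3 url> <destination path>'."""
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace; the rest is the path.
        fields = line.split(None, 1)
        if len(fields) != 2:
            raise ValueError("line %d: expected URL and path" % number)
        entries.append(ManifestEntry(fields[0], fields[1]))
    return entries
```

A boot script could then loop over the entries, fetch each archive, and extract it into `entry.dest` before handing control to /etc/run.sh.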
Starcluster by Anonymous Coward · 2013-11-29 08:21 · Score: 1

I have used MIT's StarCluster in the past for something very similar to this workflow. It provides a very user-friendly interface to EC2 spot instances for almost the exact workflow you're looking for. They provide AMIs you can customize and a relatively well documented set of commands to easily launch spot instances.
Condor and other on EC2: PiCloud, by james.paul.white · 2013-11-29 08:24 · Score: 1

Docker looks promising, but there are other existing services stacked on EC2 that address the needs of science workloads. PiCloud does exactly the things you're asking for: http://www.picloud.com/platform/ . And the folks at Cycle Computing use Condor to manage the largest jobs ever run on EC2: http://www.cyclecomputing.com/ . I'm still working on my own stuff based on Groovy and Condor which I call Gondor, but it isn't at all ready for others to use. One thing I have found to be great is that there is a MacPorts portfile for Condor which works dandy. Just "sudo port install htcondor && sudo port load htcondor". http://research.cs.wisc.edu/htcondor/HTCondorWeek2013/presentations/SingerL_MacPorts.pdf . I don't yet see a nice single workflow that gets us to an integrated reproducible published result at the other end like Elsevier's Executable Paper http://www.elsevier.com/physical-sciences/computer-science/executable-papers, but I think we'll be there soon.
Focus on your work by Anonymous Coward · 2013-11-29 08:32 · Score: 0

If you want to play reinventing the wheel good luck with that.
If you want your research done go to my.cyclecloud.com and forget about it.
Re:This is a paid slashvertisement for Amazon by Anonymous Coward · 2013-11-29 08:36 · Score: 0

The question is PR for Amazon, the answer is PR for Netflix :-)
For pretty much everything he asked for, there's a Netflix open source project that does exactly what he wants. Docker-like project to create AMIs? Aminator. Tool to manage deployments? Asgard. Plus plenty more that solve issues that he either doesn't have yet or doesn't realize he has.
Well, we can do this by Cyberax · 2013-11-29 08:39 · Score: 1

We have a product in development that does just this - it can spin up spot nodes with the best price/performance ratio, dispatch tasks and restart them if a spot node fails. With lots of other goodies.

Drop me a note if you're interested: alex.besogonov@gmail.com
1. Re:Well, we can do this by Anonymous Coward · 2013-11-29 09:11 · Score: 0
  
  Whatever he quotes, I'll do it for half the price!
really ?? by joss · 2013-11-29 08:41 · Score: 1

Putting aside my slashvert suspicions of the post (hard to see how you could have chosen AWS at all and be so clueless), I've done this kind of thing a lot. Here's my approach:
1. Fire up an EBS backed AMI from an existing stock version of your favorite OS ( ubuntu 12.04 for me just cos i use it on desktop and can't be bothered with differences)
2. customize it with your own shit
3. include in the /etc/rc.local a script to customize things further.. and because you don't want to faff about changing the AMI every time you change shit, have the startup script pull the latest stuff you need straight out of your source code repository and then run further initialisation stuff
4. make an image from that instance (easily done from AWS control panel)
5. learn how to use boto (python AWS api) to fire up instances, attach storage, shutdown instances etc. (Using the command line tools is fine for the simplest stuff, but as soon as stuff gets a little harder you really want to use a programming language, so unless you're extremely fond of java, python is the best fit for this.)
The boto documentation is kinda shit, so every time you need to do something just google for an example doing something similar. The official api documentation is last resort reference only.

--
http://rareformnewmedia.com/
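
Step 3 above (a boot-time script that pulls the latest code before running anything) could be sketched as below. This only decides which commands to run; the repository URL, checkout directory, and `run.sh` entry point are hypothetical placeholders, and the sketch assumes a git version that supports the `-C` option.

```python
import os


def sync_commands(repo_url, checkout_dir, entry_script="run.sh"):
    """Return the commands a boot script would run to refresh and start the code."""
    if os.path.isdir(os.path.join(checkout_dir, ".git")):
        # The AMI already contains a checkout: fast-forward it.
        fetch = ["git", "-C", checkout_dir, "pull", "--ff-only"]
    else:
        # First boot of a fresh image: clone from scratch.
        fetch = ["git", "clone", repo_url, checkout_dir]
    return [fetch, ["/bin/sh", os.path.join(checkout_dir, entry_script)]]
```

Each returned command can be handed to `subprocess.check_call` in order, so the AMI itself only needs to change when the base environment does, not when the experiment code does.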
StarCluster by Anonymous Coward · 2013-11-29 09:04 · Score: 0

I am a sysadmin at a university HPC facility. You should look at StarCluster http://star.mit.edu/cluster/
"StarCluster is an open source cluster-computing toolkit for Amazon’s Elastic Compute Cloud (EC2). StarCluster has been designed to simplify the process of building, configuring, and managing clusters of virtual machines on Amazon’s EC2 cloud."
You get a typical compute cluster with a job queue etc, but everything runs on EC2. I believe it has all the features you are looking for, including being able to only run jobs on spot instances.
I have never tried StarCluster, as we actually have our own physical clusters here. Only obvious concern with StarCluster is that they use OGE (Oracle Grid Engine, formerly SGE, Sun Grid Engine) as the resource manager. Probably works fine, but not what I would have used.
Re:This is a paid slashvertisement for Amazon by diab0lic · 2013-11-29 09:08 · Score: 1

This is more or less exactly the problem, their spot instances for science page is a friggin joke.[0] Their API seems reasonable for spinning up instances and I am now looking at writing some scripts to do this, however their docs avoid ever telling you that you can run scripts in the "user data" field when starting an instance... kind of a major hurdle that the command line tools don't make clear. I've actually got something going now with the CLI tools + docker that makes getting an environment running pretty simple. I'm going to formalize it and post it online in the near future. [0] http://aws.amazon.com/ec2/spot-and-science/
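
For the "user data" approach mentioned above, the boot script itself can be generated per run, which also covers the parameterization the original question asked for. The sketch below is an illustration, not the poster's tooling: it assumes an AMI whose boot process (cloud-init on the common Linux images) executes user data beginning with `#!`, and the command path and flag names are invented.

```python
import shlex


def render_user_data(command, params):
    """Build a bash boot script that runs `command` with --key value flags."""
    args = [command]
    for key in sorted(params):
        args.extend(["--" + key, str(params[key])])
    lines = [
        "#!/bin/bash",
        # Quote every argument so spaces or shell metacharacters are safe.
        " ".join(shlex.quote(a) for a in args),
        # No 'set -e': the shutdown runs even if the experiment fails.
        "shutdown -h now",
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_user_data("/opt/exp/run.py", {"seed": 7, "label": "trial one"})` produces a three-line script whose middle line is `/opt/exp/run.py --label 'trial one' --seed 7`.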
Re:This is a paid slashvertisement for Amazon by diab0lic · 2013-11-29 09:26 · Score: 1

Thanks for the links, aminator looks to be perfect for easily crafting job specific environments -- I'll probably include this in whatever solution I come up with. Asgard on the other hand, and correct me if I'm wrong, looks to be much more oriented to those who have a lot of things running for an indefinite time frame in the cloud. Thanks!
Starcluster does 80-90% of what you want by austingeekgirl · 2013-11-29 10:05 · Score: 3, Informative

http://star.mit.edu/cluster/
The rest of it is easily scriptable. I have some EBS-based AMIs that, on bootup, connect to a central server
and register themselves (tick up a text file and add themselves to /etc/hosts).
If you combine StarCluster for generic cluster management with the existing Amazon-provided tools
(http://blog.roozbehk.com/post/35277172460/installing-amazon-ec2-tools),
this is really only a day's worth of scripting and testing.
There are also several public AMIs on EC2 that are oriented towards scientific computing.
http://www.google.com/search?q=ec2%20ami%20scientific
This is my day job stuff.
1. Re:Starcluster does 80-90% of what you want by john.c.earls · 2013-11-29 12:09 · Score: 2
  
  Second on StarCluster. Very easy to get up and running quickly. It is well documented, with a good plugin system. If you do scientific computing, then you are probably familiar with most of the tools that are built in: SGE, IPython, NFS, etc. Aside from the provided Amazon tools, I find the boto (Python) library helpful if I need to interact with S3 or SQS.
2. Re:Starcluster does 80-90% of what you want by Anonymous Coward · 2013-11-29 16:57 · Score: 0
  
  Bingo - and you can script it for the other 10%
3. Re:Starcluster does 80-90% of what you want by james.paul.white · 2013-11-30 19:18 · Score: 1
  
  One of the StarCluster plug-ins is for Condor which is supported in their latest AMIs. Perfect for me.
PiCloud (Multyvac) by Anonymous Coward · 2013-11-29 10:25 · Score: 0

PiCloud http://www.picloud.com (soon to be Multyvac http://www.multyvac.com) automates all this for you in a clean and (very) simple way.
Domino by Anonymous Coward · 2013-11-29 10:25 · Score: 0

I've been building a service called Domino (http://www.dominoUp.com) that might be useful to you. Domino makes it super easy to run your scientific computing on EC2 by taking care of precisely the kind of bookkeeping and automation headaches you're running into. Furthermore, we also help you keep your experiments organized by revisioning your code and data along with your results, making it really easy to reproduce past results. Finally, we make it easy to collaborate with others by simply sharing your Domino project.
We're still in beta, but I hope you (and any others with similar problems) will check it out and let us know if you find it useful: http://www.dominoUp.com
As the top comments point out, nothing Domino is doing is impossible for you to replicate by scripting everything yourself. But we thought we'd save you the trouble :)
CycleCloud by sir+smidgen · 2013-11-29 11:48 · Score: 1

Check out Cycle Computing's CycleCloud product: http://www.cyclecomputing.com/wiki/index.php?title=CycleCloud They offer meta-scheduling products specifically for managing HTCondor pools in AWS. The Cycle team works closely with the HTCondor team and supports loads of scientific projects. Their products have historically been free for academic use.
Ansible by Anonymous Coward · 2013-11-29 11:55 · Score: 0

I would use Ansible as the glue to orchestrate this. It can create EC2 instances, drive Docker, and do most other things that have a Python library. It can be called from many things; in my use cases, that means build/CI tools like Jenkins or Bamboo. You can call it from whatever monitors your jobs, or from the job itself on completion. It's a better option than writing it all yourself in Python.
Pubflow by Anonymous Coward · 2013-11-29 12:04 · Score: 0

Maybe they can help you: http://www.pubflow.uni-kiel.de/en
Use AWS console by Anonymous Coward · 2013-11-29 13:02 · Score: 0

If the CLI is overwhelming you, I suggest seeing how far you can get with the console - and then automating bits as you get more comfortable. You can easily point-and-click your way to a running instance, customize it, and bundle it as an AMI. Google how to terminate a running instance and put that at the end of the script you are running on it. Next time instead of launching it directly, point-and-click to create a spot instance request which runs that AMI. Once you've done that, learn how to read launch parameters at the beginning of your script. You may find it less intimidating if you start with the manual method and solve one piece at a time.
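For the "read launch parameters at the beginning of your script" step, one option is to pass them as plain `key=value` lines in the instance's user data and fetch them from the EC2 metadata service. A minimal sketch follows; the `key=value` format and the function names are assumptions made for illustration, and instances configured to require IMDSv2 session tokens would need an extra token request that is not shown here.

```python
import urllib.request

# The EC2 instance metadata service exposes user data at this link-local address.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def parse_launch_params(text):
    """Parse 'key=value' lines, skipping blank lines, '#' comments, and malformed lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines with no '=' at all
            params[key.strip()] = value.strip()
    return params


def fetch_launch_params(timeout=2.0):
    """Read this instance's user data; this only works when run on an EC2 instance."""
    with urllib.request.urlopen(USER_DATA_URL, timeout=timeout) as resp:
        return parse_launch_params(resp.read().decode("utf-8"))
```

With this approach the experiment script lives in the AMI, and each launch only changes the small block of parameters supplied as user data.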
Automation by EdMcMan · 2013-11-29 13:47 · Score: 1

As others have pointed out, deploying EC2 instances automatically is fairly easy using the well-documented EC2 APIs.
The difficult part about distributed computing is synchronizing the work between available instances. For this, you might want to look at RabbitMQ or other queueing servers. One way to do this would be to have one thread (on your computer) generating problem instances, while you spawn spot instances on EC2 as desired, which consume the work and report the results. I suspect you could accomplish something similar using Hadoop/MapReduce.
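The producer/consumer pattern described here can be sketched locally before any queueing server is involved. The example below uses Python's in-process `queue.Queue` and threads as a stand-in for RabbitMQ and spot instances; `run_experiment` is a hypothetical placeholder for the real work. In a real deployment the queue would be a network service and each worker would be a separate machine.

```python
import queue
import threading


def run_experiment(n):
    # Placeholder for real work; here it just squares its input.
    return n * n


def worker(tasks, results):
    """Consume problem instances until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            tasks.task_done()
            break
        results.put((item, run_experiment(item)))
        tasks.task_done()


def run_all(problems, num_workers=3):
    """Generate work on one side, consume it with several workers, collect results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for p in problems:
        tasks.put(p)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    return dict(results.get() for _ in range(len(problems)))
```

For example, `run_all([1, 2, 3, 4])` returns a dict mapping each input to its result. Because workers pull tasks rather than being assigned them, a worker that disappears (as spot instances can) simply stops pulling; a real queue would additionally need acknowledgements so its unfinished task gets redelivered.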
1. Re:Automation by Anonymous Coward · 2013-11-29 14:25 · Score: 0
  
  Or just use Amazon SQS which is designed for exactly this.
Manta by Marsell · 2013-11-29 14:45 · Score: 1

If you're willing to look beyond AWS, there's something called Manta out there (http://www.joyent.com/products/manta). The data rests on some servers, and you submit UNIX map/reduce jobs. The jobs are run on the nodes where the data is resting, you get a full UNIX environment, and you only get charged as you'd expect (compute time, combined with the cheaper at-rest time). It might be a better fit for what you're doing than your proposal, plus it'll likely be faster too due to reduced data movement.
try the opensource cloudify by SpaceCracker · 2013-11-29 17:18 · Score: 2

Look up cloudify on cloudifysource.org.
It enables spinning up machines on the cloud of your choice (including EC2). Then it installs and configures your software on those VMs. Finally it monitors all processes that you request it to monitor, including listening to exposed custom metrics, e.g. over a jmx port.
In your case, when your experiment ends, if your software exposes some api or metric that can indicate that, cloudify can take that as a trigger for shutting down or spinning up the next experiment.
A nice bonus is that it can elastically scale your VMs in and out to handle varying loads and automatically restart problematic VMs or processes.

--
sigo ergo sum
DIY by Anonymous Coward · 2013-11-29 17:53 · Score: 0

Your code must be so bad that you're probably trying to kill a fly with a bazooka
Jenkins by Anonymous Coward · 2013-11-29 21:59 · Score: 0

we have a Jenkins script for that
awstool by grewil · 2013-11-29 23:35 · Score: 1

I strongly recommend this command line tool. With this, you can do all those operations and more, and in a sensible and uncluttered fashion:
http://www.timkay.com/aws/
Perl module link by Anonymous Coward · 2013-11-30 03:27 · Score: 0

Here's a link to a module listed on cpan:
http://search.cpan.org/~mallen/Net-Amazon-EC2-0.23/lib/Net/Amazon/EC2.pm
commercial services? by Anonymous Coward · 2013-12-03 06:41 · Score: 0

related, anyone know of good commercial services? greenbutton.com is one that i know of. brightcomputing.com and cyclecomputing.com also.