Slashdot Mirror


Ask Slashdot: Scientific Computing Workflow For the Cloud?

diab0lic writes "I have recently come into a situation where I need to run cloud computing jobs on demand for my research. Amazon's EC2 Spot Instances are an ideal platform for this, as I can request an appropriate instance type for each experiment (high CPU, high memory, or GPU) depending on its needs. However, I currently spin up the instance manually, set it up, run the experiment, and then terminate it manually. Monitoring experiments for completion gets tedious, and I incur unnecessary costs if, for example, a job finishes while I'm sleeping. The whole thing really should be automated. I'm looking for a workflow somewhat similar to this:
  1. Manually create Amazon machine image (AMI) for experiment.
  2. Issue a command to launch the AMI on a specified spot instance type.
  3. Automatically attach an EBS volume to the instance for result storage.
  4. Automatically run the specified experiment (bonus if this can be parameterized).
  5. Have the instance automatically terminate itself upon experiment completion.

Something like Docker that spins up spot instances of a specified type on demand for each run and terminates each instance at run completion would be absolutely perfect. I also know HTCondor can back onto EC2 spot instances, but I haven't been able to find any concise information on how to set up a personal cloud; I also think this is slightly overkill. Do any other Slashdot users have similar problems? How did you solve them? What is your workflow? Thanks!"
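A minimal sketch of steps 2 through 5 in Python, assuming boto3 is installed on the machine that submits jobs. It only builds the arguments for a one-time spot request and does not call AWS. The AMI ID, instance type, price, device names, and experiment command are all placeholders, and whether an OS-level shutdown terminates (rather than stops) your spot instance is an assumption to check against the current spot documentation.

```python
import base64
import shlex


def build_user_data(argv, device="/dev/xvdf", mount_point="/results"):
    """Boot script: format and mount a fresh results volume, run one experiment, power off.

    Assumes a fresh, unformatted volume that shows up as /dev/xvdf; the real device
    name depends on the instance type (newer types may expose it as /dev/nvme1n1).
    """
    cmd = shlex.join(argv)  # quote each argument so parameters survive the shell
    return "\n".join([
        "#!/bin/bash",
        # Shut down however the script exits, so a finished or crashed run stops billing.
        "trap 'shutdown -h now' EXIT",
        f"mkfs -t ext4 {device}",
        f"mkdir -p {mount_point}",
        f"mount {device} {mount_point}",
        f"cd {mount_point}",
        f"{cmd} > {mount_point}/run.log 2>&1",
        "",
    ])


def build_spot_request(ami_id, instance_type, max_price, argv, volume_gb=100):
    """Keyword arguments for a one-time spot request that runs a single experiment."""
    user_data = build_user_data(argv)
    return {
        "SpotPrice": max_price,  # maximum hourly price as a string, e.g. "0.50"
        "InstanceCount": 1,
        "Type": "one-time",
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            # Spot launch specifications take base64-encoded user data.
            "UserData": base64.b64encode(user_data.encode()).decode(),
            "BlockDeviceMappings": [{
                "DeviceName": "/dev/sdf",
                # Keep the results volume after the instance terminates.
                "Ebs": {"VolumeSize": volume_gb, "DeleteOnTermination": False},
            }],
        },
    }
```

Passing the result to boto3.client("ec2").request_spot_instances(**req) would submit it. The trap line is the design choice that matters: the instance shuts itself down whether the experiment finishes or crashes. Because the results volume survives termination, leftover volumes have to be found and deleted afterwards.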

18 of 80 comments

  1. EC2 is scriptable by Anonymous Coward · · Score: 5, Informative

    EC2 is inherently scriptable. There's nothing stopping you from using the command-line tools to fire up an instance, and let it run, and store its results to S3, and then decommission the instance. You can even set instances to terminate on shutdown, which deletes the instance along with any EBS volumes flagged delete-on-termination (typically including the root volume). Sounds like you just need to spend 30 minutes reading the docs.
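For the terminate-on-shutdown approach described above, a minimal Python sketch of the RunInstances arguments might look like this. It assumes the AMI has the AWS CLI installed and that a hypothetical instance profile grants write access to a hypothetical results bucket; the function only builds the request and does not call AWS.

```python
import shlex


def build_batch_launch(ami_id, instance_type, bucket, argv, instance_profile):
    """Keyword arguments for EC2 RunInstances: run one job, copy results to S3, terminate.

    `bucket` and `instance_profile` are placeholders; the profile must allow the
    AWS CLI inside the instance to write to the bucket.
    """
    user_data = "\n".join([
        "#!/bin/bash",
        "mkdir -p /out && cd /out",
        f"{shlex.join(argv)} > run.log 2>&1",
        # Prefix results with the hostname so parallel runs don't overwrite each other.
        f"aws s3 sync /out s3://{bucket}/$(hostname)/",
        "shutdown -h now",
        "",
    ])
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,  # boto3 base64-encodes UserData for run_instances
        # Turn an OS-level shutdown into instance termination.
        "InstanceInitiatedShutdownBehavior": "terminate",
        "IamInstanceProfile": {"Name": instance_profile},
    }
```

Pass the dictionary to boto3.client("ec2").run_instances(**kw). Because InstanceInitiatedShutdownBehavior is "terminate", the final shutdown in the script ends billing instead of leaving a stopped instance behind.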

    1. Re:EC2 is scriptable by Charliemopps · · Score: 2

      I was about to say, if you can't figure this out, no wonder you need a supercomputer to run your code. :-D

    2. Re:EC2 is scriptable by diab0lic · · Score: 5, Insightful

      I'm aware that EC2 is inherently scriptable, though the documentation is incredibly poor in some areas and heavily favours those interested in long-running instances. This post is about asking others what their workflow for short-term spot instances is, and generating some collaboration and sharing of ideas on the subject. Looking through the other comments, there is a PhD who wrote some of his own scripts using boto (and complains about its docs -- trend here?) and someone working on a product to do this (wonder why he sees a business case for this?). The comments in this thread are evidence enough that there is hardly any consensus on how to do this easily and elegantly. To all those shouting RTFM: you've clearly never read the EC2 docs or tried to use them for this use case. They are hardly adequate; just take a look at their scientific computing page (http://aws.amazon.com/ec2/spot-and-science/). Not a single person here has said something along the lines of "RTFM -- I did and it allowed me to easily do something similar." Just saying RTFM because you can doesn't help, nor does it mean anything if the docs are inadequate for the use case in question.

    3. Re:EC2 is scriptable by dotancohen · · Score: 4, Insightful

      EC2 is inherently scriptable. There's nothing stopping you from using the command-line tools to fire up an instance, and let it run, and store its results to S3, and then decommission the instance.

      You are correct that what you propose is easy and well documented. However, that is not what the OP needs.

      The OP needs lower-priced spot instances, which are intermittently available and designed exactly for this workflow. When AWS has spare capacity and the spot price drops below the maximum price a user has bid, that user's spot instances start running (usually to crunch data that is not time-sensitive). The use and configuration of these instances is not as well documented, probably because you cannot reliably run a webserver on them, and web serving seems to be the focus of much AWS documentation. However, it is exactly these spot instances that are, in my opinion, the genius of the cloud: they let the heavy, non-time-critical work (i.e. scientific computing) be done when the webservers and mailservers aren't so busy, thus flattening out the daily CPU demand curve.

      The OP should start here:
      http://aws.amazon.com/ec2/spot-tutorials/

      And end here:
      http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/tutorial-spot-adv-java.html

      --
      It is dangerous to be right when the government is wrong.

    4. Re:EC2 is scriptable by Salis · · Score: 3, Interesting

      The OP needs lower-priced spot instances, which are intermittently available and designed exactly for this workflow.

      Here's how to utilize lower-priced spot instances for scientific computing:

      1. Set up one long-running, low-cost instance (a small instance is fine) that creates a distributed queue using Amazon's SQS and adds jobs to the queue, one per "unit" of the computational problem of interest. New jobs can be added using a command-line interface or through a web interface.

      2. Create a user start-up Bash script for the spot instances that runs your main program -- I prefer using Python and boto for simplicity. The main program should connect to the SQS queue, and begin an "infinite" while loop. Inside the loop, the next job off the queue is pulled, containing the input parameters that define the "unit" of the computational problem of interest. These input parameters are fed to the main algorithm, and the resulting output is uploaded to Amazon S3. The loop continues.

      3. Once the queue is empty, or the spot instance has been idle for ~5 minutes, the spot instance terminates itself using EC2's command-line interface.

      4. Finally, just write a simple Python script to pull all the results off S3, combine & analyze them, and export to another useful format.

      You'll also need to set up your spot instance price threshold, and make sure the queue has jobs to run. That's it, it's fairly simple.
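Steps 1 through 3 above can be sketched in Python roughly as follows. The sqs and s3 arguments are assumed to be boto3 clients (boto3 rather than the original boto, since those are the client method names used here), compute stands in for your own algorithm, and the job format (a JSON dict with a job_id key) is a hypothetical convention.

```python
import json


def enqueue_jobs(sqs, queue_url, jobs):
    """Step 1: send one SQS message per job; each job is a JSON-serializable dict."""
    for job in jobs:
        sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(job))


def run_worker(sqs, s3, queue_url, bucket, compute,
               max_empty_polls=15, wait_seconds=20, on_idle=None):
    """Steps 2-3: process jobs until the queue has stayed empty for a while.

    Each receive is a long poll of up to wait_seconds (SQS allows at most 20),
    so 15 consecutive empty polls is roughly five idle minutes.
    Returns the number of jobs processed.
    """
    processed = 0
    empty_polls = 0
    while empty_polls < max_empty_polls:
        resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=wait_seconds)
        messages = resp.get("Messages", [])
        if not messages:
            empty_polls += 1
            continue
        empty_polls = 0
        msg = messages[0]
        job = json.loads(msg["Body"])
        result = compute(job)
        s3.put_object(Bucket=bucket, Key=f"results/{job['job_id']}.json",
                      Body=json.dumps(result).encode())
        # Delete only after the result is stored: if a spot interruption kills the
        # instance mid-job, the message reappears after its visibility timeout
        # and another worker can pick it up.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        processed += 1
    if on_idle is not None:
        on_idle()  # e.g. terminate this instance
    return processed
```

On a spot instance, on_idle could call boto3.client("ec2").terminate_instances(InstanceIds=[instance_id]), with instance_id read from the instance metadata. Counting empty long polls instead of reading a clock keeps the idle rule simple and easy to test.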

      --
      Favorite /. tagline: "On the eighth day, God created FORTRAN." And it was good.

    5. Re:EC2 is scriptable by EETech1 · · Score: 2

      Cycle Computing has the Jupiter Job Scheduler that was used in a /. article a couple of weeks ago:

      tech.slashdot.org/story/13/11/13/1754225/121-petaflops-rpeak-supercomputer-created-with-ec2

      Jupiter, or one of their other products, may be exactly what you are looking for. It takes care of startup and shutdown of the VMs and can even bid on spot instances for you. IIRC they even had different packages available depending on the number of instances and the level of service required.

      Good Luck!
      Cheers:)

    6. Re:EC2 is scriptable by rourin_bushi · · Score: 2

      Hate to break the news to ya, but it's not too hard; I set up such a thing in an afternoon to generate traffic to load test an app I am developing. The command-line tools are pretty well documented for this standard workflow.

      I do the first part manually, using the web console:
      1) launch an instance, install your code on it. Bonus points: write a script to parse the UserData so you can tell it where to pull the source data from (I keep such things in S3 if needed)
      2) use that instance to create an AMI.
      3) Use the ec2-run-instances command-line tool to launch some instances of your AMI
      4) Either configure the instance to auto-run your tools on boot, or use ssh to remotely launch your script. ssh -i key_pair.pem ec2-user@public-hostname.amazonaws.com '/path/to/my/script.sh'
      5) Make the last command in your script 'sudo shutdown -h now'

      Bam. Automated computing, complete with shutdown. Just make sure it stores your output somewhere. If you really can't figure it out, respond here and I can provide some code snippets later.
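A minimal Python sketch of the UserData idea from step 1 and the shutdown from step 5, assuming user data written as simple key=value lines (a made-up convention, not anything EC2 defines) and a script that runs on boot. Only parse_user_data is pure logic; the other two functions talk to the metadata service and the OS.

```python
import subprocess
import urllib.request

# Instance metadata endpoint for user data. This is a plain IMDSv1-style GET;
# instances that require IMDSv2 need a session token first, which is not handled here.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(timeout=2.0):
    """Return this instance's user data as text."""
    with urllib.request.urlopen(USER_DATA_URL, timeout=timeout) as resp:
        return resp.read().decode()


def parse_user_data(text):
    """Parse hypothetical 'key=value' lines, skipping blanks and '#' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {raw!r}")
        params[key.strip()] = value.strip()
    return params


def run_then_shutdown(argv):
    """Run the job, then power off even if the job failed or could not start."""
    try:
        return subprocess.run(argv).returncode
    finally:
        subprocess.run(["sudo", "shutdown", "-h", "now"])
```

A boot script could call run_then_shutdown(["/path/to/my/script.sh", parse_user_data(fetch_user_data())["source"]]). The try/finally plays the role of making shutdown the last command: it still runs if the job exits with an error.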

  2. Symantec Workflow by onyxruby · · Score: 2

    Does exactly what you need and is designed explicitly for integration with third-party tools. It handles everything from spinning up disks to automating web forms and jobs, including importing and exporting jobs. There really isn't anything else out there that comes close to what Workflow will do. It used to be called Altiris Workflow. It works with everything from CMDB, change management, and service desk tools to multiple languages.

    http://www.symantec.com/connect/articles/learn-about-symantec-workflow

  3. This is precisely what the AWS API is for. by gdek · · Score: 5, Informative

    Because your workflow is likely to be customized to your tasks, it should be straightforward to write these kinds of tools yourself, with any number of available toolkits, based on what language you're most comfortable using.

    There's the straight CLI: http://aws.amazon.com/cli/

    And lots of sample code for the various SDKs: http://aws.amazon.com/code

    Best to just dive in. If you have any development experience at all, even just scripting, you should be able to figure it out pretty quickly.

    1. Re:This is precisely what the AWS API is for. by multimediavt · · Score: 2

      Wish I had mod points. Was going to say, "Have you looked at what Amazon has available?" I know computational chemists that designed workflows like what the OP is talking about, so I know it is well documented somewhere. [cough, cough]

  4. Boto for Python is a good option by rovitotv · · Score: 2

    Since my scientific workflow always includes Python it is natural for me to use boto.

    https://github.com/boto/boto
    http://boto.readthedocs.org/en/latest/
    http://aws.amazon.com/sdkforpython/

  5. Jenkins by Cili · · Score: 2

    Jenkins would probably be useful in this case, with this plugin:

    https://wiki.jenkins-ci.org/display/JENKINS/Amazon+EC2+Plugin

  6. Re:What is a 'personal cloud'? by Richard_at_work · · Score: 2

    You certainly can have a personal cloud, or an internal cloud, or a private cloud.

    The term cloud is one of those that people on Slashdot seem to go out of their way to misconstrue or misunderstand, when in fact it's simple: it's a resource that you want to do X, but you don't necessarily want to know the in-depth details of how it goes about it. I want a website hosted, I want it redundant and I want it scalable, but I don't necessarily want to give a toss about manually balancing resources across several servers. Creating a single resource out of all of my local servers, and letting the management software work out where and how my redundant, scalable website is spread across those individual servers, is exactly that.

    Old timers might call it something else, or people coming from a particular industry might call it something else, but for the rest of us it's just the cloud. It's separating the physical resource from the task you need to do, hiding the complexity of the underlying resource provisioning so you just get the task done. In an IT department for a large company, the hardware department might create a private cloud to remove the task of managing the hardware from the various departments that want to utilise it. The web department then just deploys its sites and the analytics department runs its tasks, all without worrying about hardware failing or getting gummed up by a single process, because the underlying management layer spreads the load across multiple physical resources automatically.

  7. Cloudformation by Fubar420 · · Score: 2

    Amazon's CloudFormation (http://aws.amazon.com/cloudformation/) can get you 95% of the way there (add a few small scripts via Boto, or some integration via custom resources: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-cfn-customresource.html).

    A little elbow grease will get you the rest of the way without additional costs.

    --
    -- (appended to the end of comments you post, 120 chars)

  8. One AMI image, ec2, and shell scripts by Lally+Singh · · Score: 3, Informative

    Here's how I ran my PhD simulations on EC2:
    - The AMI downloads a manifest file at startup.
        - The manifest has one record per line, two fields per record: the S3 URL of a .tar.gz to download, and the path to download it to
    - The AMI then runs a shell script (/etc/run.sh) that's been put there by a manifest entry
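A Python sketch of the manifest step, assuming whitespace-separated fields, '#' comments, and that the second field is the directory each archive is unpacked into. The s3 argument is assumed to be a boto3 S3 client, and the archives are treated as trusted, since they are extracted as-is.

```python
import os
import tarfile
from urllib.parse import urlparse


def parse_manifest(text):
    """Return (s3_url, dest_dir) pairs, one per non-blank, non-comment line."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        url, dest = fields
        if not url.startswith("s3://"):
            raise ValueError(f"line {lineno}: not an s3:// URL: {url}")
        entries.append((url, dest))
    return entries


def split_s3_url(url):
    """'s3://bucket/path/file.tar.gz' -> ('bucket', 'path/file.tar.gz')."""
    parsed = urlparse(url)
    return parsed.netloc, parsed.path.lstrip("/")


def install(entries, s3, workdir="/tmp"):
    """Download each archive with a boto3 S3 client and unpack it into its destination."""
    for url, dest in entries:
        bucket, key = split_s3_url(url)
        local = os.path.join(workdir, os.path.basename(key))
        s3.download_file(bucket, key, local)
        os.makedirs(dest, exist_ok=True)
        with tarfile.open(local, "r:gz") as tar:
            tar.extractall(dest)
```

On boot, the AMI would run something like install(parse_manifest(text), boto3.client("s3")) and then execute /etc/run.sh.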

    Shell scripts upload new files to s3 (e.g., /etc/run.sh) and have ec2 run new VMs. When the VMs are loaded, they're running everything I need, ready to go.
    Other shell scripts stopped/started experiments on these VMs.
    Other shell scripts shut down the VMs when I'm done.
    The scripts did little more than scan the appropriate machine list from the ec2 tools and ssh into them with a specific command.
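The "scan the machine list and ssh in" scripts could look roughly like this in Python, assuming the response comes from boto3's describe_instances and the ec2-user login used by Amazon Linux; the key file and remote command are placeholders.

```python
import subprocess


def running_hosts(describe_response):
    """Public DNS names of running instances in a describe_instances-style response."""
    hosts = []
    for reservation in describe_response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            if inst.get("State", {}).get("Name") == "running" and inst.get("PublicDnsName"):
                hosts.append(inst["PublicDnsName"])
    return hosts


def ssh_commands(hosts, remote_cmd, key_file, user="ec2-user"):
    """One ssh argv per host, i.e. ssh -i KEY USER@HOST CMD."""
    return [["ssh", "-i", key_file, f"{user}@{host}", remote_cmd] for host in hosts]


def run_everywhere(commands):
    """Start all commands in parallel, wait for each, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]
```

For example, run_everywhere(ssh_commands(running_hosts(resp), "/etc/run.sh", "key_pair.pem")), where resp = boto3.client("ec2").describe_instances(Filters=[{"Name": "instance-state-name", "Values": ["running"]}]). The commands start in parallel and the function returns each ssh exit status.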

    Toward the end, I had some of the experiment-specific scripts run git clone/pull to quickly fetch files I was changing from one experiment to the next.

    All of it worked really well for me. Nothing fancier than the ec2 command-line tools, bash, ssh, & git necessary.

    --
    Care about electronic freedom? Consider donating to the EFF!

  9. Starcluster does 80-90% of what you want by austingeekgirl · · Score: 3, Informative

    http://star.mit.edu/cluster/

    The rest of it is easily scriptable. I have some EBS-backed AMIs that, on bootup, connect to a central server and
    register themselves (tick up a count in a text file and add themselves to /etc/hosts).
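A Python sketch of the registration step, as it might run on the central server when a node checks in. The file paths are parameters so it can be tried on scratch files before pointing it at /etc/hosts (which needs root); how the node reaches the server is left open, and counting each node only once (so a reboot does not double count) is an assumption.

```python
from pathlib import Path


def register_node(hosts_path, counter_path, ip, hostname):
    """Add 'ip hostname' to a hosts-format file and bump a node counter.

    Returns the node count after registration; re-registering an existing
    entry leaves both files unchanged.
    """
    hosts = Path(hosts_path)
    counter = Path(counter_path)
    count = int(counter.read_text().strip() or 0) if counter.exists() else 0
    text = hosts.read_text() if hosts.exists() else ""
    entry = f"{ip} {hostname}"
    if entry in text.splitlines():
        return count  # already registered, e.g. after a reboot
    if text and not text.endswith("\n"):
        text += "\n"  # don't glue the new entry onto an unterminated last line
    hosts.write_text(text + entry + "\n")
    count += 1
    counter.write_text(f"{count}\n")
    return count
```

Concurrent registrations could interleave their writes, so a real version would take a lock first, for example with fcntl.flock on the counter file.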

    If you combine StarCluster for generic cluster management with the existing Amazon-provided tools
    (http://blog.roozbehk.com/post/35277172460/installing-amazon-ec2-tools),
    this is really only a day's worth of scripting and testing.

    There are also several public AMIs on EC2 that are oriented towards scientific computing.
    http://www.google.com/search?q=ec2%20ami%20scientific

    This is my day job stuff.

    1. Re:Starcluster does 80-90% of what you want by john.c.earls · · Score: 2

      Seconding StarCluster. It is very easy to get up and running quickly, well documented, and has a good plugin system. If you do scientific computing, then you are probably familiar with most of the tools that are built in: SGE, IPython, NFS, etc. Aside from the provided Amazon tools, I find the boto (Python) library helpful if I need to interact with S3 or SQS.

  10. try the opensource cloudify by SpaceCracker · · Score: 2

    Look up cloudify on cloudifysource.org.

    It enables spinning up machines on the cloud of your choice (including EC2). Then it installs and configures your software on those VMs. Finally it monitors all processes that you request it to monitor, including listening to exposed custom metrics, e.g. over a jmx port.

    In your case, when your experiment ends, if your software exposes some api or metric that can indicate that, cloudify can take that as a trigger for shutting down or spinning up the next experiment.

    A nice bonus is that it can elastically scale your VMs in and out to handle varying loads, and automatically restart problematic VMs or processes.

    --
    sigo ergo sum