Slashdot Mirror


Ask Slashdot: Scientific Computing Workflow For the Cloud?

diab0lic writes "I have recently found myself needing to run cloud computing jobs on demand for my research. Amazon's EC2 Spot Instances are an ideal platform for this, as I can requisition an appropriate instance type for a given experiment (high CPU, high memory, or GPU) depending on its needs. However, I currently spin up the instance manually, set it up, run the experiment, and then terminate it manually. Monitoring experiments for completion gets tedious, and I incur unnecessary costs if a job finishes while I'm sleeping, for example. The whole thing really should be automated. I'm looking for a workflow somewhat similar to this:
  1. Manually create Amazon machine image (AMI) for experiment.
  2. Issue command to start AMI on specified spot instance type.
  3. Automatically connect EBS to instance for result storage.
  4. Automatically run specified experiment, bonus if this can be parameterized.
  5. Have the instance automatically terminate itself upon experiment completion.

Something like Docker that spun up spot instances of a specified type on demand for each run, and terminated each instance when its run completed, would be absolutely perfect. I also know HTCondor can back onto EC2 spot instances, but I haven't been able to find any concise information on how to set up a personal cloud, and I think it is slight overkill anyway. Do any other Slashdot users have similar problems? How did you solve them? What is your workflow? Thanks!"

1 of 80 comments

  1. Re: EC2 is scriptable by Salis · Score: 3, Interesting

    The OP needs lower-priced spot instances, which are intermittently available and designed exactly for this workflow.

    Here's how to utilize lower-priced spot instances for scientific computing:

    1. Set up one long-running, low-cost instance (a small one is fine) that creates a distributed queue using Amazon's SQS and adds one job to the queue for each "unit" of the computational problem. New jobs can be added using a command line interface or through a web interface.

    2. Create a start-up Bash script for the spot instances (passed as EC2 user data) that runs your main program. I prefer using Python and boto for simplicity. The main program should connect to the SQS queue and begin an "infinite" while loop. Inside the loop, the next job is pulled off the queue; it contains the input parameters that define one "unit" of the computational problem. These input parameters are fed to the main algorithm, and the resulting output is uploaded to Amazon S3. The loop then continues with the next job.

    3. Whenever the queue is empty or the spot instance has been idle for ~5 minutes, the instance terminates itself using EC2's command line interface.

    4. Finally, just write a simple Python script to pull all the results off S3, combine & analyze them, and export to another useful format.
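
    As a rough illustration of steps 2 and 3, here is a minimal sketch of the worker loop in Python. It assumes boto3 (the current successor to boto) rather than the original boto, and it terminates through the EC2 API instead of the command line tools. The names run_worker and aws_worker, the results/ key prefix, and the JSON message format are all made up for the example. The loop itself takes plain callables, so it can be tried out without an AWS account.

```python
import json
import time


def run_worker(receive, process, upload, terminate,
               idle_limit=300.0, clock=time.monotonic):
    """Process jobs until no work has arrived for idle_limit seconds.

    receive() returns (job_id, params, ack) or None when nothing is available.
    """
    completed = 0
    last_active = clock()
    while clock() - last_active < idle_limit:
        job = receive()
        if job is None:
            continue  # receive() is expected to block briefly (long polling)
        job_id, params, ack = job
        upload(job_id, process(params))
        ack()  # remove the job from the queue only after its result is stored
        completed += 1
        last_active = clock()
    terminate()
    return completed


def aws_worker(queue_url, bucket, instance_id, process):
    """Wire run_worker to SQS, S3 and EC2 (all argument values are assumptions)."""
    import boto3  # imported here so run_worker stays usable without AWS

    sqs = boto3.client("sqs")
    s3 = boto3.client("s3")
    ec2 = boto3.client("ec2")

    def receive():
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        messages = resp.get("Messages", [])
        if not messages:
            return None
        msg = messages[0]

        def ack():
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])

        return msg["MessageId"], json.loads(msg["Body"]), ack

    def upload(job_id, result):
        # Hypothetical key layout: one JSON object per finished job.
        s3.put_object(Bucket=bucket, Key=f"results/{job_id}.json",
                      Body=json.dumps(result).encode("utf-8"))

    def terminate():
        ec2.terminate_instances(InstanceIds=[instance_id])

    return run_worker(receive, process, upload, terminate)
```

    Deleting the message only after the upload means a job whose spot instance is reclaimed mid-run becomes visible on the queue again once its SQS visibility timeout expires, so another worker can pick it up.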

    You'll also need to set your maximum spot price, and make sure the queue has jobs to run. That's it; it's fairly simple.
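
    For the launch side (filling the queue and setting the price threshold), something along these lines might work. Again this is only a sketch assuming boto3; the helper names job_bodies, enqueue and spot_request are invented for illustration, and the sqs argument is expected to be a boto3 SQS client (or anything with a compatible send_message_batch method).

```python
import base64
import json


def job_bodies(param_sets):
    """Serialize one JSON message body per unit of work."""
    return [json.dumps(params, sort_keys=True) for params in param_sets]


def enqueue(sqs, queue_url, bodies):
    """Send bodies in batches of 10, the SQS per-call maximum."""
    sent = 0
    for start in range(0, len(bodies), 10):
        chunk = bodies[start:start + 10]
        entries = [{"Id": str(start + i), "MessageBody": body}
                   for i, body in enumerate(chunk)]
        # A real script should also inspect the "Failed" list in the response.
        sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        sent += len(chunk)
    return sent


def spot_request(ami_id, instance_type, max_price, startup_script, count=1):
    """Build keyword arguments for the EC2 client's request_spot_instances."""
    return {
        "SpotPrice": str(max_price),  # the price threshold, passed as a string
        "InstanceCount": count,
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            # Spot launch specifications take user data base64-encoded.
            "UserData": base64.b64encode(
                startup_script.encode("utf-8")).decode("ascii"),
        },
    }
```

    With a real client this would be used as ec2.request_spot_instances(**spot_request(...)), where the AMI ID, instance type, price and start-up script are whatever your experiment needs.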

    --
    Favorite /. tagline: "On the eighth day, God created FORTRAN." And it was good.