Slashdot Mirror


Open Source Batch Management?

Asgard asks: "My employer is currently running a commercial batch management platform. Unfortunately the licensing model makes it unfeasible to run it in the development / testing environments, leading to poor usage of the tool and unexpected failures in production. I'm looking for an equivalent Open Source tool and am wondering how others have approached the problem. Does Slashdot have any suggestions?" Imagine a system like cron, but with job dependencies. Are there any batch systems out there like this? "The tools I've found through web searches mostly treat 'batch management' from the cluster perspective -- a user submits an ad-hoc job and the tool figures out where and when to run it based on load and architecture requirements. Instead I am looking for something that manages daily schedules of jobs based on their dependencies with other jobs and external events, such as files arriving or time.

An example might be that every day jobs a, b, c, and d must run. Job a must not run before 9pm and requires file X to be present. Jobs b and c depend on a completing successfully. Job d must run after 2am and after b and c have completed successfully. If job c fails then an operator must fix the issue and rerun it, after which the tool will move on to job d."
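A minimal sketch of the a/b/c/d example as a dependency graph, in Python. It is not any particular scheduler's format: the `JOBS` table, its field names, and the helpers `run_order` and `ready` are all hypothetical. The time and file conditions are only recorded as data here, not enforced; the sketch shows just the "d waits for b and c" part.

```python
from datetime import time
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job table; the field names are illustrative assumptions.
JOBS = {
    "a": {"after": [], "not_before": time(21, 0), "needs_file": "X"},
    "b": {"after": ["a"]},
    "c": {"after": ["a"]},
    "d": {"after": ["b", "c"], "not_before": time(2, 0)},
}


def run_order(jobs):
    """Return one valid execution order that respects every 'after' list."""
    graph = {name: spec["after"] for name, spec in jobs.items()}
    return list(TopologicalSorter(graph).static_order())


def ready(name, jobs, done):
    """A job is ready once all of its predecessors completed successfully."""
    return name not in done and all(dep in done for dep in jobs[name]["after"])


order = run_order(JOBS)
print(order)  # a comes first, d comes last

# If c failed, only a and b are in the 'done' set, so d stays blocked
# until an operator reruns c and it is added to 'done'.
print(ready("d", JOBS, {"a", "b"}))
print(ready("d", JOBS, {"a", "b", "c"}))
```

A real tool would also have to check the clock and the presence of file X before releasing a job, and persist the `done` set between runs.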

11 of 37 comments

  1. For a second there ... by one9nine · · Score: 3, Funny

    I thought it said Open Source Bath Management.

    Maybe I speak for myself, but some things are better off left proprietary.

  2. Systems like this are handy by blincoln · · Score: 3, Interesting

    I don't know of any OSS systems like this, but they are *very* useful for larger companies.

    A few years ago I was working in change control, and updates to software stored on network shares across the company were handled using a decrepit old VB app that generated linear xcopy scripts that updated each server (of which there were about 160 spread across the US) one by one. Most of the servers were on slow links, so distributing a 10MB file could take twelve hours or more.

    I hadn't learned to code properly at that time, but we used an enterprise batch scheduler called Control-M* that worked like the original post describes. What I did was write a batch script that read a config file and then executed a single robocopy command targeted at the server in the Control-M job definition.

    I had a whole array of these jobs, one for every target server, and they all depended on another job that would run at - for example - 11PM. So when that time rolled around, all of the dependent jobs could run. As-is, that would have overloaded the WAN and source server bandwidth. So I assigned what Control-M called a "resource" to all of the jobs. It was just an integer counter that I capped at 16. So at any given time, there were 16 "threads" of robocopy running. It ended up being between 20 and 30 times more efficient than the crappy xcopy scripts.
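
    The capped counter described above behaves like a counting semaphore. Here is a rough Python sketch of that idea only; it is not how Control-M implements resources. The server names are made up and a short sleep stands in for the robocopy call.

```python
import threading
import time

MAX_SLOTS = 16  # the cap from the description above
SERVERS = [f"server{n:03d}" for n in range(160)]  # hypothetical names

slots = threading.BoundedSemaphore(MAX_SLOTS)
lock = threading.Lock()
running = 0
peak = 0
finished = []


def copy_to(server):
    """Stand-in for one copy job; sleeps instead of copying anything."""
    global running, peak
    with slots:  # blocks while 16 other jobs already hold a slot
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # placeholder for the real transfer
        with lock:
            running -= 1
            finished.append(server)


# All 160 jobs become eligible at once, as they would when the 11PM job ends.
threads = [threading.Thread(target=copy_to, args=(s,)) for s in SERVERS]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(finished)} jobs done, peak concurrency {peak}")
```

    Every job is released at the same moment, but the semaphore keeps the number in flight at or below the cap.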

    Anyway, they're really handy, and if there isn't an OSS project like this, it would be a great idea.

    * This is not an endorsement of Control-M. In my new(er) job, I'm working as an engineer, and I discovered that the encryption system that it uses for storing account passwords in the registry is so poor that I was able to write a universal decoder for it using only vbscript and Excel. There are certainly other downsides to the app as well, although one cool thing is it runs on just about any platform - Unix, AS/400, OS/390, Windows, etc.

    --
    "...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
    1. Re:Systems like this are handy by OldAndSlow · · Score: 2, Funny
      ... I was able to write a universal decoder for it using only vbscript and Excel.

      And you just admitted to the world that you violated the DMCA. Pity, I was starting to like you...

  3. a system like cron, but with job dependencies by hankaholic · · Score: 3, Interesting

    Ummmm... cron+make?

    Build systems aren't just for running compilers. :)

    --
    Somebody get that guy an ambulance!
  4. Cluster solutions work in single-machine mode too by Bamfarooni · · Score: 2, Insightful

    PBS and Sun's SGE do this kind of job
    management, but for clusters of machines.
    There's nothing that says you can't have
    a cluster of 1 machine though.

  5. cron + make + caffeine by MarkusQ · · Score: 2, Informative

    It works great for me. Just have to do a caffeine check before making major changes (and remember to stop the cron job plus test in a sandbox).

    Some handy tips:

    • Use pid files to keep new instances from starting up if a job goes long.
    • "-j" can be your friend, but (like a real human friend) it can also get you into a heap of trouble if you aren't careful.
    • Running the make in a permanent loop and just touching things with cron can be a handy trick, especially if you need to let users (or external processes) tweak the process.
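
    The pid-file tip above could look roughly like this in Python. This is a sketch under assumptions: `acquire_pidfile`, `release_pidfile` and the pid file location are hypothetical, and a pid file left behind by a crashed job is not handled.

```python
import os
import tempfile


def acquire_pidfile(path):
    """Create the pid file atomically; return False if it already exists."""
    try:
        # O_EXCL makes the create fail if another instance got there first.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def release_pidfile(path):
    """Remove the pid file once the job has finished."""
    os.remove(path)


# Hypothetical location; a real cron job would use a fixed, well-known path.
PIDFILE = os.path.join(tempfile.mkdtemp(), "nightly.pid")

if acquire_pidfile(PIDFILE):
    try:
        pass  # run the long job here
    finally:
        release_pidfile(PIDFILE)
else:
    print("previous run still active, skipping this one")
```

    A more careful version would read the stored pid and check whether that process is still alive before refusing to start.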

    --MarkusQ

  6. TORQUE Resource Manager by Bryan_Casto · · Score: 5, Informative
    I think TORQUE Resource Manager will do what you're looking for. From their page:
    TORQUE (Tera-scale Open-source Resource and QUEue manager) is a resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original *PBS project and has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Dept of Energy, Sandia, PNNL, U of Buffalo, TeraGrid, and many other leading edge HPC organizations.
    --

    Bryan J. Casto
    bryan.casto(a)gmail.com
  7. Suggestions by RyanGWU82 · · Score: 2, Informative

    I'm working with systems like this right now. You might have better luck if you search for "workflow" instead of "batch." Googling for "open source" workflow management also brings back a bunch of promising hits. And if you're Java-centric, there's a great page which summarizes all the open source workflow engines available for Java.

  8. OSS Systems like this are handy by Roadkills-R-Us · · Score: 3, Informative

    We use PBS at work. I didn't pick it, but it works. There are others around as well, though I don't recall their names off the top of my head. (PBS is available free, or supported for a fee. We use the latter: a commercial version of an OSS project. 8^/)

    A search of google or any of the OSS sites should turn up several more.

  9. For example: by Ayanami+Rei · · Score: 4, Informative

    in cron.daily...
    make -j $NCPUs -C /working/dir -f /working/dir/Makefile
    and in /working/dir/Makefile:

    all: tasks/1 tasks/2 tasks/3

    tasks/1:
    	foo bar baz
    	frob fritz
    	touch tasks/1

    tasks/2: tasks/2.1 tasks/2.2 some_make_test(tasks/2.3)
    	bar baz qyzzy && touch tasks/2

    etc. etc. etc.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  10. Clarifications by Ayanami+Rei · · Score: 3, Informative

    1) One thing make can't do is run tests that generate dependencies at runtime... it resolves them in one pass at the beginning. Since you're running it iteratively this isn't a big deal.

    2) For a batch automation system, you'll need to use make -k, and if you need to, put targets in .DELETE_ON_ERROR if you don't do something like manually touching a status file at the end of a command.

    3) If you have a dependency chain of targets and you don't want to have to clean up explicitly (or you want your job to run entirely in phases), you can label intermediate targets with .INTERMEDIATE, and if make finishes processing these things in one invocation, it will delete the outputs/status files when all the dependent jobs have run. If it doesn't make it, then it will be forced to restart from the preconditions.

    4) Make sure to fully outline dependencies. If you need to somehow prevent two things from running in parallel, you need to create an artificial barrier within the script itself, unfortunately. The easiest way to do this would be perl and IPC::SysV, I should think. You might know of some other shell tricks or opening a device that blocks like a FIFO... but it sucks that GNU make doesn't have it. (However HP-UX and SCO's make have a .MUTEX pseudo-target that prevents two things from being run in parallel... shame)
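
    One way to build the barrier from point 4 is an advisory lock file. The post suggests perl and IPC::SysV; this swaps in Python and flock instead. It is Unix-only, and the `exclusive` helper and the lock path are hypothetical. Each of the two jobs would wrap its body in the same lock.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive(lock_path, blocking=True):
    """Hold an advisory flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # Waits for the holder, or raises BlockingIOError when non-blocking.
        fcntl.flock(fd, flags)
        yield
    finally:
        os.close(fd)  # closing the descriptor drops the lock


# Hypothetical lock file shared by the two jobs that must not overlap.
LOCK = os.path.join(tempfile.mkdtemp(), "exclusive_jobs.lock")

with exclusive(LOCK):
    pass  # job body goes here; the other job waits at its own 'with'
```

    With this in place make -j can still start both recipes at once, but the second one simply waits on the lock until the first has finished.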

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON