Slashdot Mirror


Open Source Batch Management?

Asgard asks: "My employer is currently running a commercial batch management platform. Unfortunately the licensing model makes it unfeasible to run it in the development / testing environments, leading to poor usage of the tool and unexpected failures in production. I'm looking for an equivalent Open Source tool and am wondering how others have approached the problem. Does Slashdot have any suggestions?" Imagine a system like cron, but with job dependencies. Are there any batch systems out there like this? "The tools I've found through web searches mostly treat 'batch management' from the cluster perspective -- a user submits an ad-hoc job and the tool figures out where and when to run it based on load and architecture requirements. Instead I am looking for something that manages daily schedules of jobs based on their dependencies with other jobs and external events, such as files arriving or time.

An example might be that every day jobs a, b, c, and d must run. Job a must not run before 9pm and requires file X to be present. Jobs b and c depend on a completing successfully. Job d must run after 2am and after b and c have completed successfully. If job c fails then an operator must fix the issue and rerun it, after which the tool will move on to job d."
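
A minimal sketch of the a/b/c/d example, assuming a scheduler that is polled and asked which jobs may start. It is written in Python for illustration only; the `JOBS` table, the `ready_jobs` function and the field names are hypothetical and do not come from any tool discussed here. Start times are stored as offsets from the start of the schedule day, so "after 2am" becomes 26 hours.

```python
from datetime import datetime, timedelta

# Hypothetical job table for the a/b/c/d example. "earliest" is an offset from
# the start of the schedule day, so 2am on the following morning is 26 hours.
JOBS = {
    "a": {"needs": [], "earliest": timedelta(hours=21), "files": ["X"]},
    "b": {"needs": ["a"], "earliest": None, "files": []},
    "c": {"needs": ["a"], "earliest": None, "files": []},
    "d": {"needs": ["b", "c"], "earliest": timedelta(hours=26), "files": []},
}


def ready_jobs(jobs, done, failed, now, day_start, present_files):
    """Return the sorted names of jobs that may start at time `now`."""
    ready = []
    for name, job in jobs.items():
        if name in done or name in failed:
            continue  # already finished, or waiting for an operator to rerun it
        if any(dep not in done for dep in job["needs"]):
            continue  # a prerequisite has not completed successfully
        if job["earliest"] is not None and now < day_start + job["earliest"]:
            continue  # too early in the schedule day
        if any(path not in present_files for path in job["files"]):
            continue  # a required file has not arrived yet
        ready.append(name)
    return sorted(ready)
```

A failed job is never added to `done`, so job d stays held until an operator reruns job c and it is recorded as complete.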

37 comments

  1. I think this question was asked before... by afd8856 · · Score: 1

    Right here on Slashdot. Maybe a search through the archives will find it...

    --
    I'll do the stupid thing first and then you shy people follow...
  2. For a second there ... by one9nine · · Score: 3, Funny

    I thought it said Open Source Bath Management.

    Maybe I speak for myself, but some things are better off left proprietary.

    1. Re:For a second there ... by Anonymous Coward · · Score: 0

      I thought it said Open Source Bath Management.
      Maybe I speak for myself, but some things are better off left proprietary.


      Are you kidding? If bath was proprietary then open source zealots would never use it!
      Oh, wait a minute...

    2. Re:For a second there ... by gstoddart · · Score: 1
      I thought it said Open Source Bath Management.

      You know, I saw "Open Source Bitch Management."

      I was unclear as to its function. i-Pimp 2.0 or something. :-P

      --
      Lost at C:>. Found at C.
  3. Would you be doing adhoc runs? by T-Ranger · · Score: 1

    That is, would "normal" people be setting them up? If not, you could simply use make. Or Ant.

  4. Systems like this are handy by blincoln · · Score: 3, Interesting

    I don't know of any OSS systems like this, but they are *very* useful for larger companies.

    A few years ago I was working in change control, and updates to software stored on network shares across the company were handled using a decrepit old VB app that generated linear xcopy scripts that updated each server (of which there were about 160 spread across the US) one by one. Most of the servers were on slow links, so distributing a 10MB file could take twelve hours or more.

    I hadn't learned to code properly at that time, but we used an enterprise batch scheduler called Control-M* that worked like the original post describes. What I did was write a batch script that read a config file and then executed a single robocopy command targeted at the server in the Control-M job definition.

    I had a whole array of these jobs, one for every target server, and they all depended on another job that would run at - for example - 11PM. So when that time rolled around, all of the dependent jobs could run. As-is, that would have overloaded the WAN and source server bandwidth. So I assigned what Control-M called a "resource" to all of the jobs. It was just an integer counter that I capped at 16. So at any given time, there were 16 "threads" of robocopy running. It ended up being between 20 and 30 times more efficient than the crappy xcopy scripts.
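
    The capped counter can be sketched with a semaphore. This is an illustration in Python, not Control-M's implementation; `run_capped` and `copy_to_server` are hypothetical names, and the real jobs ran robocopy rather than a Python function.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 16  # the cap described above


def run_capped(servers, copy_to_server, limit=MAX_CONCURRENT):
    """Call copy_to_server(server) for every server, at most `limit` at a time."""
    slots = threading.BoundedSemaphore(limit)

    def worker(server):
        with slots:  # take one unit of the "resource"; released on exit
            return copy_to_server(server)

    # One thread per job mirrors "one job per target server"; the semaphore,
    # not the pool size, is what enforces the cap.
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        return list(pool.map(worker, servers))
```

    Every job is released at once, as in the 11PM example, but only `limit` of them hold a slot at any moment; the rest wait their turn.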

    Anyway, they're really handy, and if there isn't an OSS project like this, it would be a great idea.

    * This is not an endorsement of Control-M. In my new(er) job, I'm working as an engineer, and I discovered that the encryption system that it uses for storing account passwords in the registry is so poor that I was able to write a universal decoder for it using only vbscript and Excel. There are certainly other downsides to the app as well, although one cool thing is it runs on just about any platform - Unix, AS/400, OS/390, Windows, etc.

    --
    "...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
    1. Re:Systems like this are handy by OldAndSlow · · Score: 2, Funny
      ... I was able to write a universal decoder for it using only vbscript and Excel.

      And you just admitted to the world that you violated the DMCA. Pity, I was starting to like you...

    2. Re:Systems like this are handy by blincoln · · Score: 1

      And you just admitted to the world that you violated the DMCA.

      It was for security testing purposes. I submitted my findings to the vendor, who didn't act on them to my knowledge.

      --
      "...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
  5. a system like cron, but with job dependencies by hankaholic · · Score: 3, Interesting

    Ummmm... cron+make?

    Build systems aren't just for running compilers. :)

    --
    Somebody get that guy an ambulance!
    1. Re:a system like cron, but with job dependencies by bedessen · · Score: 1

      How does that fit the requirements? Sure it can be hacked up to work, but consider that what he wants is not "run this at time X and follow these dependencies", it's "must run at any time after X and requires Y and Z to have happened." You can set a cron job to fire at X, but if Y and/or Z hasn't finished/happened yet what are you to do? Just sit there and wait? What if other things depend on this task? Will they all pile up into a heap of waiting processes? Can the waiting process really block on two independent events efficiently? What about 20, 50, 100? What if the dependent-task never happens, are you going to have the process kill itself eventually?

      Or perhaps have the cron job poll every 5 minutes? Seems like that wouldn't scale if you have lots of processes. And because cron doesn't know about dependencies it still has to spawn something every 5 minutes even if the job has been run already -- even if it's just to check if the job has been run.

      Both solutions seem wasteful.

      I think what he was asking for is a scheduler that can handle both the time and the dependency checking from the same logic. Cron only knows to launch things at certain times and make only knows how to check dependencies. They are two separate things that solve two separate problems, in the great unix tradition, but it just seems to me very hackish to try to combine them when what you really want is a dispatcher/scheduler that knows about both. With make+cron because the two parts don't have knowledge of the other, you're basically back to polling every 'n' minutes to see if requirements have been met yet.

    2. Re:a system like cron, but with job dependencies by hankaholic · · Score: 0, Flamebait

      Back off, man -- he didn't mention having thought about using cron with a build system, so I suggested it. There was nothing else in the comments regarding an actual solution at the time, so I suggested an actual solution.

      You attack my suggestion as being wasteful or suboptimal, but at the time I posted there were no solutions.

      I hope it feels good to have pointed out the inadequacies of my attempt to provide some initial direction, especially given the fact that at the time I write this, 22 hours after the story hit the front page, there are still only 25 posts in the thread.

      Have you suggested anything more useful than what my post mentioned, or are you just trying to be disagreeable?

      --
      Somebody get that guy an ambulance!
    3. Re:a system like cron, but with job dependencies by bedessen · · Score: 1

      I'm not trying to be disagreeable at all, and I don't really care about the timing of the posting of the article and the various replies. It doesn't matter to me if replies take days or weeks. I'm just trying to demonstrate how "cron+make" perhaps isn't the end-all solution to the question posed. A technical solution should stand on its own merits and I'm sad to say that "cron+make" works for a lot of cases but leaves the case of the original article poster in the dark.

    4. Re:a system like cron, but with job dependencies by hankaholic · · Score: 0, Flamebait

      But slightly less in the dark than before I suggested trying cron and make, right? Then why attack my suggestion? Does it make you feel special?

      Congratulations, you're special. Just like every other flamebaiting troll.

      --
      Somebody get that guy an ambulance!
  6. Please, some manners... by Anonymous Coward · · Score: 1, Informative

    Please don't join conversations in which you have no interest.

    1. Re:Please, some manners... by Anonymous Coward · · Score: 0

      Word.

      I like how that site (justfuckinggoogleit.com) doesn't work. It generates broken links to google.

  7. Condor by raider_red · · Score: 1

    The Condor project looks promising. I've been looking for something similar as an alternative to LSF.

    --
    It's good to use your head, but not as a battering ram.
  8. DOS by Anonymous Coward · · Score: 1, Funny

    Wouldn't FreeDOS work? I have a bunch of batch files that work in MSDOS.

  9. Workflow systems... by mosabua · · Score: 1

    This sounds very much like a workflow system to me. There are many out there. I am currently working with jbpm. Many have all sorts of plugins and can be programmed to do more. They also come with process definitions ... and on another note, to some extent build tools like ant can do things like that too...

  10. Get a real vendor... by eggoeater · · Score: 1

    I've never heard of a vendor that isn't flexible when it comes to development and test environment licenses. I work in the financial sector and every system (EVERY SINGLE ONE) has at least a development environment and a pre-prod/UAT/Test environment. For more critical applications that go through a lot of regular change (e.g. the website) there are actually SEVEN environments it goes through, the last being production.

    We use an enterprise scheduling system called AutoSys which is supposed to be the industry standard, but it doesn't impress me...and I think it's super-expensive. Good luck. -Steve

  11. Cluster solutions work in single-machine mode too by Bamfarooni · · Score: 2, Insightful

    PBS and Sun's SGE do this kind of job
    management, but for clusters of machines.
    There's nothing that says you can't have
    a cluster of 1 machine though.

  12. cron + make + caffeine by MarkusQ · · Score: 2, Informative

    It works great for me. Just have to do a caffeine check before making major changes (and remember to stop the cron job plus test in a sandbox).

    Some handy tips:

    • Use pid files to keep new instances from starting up if a job goes long.
    • "-j" can be your friend, but (like a real human friend) it can also get you into a heap of trouble if you aren't careful.
    • Running the make in a permanent loop and just touching things with cron can be a handy trick, especially if you need to let users (or external processes) tweak the process.
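
    The pid-file tip can be sketched as follows. This is a minimal Python illustration with hypothetical function names; it does not detect a stale pid file left behind by a crashed job, which a real setup would need to handle.

```python
import os


def acquire_pidfile(path):
    """Create `path` atomically and record our pid; return False if it exists."""
    try:
        # O_EXCL makes creation fail if the file is already there, so two
        # instances cannot both believe they created it.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # another instance appears to be running
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def release_pidfile(path):
    """Remove the pid file once the job has finished."""
    os.remove(path)
```

    A job started from cron would call `acquire_pidfile` first and exit quietly when it returns False, so a run that goes long is not joined by a second copy.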

    --MarkusQ

  13. Use Gridengine by Anonymous Coward · · Score: 1, Interesting

    http://gridengine.sunsource.net/

    It handles batch jobs, dependencies, etc.

  14. TORQUE Resource Manager by Bryan_Casto · · Score: 5, Informative
    I think TORQUE Resource Manager will do what you're looking for. From their page:
    TORQUE (Tera-scale Open-source Resource and QUEue manager) is a resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original *PBS project and has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Dept of Energy, Sandia, PNNL, U of Buffalo, TeraGrid, and many other leading edge HPC organizations.
    --

    Bryan J. Casto
    bryan.casto(a)gmail.com
  15. Sounds familiar by belroth · · Score: 1
    I'm currently designing such a system for work. My basic spec is to have a system that can run an arbitrary batch when another job has run, before or after a specified time and with regard to the file system information of a specified file.
    e.g. run Job A when job X has successfully completed and file P has been updated, run job B if job X hasn't run by 3am, run job C if job Y fails.
    I've got most of the rough design done, the main problem is specifying date/time information - I would like to say "every 2nd Thursday", "the nth of every month", "the first Monday in every second month" etc. but I haven't got that done yet.
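
    Rules such as "the first Monday in every second month" reduce to finding the n-th weekday of a month. A minimal Python sketch, with hypothetical function names and an assumed convention that the alternating months are counted from January:

```python
import calendar
from datetime import date


def nth_weekday(year, month, weekday, n):
    """Return the date of the n-th `weekday` (0 = Monday) in a month, or None."""
    first_dow, days_in_month = calendar.monthrange(year, month)
    # Day of the first matching weekday, then step forward whole weeks.
    day = 1 + (weekday - first_dow) % 7 + 7 * (n - 1)
    return date(year, month, day) if day <= days_in_month else None


def is_first_monday_of_alternate_month(today, first_month=1):
    """True on the first Monday of every second month, counted from first_month."""
    if (today.month - first_month) % 2 != 0:
        return False
    return today == nth_weekday(today.year, today.month, calendar.MONDAY, 1)
```

    "Every 2nd Thursday" is ambiguous between "the second Thursday of the month" and "every other Thursday"; `nth_weekday` only covers the first reading.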

    Complications: running on Windows 2000 platform, zero budget and a lot of the jobs will run 16 bit apps in a NTVDM.

    I've considered VBA, having Access it costs nothing and I've written a library of WinAPI wrappers so yes it will do it, but I'd rather use something else. I'm still tempted to use Common Lisp, but it's the system level API calls that I don't know about.
    So it's probably going to be Perl, which is good - I can invoke processes and get the pids so I can suspend a job if it gets bumped by one with higher priority. A complication is that ntvdm processes return a pid of zero, but diffing the pid list before and after solves that. Another useful trick is going to be using Math::Logic to allow tri-state logic to make the dependency processing simpler. Of course I want to serialize everything to disk for ease of re-starting. I probably won't be able to release it, if I'm allowed to write it, but I'd like to. A friend suggested cron+scripts but it won't cope with the complexity of the scenarios I want to address.

    Have fun,

    --
    I hereby inform you that I have NOT been required to provide any decryption keys.
    1. Re:Sounds familiar by Anonymous Coward · · Score: 1, Funny

      Complications: running on Windows 2000 platform, zero budget and a lot of the jobs will run 16 bit apps in a NTVDM.

      What, the project manager couldn't squeeze "devs must hit themselves in the balls with a hammer each hour" into the requirements list?

    2. Re:Sounds familiar by belroth · · Score: 1
      Actually if it had an associated cost I wouldn't be allowed to. On the other hand if it saved money I'm sure it would be compulsory.

      Actually there's no project manager for this yet, just me. I'm, er, doing a feasibility study at present, of course sometimes the only way to determine if something is possible is to do it...

      --
      I hereby inform you that I have NOT been required to provide any decryption keys.
  16. Suggestions by RyanGWU82 · · Score: 2, Informative

    I'm working with systems like this right now. You might have better luck if you search for "workflow" instead of "batch." Googling for "open source" workflow management also brings back a bunch of promising hits. And if you're Java-centric, there's a great page which summarizes all the open source workflow engines available for Java.

  17. OSS Systems like this are handy by Roadkills-R-Us · · Score: 3, Informative

    We use PBS at work. I didn't pick it, but it works. There are others around as well, though I don't recall their names off the top of my head. (PBS is available free, or supported for a fee. We use the latter, a commercial version of an OSS project.) 8^/

    A search of google or any of the OSS sites should turn up several more.

  18. For example: by Ayanami+Rei · · Score: 4, Informative

    in cron.daily...
    make -j $NCPUs -C /working/dir

    and in /working/dir/Makefile (recipe lines must start with a tab):

    all: tasks/1 tasks/2 tasks/3

    tasks/1:
    	foo bar baz
    	frob fritz
    	touch tasks/1

    tasks/2: tasks/2.1 tasks/2.2 some_make_test(tasks/2.3)
    	bar baz qyzzy && touch tasks/2

    etc. etc. etc.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  19. Apache Ant by rimu+guy · · Score: 1

    Apache's Ant may be worth a look. It handles dependencies very well. It may not be so great with timing of jobs (cron + ant?) or handling jobs running in parallel (ant plus a custom 'run task in the background'?).

    --
    Linux Server + Persistence => Solution

  20. That's not true... by Ayanami+Rei · · Score: 1

    Cron + make is actually a pretty good solution. Why?

    make can be told to proceed as far as possible with missing results. If you keep running it every so often, it will eventually get all its dependencies as soon as possible and produce the "final" result. (These results are intermediate files that just checkpoint progress... unless you are using a custom make test)

    What's interesting is that you can ask make to treat "dependencies" as either an all-present or a do-in-order type of thing (or both). Even cooler is listing dependencies as functions which return "values" that are files that exist or do not exist to express transient effects. This of course means you need to run make periodically for it to re-assess the situation.

    Make is not suitable when the process has a lot of constraints that are in flux. But it is quite suitable for do-stuff-as-soon-as-feasible batch processing.

    If you can write all your steps as individual scripts, and you can build a map of dependencies, then you can write a single makefile which encapsulates all of this very easily. Moreover GNU make is nice because you can give it hints about what subtasks can and cannot be parallelized, and it will handle that for you.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  21. Roll your own... by dan14807 · · Score: 1

    Really. It doesn't sound like it would be too difficult to write this yourself with some good Unix scripting (Perl, bash, etc.)

    You said it's to serve as a test system for a commercial application. I assume you already have a "schedule" in mind, so maybe you could simplify things a bit by writing a system that only runs your specific schedule, rather than writing something more general. I don't know if that would provide a valid test case for your purpose.

  22. Clarifications by Ayanami+Rei · · Score: 3, Informative

    1) one thing make can't do is run tests that generate dependencies at runtime... it does it in one pass at the beginning. Since you're running it iteratively this isn't a big deal.

    2) For a batch automation system, you'll need to use make -k, and if you need to, declare the .DELETE_ON_ERROR special target (in GNU make it then applies to every target) if you don't do something like manually touching a status file at the end of a command.

    3) If you have a dependency chain of targets and you don't want to have to clean up explicitly (or you want your job to run entirely in phases), you can label intermediate targets with .INTERMEDIATE, and if make finishes processing these things in one invocation, it will delete the outputs/status files when all the dependent jobs are run. If it doesn't make it, then it will be forced to restart from the preconditions.

    4) Make sure to fully outline dependencies. If you need to somehow prevent two things from running in parallel, you need to create an artificial barrier with the script itself unfortunately. The easiest way to do this would be perl and IPC::SysV, I should think. You might know of some other shell tricks or opening a device that blocks like a FIFO... but it sucks that GNU make doesn't have it. (However HP-UX and SCO's make have a .MUTEX pseudo-target that prevents two things from being run in parallel... shame)
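
    One possible barrier of this kind is an advisory file lock that each script takes before doing its work. A minimal sketch in Python rather than Perl, assuming a Unix system (the `fcntl` module is not available on Windows); the `exclusive` helper and the lock path are hypothetical:

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def exclusive(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of a block."""
    with open(lock_path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks while another holder exists
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

    Two scripts that wrap their work in `with exclusive("/some/shared.lock"):` will not run that section at the same time, even when make -j starts them together.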

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  23. Fcron? by fishwaldo · · Score: 1

    It's pretty flexible... With a bit of shell scripting around it, I imagine you could do this.
    http://fcron.free.fr/

  24. We need Open Source AutoSys replacement by fahrv · · Score: 1

    I've had the same need - a previous poster mentioned AutoSys, which for all of its ugly faults gave my last employer a very robust job scheduling platform that I found very reliable.

    I've been looking (waiting?) for an open source equivalent. What we really need is something like Condor and Globus, ala the NSF Cluster Toolkit, with a cron interface (cluster-centric solutions have great features like redirection of STDOUT and STDERR, but don't have the ability to schedule a job for later execution.) Java Workflow systems have the kind of business logic you would want, but lack the cross-platform job execution and STDOUT redirection needed. Good luck finding something.

  25. Diogene87 by Anonymous Coward · · Score: 0

    Diogene87 (URL:http://diogene87.sourceforge.net/) is a promising job scheduling system (for Linux).
    From the website :
    Diogene87 provides advanced features :

    * centralized management : jobs can be run on local or remote servers (on TCP/IP network).
    * jobs dependences : a job can wait for another to be terminated. A job can be started when an other is normally finished or aborted.
    * start condition : a job can wait for a file-presence, for a manual validation or for a specified time.
    * planning : a job can be planned at regular interval : every day, every month, every year...
    * log of job activity : output of jobs (on console) are logged.
    * job monitoring : a web interface is provided to control job execution.
    * statistic for job duration : minimum, maximum and average duration.
    * resource control : job queue with threshold for maximum number of running jobs allow to control access to limited resources. Job queues can be manually opened or closed if the associated resource is not available.