Resources On Practical Job Scheduling?
felciano asks: "We have a fairly involved content build process that involves about 20 different jobs that have dependencies between them and are of sometimes radically different running times. Some jobs are parallelizable. We have a suite of machines across which we distribute the load right now, but the actual job-to-machine allocation is done by hand. I'm trying to do capacity and hardware purchase planning, and am getting a headache trying to model this. I've tried doing it in both MS Excel and MS Project, but neither seems suited for this type of model, especially when it comes to asking "what if" questions (e.g. if we bought 3 more machines, how much would that buy us in overall build time?). I realize that scheduling, network flow, etc. problems can get into hairy (sometimes NP) problem spaces, but given the rise in popularity of Linux compute farms as ways to address scalability problems, I would assume that there are at least some basic tools to help model this. I've looked at distributed job-scheduling software and haven't found much on the automated side -- are there any practical toolkits, apps, libraries, etc. to help do this type of modeling and planning in other ways?"
This is the kind of job queueing theory was designed for. Pick up a textbook from your local university bookstore, if you're interested in the topic. This will let you fairly easily get estimates of how varying system parameters affect performance.
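To give a flavor of the estimates a queueing textbook leads to, here is a minimal sketch of the standard M/M/c (Erlang C) formulas. It assumes jobs arrive at random at a steady average rate and have exponentially distributed run times, which a batch build with dependencies only roughly resembles, so treat the numbers as ballpark figures. The function name and parameters are made up for illustration.

```python
import math

def mmc_response_time(arrival_rate, service_rate, machines):
    """Mean time a job spends in an M/M/c system (waiting plus running).

    arrival_rate: average jobs arriving per unit time (lambda)
    service_rate: average jobs one machine finishes per unit time (mu)
    machines:     number of identical machines (c)
    """
    offered_load = arrival_rate / service_rate  # a = lambda / mu
    if offered_load >= machines:
        raise ValueError("system is overloaded: need lambda / mu < machines")

    # Erlang C: probability that an arriving job has to wait for a machine.
    top = (offered_load ** machines / math.factorial(machines)) \
        * (machines / (machines - offered_load))
    bottom = sum(offered_load ** k / math.factorial(k) for k in range(machines)) + top
    p_wait = top / bottom

    mean_wait = p_wait / (machines * service_rate - arrival_rate)
    return mean_wait + 1.0 / service_rate

# "What if" question: how does mean turnaround change as machines are added?
# The rates below are illustrative, not measurements.
for c in range(3, 8):
    print(c, round(mmc_response_time(arrival_rate=2.5, service_rate=1.0, machines=c), 3))
```

The useful part is the shape of the curve: turnaround drops sharply while the farm is close to saturation and flattens once machines sit mostly idle.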
I'm afraid I don't know what commercial packages handle this, though. We used a high-level system simulation tool called "MAP", but it wasn't very intuitive to use. Better solutions certainly exist.
The best I've come up with is a glorified spreadsheet that estimates total build time and lets me tweak the # of machines and # of split-up jobs, and watch the bottom line change. Gross, but I can't think of anything better.
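The same "tweak the machine count and watch the bottom line" idea can be done with a small simulation instead of a spreadsheet. The sketch below is one possible approach, not anything from the poster's setup: it assumes identical machines, known job durations, and a greedy rule that starts the longest ready job first. The job names and durations are invented for illustration.

```python
import heapq

def simulate_build(jobs, num_machines):
    """Estimate total build time with greedy list scheduling.

    jobs: {name: (duration, [names of jobs it depends on])}
    Returns the time at which the last job finishes.
    """
    if num_machines < 1:
        raise ValueError("need at least one machine")
    remaining = {name: set(deps) for name, (_, deps) in jobs.items()}
    dependents = {name: [] for name in jobs}
    for name, (_, deps) in jobs.items():
        for dep in deps:
            dependents[dep].append(name)

    # Ready jobs ordered longest first (negated duration in a min-heap).
    ready = [(-jobs[n][0], n) for n, deps in remaining.items() if not deps]
    heapq.heapify(ready)
    running = []  # min-heap of (finish_time, name)
    now, free, done = 0.0, num_machines, 0

    while ready or running:
        # Start as many ready jobs as there are free machines.
        while ready and free > 0:
            neg_duration, name = heapq.heappop(ready)
            heapq.heappush(running, (now - neg_duration, name))
            free -= 1
        # Advance the clock to the next job completion.
        now, name = heapq.heappop(running)
        free += 1
        done += 1
        for child in dependents[name]:
            remaining[child].discard(name)
            if not remaining[child]:
                heapq.heappush(ready, (-jobs[child][0], child))

    if done != len(jobs):
        raise ValueError("dependency cycle: some jobs can never start")
    return now

# Hypothetical build graph, durations in minutes.
example_jobs = {
    "fetch":   (10, []),
    "parse_a": (30, ["fetch"]),
    "parse_b": (20, ["fetch"]),
    "parse_c": (20, ["fetch"]),
    "index":   (15, ["parse_a", "parse_b", "parse_c"]),
}
for machines in range(1, 5):
    print(machines, "machines:", simulate_build(example_jobs, machines), "min")
```

Running the loop over a range of machine counts shows where extra hardware stops helping: once the build time equals the longest dependency chain, more machines buy nothing.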
Have you considered writing a script that tries to optimize this for you?
I've hacked together simulated annealing (SA) based optimizers in Perl on occasion for similar tasks.
Whether this is effective for you depends on how much time you'd spend scanning solutions by other methods (a script like this takes anywhere from a couple of hours to a couple of days to write, depending on complexity).
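For a sense of what such a script looks like, here is a minimal simulated annealing sketch, written in Python rather than Perl. It is a simplification and not the commenter's code: it ignores dependencies and only tries to balance independent jobs across identical machines so that the busiest machine finishes as early as possible. The function name, cooling schedule, and default parameters are all assumptions chosen for illustration.

```python
import math
import random

def anneal_assignment(durations, num_machines, steps=20000,
                      t_start=10.0, t_end=0.01, seed=0):
    """Assign jobs to machines, trying to minimize the largest machine load.

    durations: non-empty list of job run times.
    Returns (assignment, cost), where assignment[i] is the machine for job i.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    assign = [rng.randrange(num_machines) for _ in durations]
    loads = [0.0] * num_machines
    for job, machine in enumerate(assign):
        loads[machine] += durations[job]
    cost = max(loads)
    best_assign, best_cost = assign[:], cost

    for step in range(steps):
        # Geometric cooling from t_start down toward t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Candidate move: put one random job on a random other machine.
        job = rng.randrange(len(durations))
        old, new = assign[job], rng.randrange(num_machines)
        if new == old:
            continue
        loads[old] -= durations[job]
        loads[new] += durations[job]
        delta = max(loads) - cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            assign[job] = new
            cost += delta
            if cost < best_cost:
                best_assign, best_cost = assign[:], cost
        else:
            # Rejected: undo the move.
            loads[old] += durations[job]
            loads[new] -= durations[job]
    return best_assign, best_cost

# Illustrative durations in minutes, not real measurements.
assignment, makespan = anneal_assignment([30, 20, 20, 15, 10, 5], num_machines=2)
print(assignment, makespan)
```

To handle dependencies, the cost function could be swapped for a schedule simulation that respects them; the annealing loop itself would stay the same.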