Open Source Distributed Shell Tools?
ColonelForbin74 asks: "While some may assume that most larger server clusters run advanced / custom software (e.g. Beowulf, cfengine, OSCAR), many of those stuck in the not-research-this-site-runs-production world know this simply isn't the case. Many people like myself are working with medium-to-large scale clusters with little help other than shell for() loops and some SSH trusted keys. What application-level tools are out there that might help SysAdmin / AppSupport types like myself run commands across a given cluster, push files out, etc? In my desperation to have some sort of tool in my toolbox, I've actually created one. However, I have a hard time believing this is the best thing out there, and would appreciate all the ideas and links I can get!"
A lecture from the Haifa Linux Club about the subject.
Make even shorter URLs - 8LN.org
http://www.bitmover.com/bitcluster/
DSH? I used it awhile back and was pretty happy with it.
It was a bit unstable, but that was almost a year ago. Give it a try.
PDSH works pretty well in my experience. It's pretty good for running commands on the nodes, and pdcp can copy files out.
/joeyo
2^5
Well on FreeBSD I installed the net/clusterit port, it's really wimpy (just dsh and multi-copy type stuff) and buggy, but I really haven't found anything substantially better.
cfengine is closest to what I'd like, but there's something *weird* about it, and I just can't get into it.
But what I'd like is to be able to divide my servers into multiple overlapping classes (like, ones with mail servers, ones with DNS; ones with Red Hat, ones with FreeBSD4; ones in rack 1, ones in rack 2; etc...) and configure certain tasks (DNS server on Red Hat: do this... etc). Tasks could get output or edit files or do whatever I'd like.
Then I can call a task from cron or from the command line on the master machine and do my work.
any tips are appreciated...
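The overlapping-classes idea above can be sketched in a few lines of Python. This is only an illustration: the host names, class names, and the `TASKS` table are all made up, and the sketch only plans which command goes to which host rather than executing anything.

```python
# Hypothetical inventory: every host and class name below is invented
# for illustration. A host may appear in any number of classes.
HOST_CLASSES = {
    "mail": {"alpha", "bravo"},
    "dns": {"bravo", "charlie"},
    "redhat": {"alpha", "bravo"},
    "freebsd4": {"charlie"},
    "rack1": {"alpha", "charlie"},
    "rack2": {"bravo"},
}

# Hypothetical task table, keyed by the combination of classes a task
# applies to. The command string is a placeholder.
TASKS = {
    ("dns", "redhat"): "echo 'reload the name server here'",
}


def select_hosts(*classes):
    """Return the sorted hosts that belong to every named class."""
    if not classes:
        return []
    members = [HOST_CLASSES[name] for name in classes]
    return sorted(set.intersection(*members))


def plan(task_key):
    """Pair each matching host with the command to run on it."""
    command = TASKS[task_key]
    return [(host, command) for host in select_hosts(*task_key)]
```

Calling `plan(("dns", "redhat"))` from cron or the command line on the master machine would give the list of (host, command) pairs to hand to whatever remote-execution tool is in use.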
We have many extra Windows XP machines around here, which idle around most of the time.
We needed some machines for running stress tests against our network servers, but we didn't have enough horsepower to run a pure Linux-based clustering/distributed stress client.
I looked around a bit, like you, and found there wasn't much.
Because of this I have written some hackish Python code that basically creates a cross-platform, distributed, self-updating cluster.
We use it to run our cross-platform stress test application across many machines, without forcing these machines to be reformatted to Linux, etc.
I plan on releasing these scripts as open source sometime soon.
Look for them on Freshmeat and http://open.cyanworlds.com
-chip
We find it to be an improvement over simple shell tools for typical cluster administration tasks.
There are no karma whores, only moderation johns
During the Munich IETF 1997(?) I used rdist (part of Irix) to copy files from one machine to 40 others, as someone thought NFS was not an option.
When I had a set of (permanently running...) Unix workstations last, I used sh for-loops and ssh to run commands.
During another cluster project I was happy to use NFS to share files, and chose rsh over ssh as it was way faster.
Oh, and if you ever need to render mpegs from jpegs, check out the UCB's excellent "mpeg_encode", which does all the load balancing on a set of machines all by itself. Yumm!
- Hubert
I like the Rocks Cluster Distribution. It is above all simple to use, well documented, and stable.
There's really not all that much to it... bundling some scripting in the language of your choice around parallel ssh session is a pretty decent solution that most people seem to arrive at.
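For what it's worth, the "scripting around parallel ssh sessions" approach can look roughly like this in Python. It is a minimal sketch, not any particular tool from this thread: the function names and the `runner` parameter are my own, and it assumes key-based login already works (`BatchMode=yes` is the standard OpenSSH option that makes ssh fail instead of prompting).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Default remote runner; BatchMode=yes stops ssh from prompting for input.
DEFAULT_RUNNER = ("ssh", "-o", "BatchMode=yes")


def run_on_host(host, command, runner=DEFAULT_RUNNER):
    """Run one command on one host; return (host, exit status, stdout)."""
    proc = subprocess.run(
        [*runner, host, command],
        capture_output=True,
        text=True,
    )
    return host, proc.returncode, proc.stdout


def run_everywhere(hosts, command, workers=16, runner=DEFAULT_RUNNER):
    """Run the command on all hosts, at most `workers` at a time.

    Results come back in the same order as `hosts`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda h: run_on_host(h, command, runner), hosts))
```

Something like `run_everywhere(["web1", "web2"], "uptime")` (host names made up) returns one tuple per host, so failures can be spotted by checking the exit status.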
11*43+456^2
If you got a few million to spare, Tivoli does anything!
You might try radmind. It's used pretty popularly in the Mac OS X world, but was originally written for Solaris, Linux, and *BSD. There's a reasonably sized community using it, and a supportive mailing list.
:w
Sun Grid Engine (http://wwws.sun.com/software/gridware/sge_get.html) is free for both Solaris (SPARC & x86) and Linux.
Grid engines tend to be more useful as they can better balance the load across non-dedicated hosts. Just my view, but it saves building a dedicated cluster when all these 2GHz Pentiums are sitting on desktops (assuming you have Linux on the desktop, of course).
--
CFengine rocks. It isn't a distributed shell, but for configuration management and remote automated changes, you can't beat it.
I forget what 8 was for.
Where I work (a LARGE networking company that makes all kinds of networking hardware) a co-worker and I created multiple parallel SSH tools which enable you to run hundreds to thousands of concurrent outgoing sessions, depending on hardware. We have not yet had the cycles to look into open sourcing it, but hope to. :-)
I can share the basics of it here though, which should enable somebody else to easily build their own. On a day to day basis we needed to be able to run commands on 10,000+ Solaris and Linux boxes, and wanted to use SSH key authentication, but not keys with a null passphrase (as if the private key was stolen, major security implications present themselves :-) ). The only way to do this (other than having some type of expect type program typing in the passphrase for you) is to use the ssh-agent. The problem with the ssh-agent is that it simply does not have the ability to authenticate more than say 20+ ssh sessions at once (depending on machine load, etc). What happens when too many ssh sessions attempt to authenticate against the ssh-agent is that you get many authentication failures due to timeouts. There are some hacks you can do to the ssh source code that will increase the number of times ssh will attempt to contact the agent, as well as the delay between attempts. We've done these hacks, but they still were nowhere near enough.
The solution instead is to use MULTIPLE ssh-agents, and load balance between them. We wrote a tool that will prompt for our key passphrase and then load say 100 ssh-agents with that key loaded. When it starts the agents it records the variables SSH_AUTH_SOCK and SSH_AGENT_PID for each agent in a single file. We then have shell scripts wrapped around ssh commands that just randomly pick an agent to connect to, effectively load balancing.
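A rough sketch of the "randomly pick an agent" wrapper, in Python rather than shell. The file format is an assumption (one agent per line, socket path then PID, separated by whitespace), and `load_agents` and `env_for_random_agent` are hypothetical names, not the tool described above.

```python
import os
import random


def load_agents(path):
    """Read the agent file.

    Assumed format: one agent per line, '<SSH_AUTH_SOCK> <SSH_AGENT_PID>'.
    Lines that do not have exactly two fields are skipped.
    """
    agents = []
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) == 2:
                agents.append(
                    {"SSH_AUTH_SOCK": parts[0], "SSH_AGENT_PID": parts[1]}
                )
    return agents


def env_for_random_agent(agents, rng=random):
    """Return a copy of the current environment that points ssh at one
    randomly chosen agent, which spreads the authentication load."""
    env = dict(os.environ)
    env.update(rng.choice(agents))
    return env
```

Each outgoing session would then be started with something like `subprocess.run(["ssh", host, command], env=env_for_random_agent(agents))`, so no single agent has to answer every authentication request.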
We run this whole thing on an OpenMosix cluster, which allows the ssh-agents and ssh processes to migrate across the machines once they start to use too much CPU time on their current node. We've found that Linux boxes seem to be much faster for SSH operations than Solaris (sparc) boxes, BTW.
We have also written a parallel ssh tool that works similarly to others discussed here (and others NOT discussed here, like Ed Hill's clsh which in a previous life I used extensively), except our tool has a couple of other major features which (IMHO) are required in an enterprise environment. The biggest thing that we've found is that when working on boxes in the far reaches of the world, we cannot assume that any common group of NFS mounts will exist, or work properly when we need them to. If you cannot be sure what remote mounts are available, how can you run scripts on the remote box? This prompted us to make our program have the ability to both run Perl code directly fed to it, as well as (basically) remotely deliver scripts for running and delete them afterwards. So if we've written an administrative script called foo.sh, our tool will basically pipe the script across an SSH session to the remote end and run it, usually never having to touch the remote disk at all. This is VERY useful because when talking about 10k+ boxes, many of which are desktops, you can never be sure which partitions will be full.
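The "pipe the script across the session" trick can be sketched like this. It is an illustration only, not the poster's tool: the function name and the `runner` parameter are invented, and it relies on the standard behaviour that `sh -s` reads its script from standard input, so nothing is written to the remote disk.

```python
import subprocess


def run_script_remotely(host, script_text, remote_shell="sh -s", runner=("ssh",)):
    """Feed a local script to a shell on the remote host over stdin.

    `sh -s` makes the remote shell read commands from standard input.
    Returns (exit status, captured stdout).
    """
    proc = subprocess.run(
        [*runner, host, remote_shell],
        input=script_text,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout
```

With a local foo.sh, this would be called as `run_script_remotely("somehost", open("foo.sh").read())`, where "somehost" is a placeholder.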
Using our parallel ssh tool, along with the ssh-agent load balancing and a 3 node OpenMosix cluster we've been able to run 1000 outgoing ssh sessions without issues. This means if you want to change root passwords on 10k boxes it only takes slightly longer than changing passwords on 10 boxes. A real time saver, to say the least
Comments anyone?
BTW, is anybody using any hacks of OpenSSH to work similarly to sudo for giving out root access?
-- I speak only for myself.
That load-balanced multiple ssh-agent idea is neat.
'ghosts' is a command which has been included with perl in the 'eg' directory since at least 4.036. It does this effectively, allowing you to do
gsh somemachines somecommand
or
gcp somefile somemachines:/etc/newfile
Worked great, last time I had to admin a large network (about 5 years ago ;). *EXTREMELY* simple, too.
http://outflux.net/unix/software/gsh/ seems to be an updating of this tool.
I'd rather eat my own vomit.
- pconsole: http://www.heiho.net/pconsole/
- DSH: http://dsh.sourceforge.net/
DSH is nice for relatively small things that need to get run everywhere, and has an interactive mode that works fairly well. It works from any command line.
Pconsole, on the other hand, requires X, and creates a separate terminal window for every host you are connecting to. There is a small 'command' window that echoes everything you type to all of the other terminals under its control. If you have ever seen Sun's "Cluster Management Tool" from a few years ago, this is very similar. It can also attach to existing sessions.
Would this work? http://sourceforge.net/projects/queue/