Open Source Experiment Management Software?
Alea asks: "I do a lot of empirical computer science, running new algorithms on hundreds of datasets, trying many combinations of parameters, and with several versions of many pieces of software. Keeping track of these experiments is turning into a nightmare and I spend an unreasonable amount of time writing code to smooth the way. Rather than investing this effort over and over again, I have been toying with writing a framework to manage everything, but don't want to reinvent the wheel. I can find commercial solutions (often specific to a particular domain) but does anyone know of an open source effort? Failing that, does anyone have any thoughts on such a beast?"
"The features I would want would be:
- management of all details of an experiment, including parameter sets, datasets, and the resulting data
- ability to "execute" experiments and report their status
- an API for obtaining parameter values and writing out results (available to multiple languages)
- additionally (alternately?) a standard format for transferring data (XDF might be good)
- ability to extract selected results from experimental data
- ability to add notes
- ability to differentiate versions of software
- automatically run experiments over several parameter values
- distribute jobs and data over a cluster
- output to various formats (spreadsheets, Matlab, LaTeX tables, etc.)
- provide a fancy front-end (that can be done separately - I'm thinking mainly in terms of libraries)
- visualize data
- statistical analysis (although some basic stats would be handy)
MAUS roxors.
In soviet russia MtnDew Buys Gabe.
I also did lots of empirical comp sci experiments. My experience is that the tools used for the experiments themselves are very ad hoc and not easily scriptable. Most of the time we have to tend hour-long experiments, watch the output, and decide what to do next. And the decision is often not clear-cut; some sort of heuristic is needed. Not to mention the frustration when errors occur (especially when the tool is buggy, which is very often the case in research settings). Considering this, what I would do is construct a script and run the experiments in phases: start it and look at the results several days later.
I also noticed that one experiment is sometimes so radically different from the next that I doubt the whole thing is easily manageable.
--
Error 500: Internal sig error
Take a look at the object modeling system. It is currently being developed by Agricultural Research Service but many other agencies are cooperating.
http://oms.ars.usda.gov/
You could look into providing some kind of web-services feel to the computation, and then use an open-source provenance server.
A provenance server might handle the recording of queries, results etc. Not sure how many good open source ones there are.
1. You cannot (?) afford commercial software.
2. It is impractical for you to continue writing your own software.
3. You cannot find open source software.
-------
Conclusion: Steal commercial software! -)
Sounds like you need to use Perl.
I find it to be an excellent language for maintaining data.
Ideally, I'd type make paper and it would start from the beginning stages of the experiment and go all the way through creating the paper. Moreover, if anything died along the way, I could fix the problem, type make again, and it would more or less pick up where it left off, not re-running things it had already done (unless they were affected by my fix).
But after playing with this for a few days, I became convinced that make wasn't up to snuff for what I wanted. I have these sort of 'attribute-value' dependency constraints. From one raw initial dataset, I create several cross-validation folds, each of which contains one training set and a couple of varieties of test set. The filenames might look like
Now suppose that the way I actually run an experiment involves passing a test set and the corresponding training set to the model I'm testing, a command like:
Since, however, I have to run this over several folds (and other variations that I'm glossing over), I'd like to write an 'implicit rule' in the Makefile. This involves pattern-matching the filenames. But it's a very simple pattern-matching: you get to insert one
You might be thinking, you could do
but then I have to copy this rule several times for each sort of test set, even if the command they run is the same.
The underlying problem, I think, is that the pattern-matching in make's implicit rules is too simple. What I would rather have is some kind of attribute-value thing, so I could say something like
where fileid corresponds to 'base.fold0' and whatever other file identifying information is needed.
This notation is sort of based on a natural language attribute-value grammar.
Anyway, if anyone has any suggestions as to this aspect of the problem, I would be grateful
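For what it's worth, the attribute-value idea above can be pictured with a small Python sketch. This is not a make replacement and not the poster's notation: the filename templates, the attribute names (`fileid`, `kind`) and the `run_model` command are all hypothetical, since the original filenames did not survive. It expands one rule over every combination of attribute values and, like make, only plans commands for targets that are missing or older than their sources.

```python
import itertools
import os


def expand(template, **attrs):
    """Yield (binding, path) for every combination of attribute values."""
    names = sorted(attrs)
    for values in itertools.product(*(attrs[n] for n in names)):
        binding = dict(zip(names, values))
        yield binding, template.format(**binding)


def is_stale(target, sources):
    """True if target is missing or older than any existing source."""
    if not os.path.exists(target):
        return True
    t = os.path.getmtime(target)
    return any(os.path.exists(s) and os.path.getmtime(s) > t for s in sources)


def plan(rule, **attrs):
    """Return the commands needed to bring every target of the rule up to date."""
    commands = []
    for binding, target in expand(rule["target"], **attrs):
        sources = [s.format(**binding) for s in rule["sources"]]
        if is_stale(target, sources):
            commands.append(rule["command"].format(target=target, **binding))
    return commands


# Hypothetical rule: one command shared by every fold and every kind of test set.
RULE = {
    "target": "{fileid}.{kind}.result",
    "sources": ["{fileid}.train", "{fileid}.{kind}.test"],
    "command": "run_model {fileid}.train {fileid}.{kind}.test > {target}",
}

if __name__ == "__main__":
    for cmd in plan(RULE, fileid=["base.fold0", "base.fold1"], kind=["seen", "unseen"]):
        print(cmd)
```

The point of the design is that the command is written once; adding another kind of test set means adding a value to `kind`, not copying the rule.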
Did you consider R, a Splus clone? For Scientific Statistics a very flexible solution. http://www.r-project.org
What I would end up doing is setup an Ant build file for each experiment, under each algorithm.
And then you can update property files, using a quick shell script, or something along those lines at the end of the data set, as well as having build/run times that Ant can retrieve for you. Good solution, and you aren't reinventing the wheel.
Requires Java, which depending upon your ideology is either a good thing or a curse.
Dacels Jewelers can't be trusted.
Good Luck
Something like the AppLeS Parameter Sweep Template software might suit your needs. I've never used it myself, but it looks like it might be close to what you're looking for.
See here for other projects from the GRAIL lab at SDSC and UCSD.
I don't mean to sound cynical, but this seems to come across to me as a very nicely written:
Ne3D H3lp WIt M4H H4x0RiN!!!!!
I mean, let's face it, much of what modern hacking closed-sourced software consists of is throwing a variety of shit against a variety of programs in a variety of configurations and seeing what breaks and then following up to make an exploit out of it.
While this probably isn't the case here, it's very hard to read that note and not snicker just a tiny, tiny bit . . .
But what you are looking for, sir, is the cheap labor commonly known as a Graduate Student
- Many of these "grads" [as they are commonly known] have INDEED been able to " 'execute' experiments and report their status", as well as "writing out results (available in multiple languages)".
- The Graduate Student is often known for their abilities to create and distribute notes in lieu of bringing that onerous burden upon more high-ranking academic officials
- ...you don't even have to dream about doing "clustered work" or "outputting results to spreadsheets, Matlab, LaTeX tables, etc....". These fancy machines can definitely do that...
- Of course, there are several "graduate students" that provide a fancy front end (and rear end, for that matter). I think that I would agree with your assessment that they do not need to have that feature, although it might make your days a bit more... ermm... *pleasant*
:-)
- As well, most graduate students have the capability of performing "basic stats", although most don't have an extensive faculty for performing such calculations...
- And don't you even worry about the price -- you'll see that they're quite affordable.
To conclude, you say that "There's no reason such software should be limited to computer science (nothing I'm contemplating is very domain specific). I can imagine many disciplines that would benefit". I would wholeheartedly have to agree with you: just about every discipline can do more and see farther by standing on the backs of their graduate students. In fact, I'm afraid to report that you are a bit behind the times in this department as these "Graduate Student" devices are quite common at universities and research labs.
Use Excel, sorry Im a jerk....
YOU FAIL IT
The rumor is, it's something called **work**.
From the OP: I have been toying with writing a framework to manage everything, but don't want to reinvent the wheel.
Seems to me that the OP is more than capable of doing the work, but he is smart for trying to find an existing solution. The rumor is, it's something called **working smarter**, not **working harder**. :-)
Karma: -2147483648 (Mostly affected by integer overflow)
We experimental high-energy physics folk have been using it (and PAW) for some time. It offers scripting and histogramming and analysis and a bunch of other features. And it's open source. Check it out.
I've been very happy using jdb (see below) to handle individual experiments, and directories and shell scripts to handle sets of experiments.
JDB is a package of commands for manipulating flat-ASCII databases from shell scripts. JDB is useful to process medium amounts of data (with very little data you'd do it by hand, with megabytes you might want a real database). JDB is very good at doing things like:
For more details, see http://www.isi.edu/~johnh/SOFTWARE/JDB/.
http://sourceforge.net/projects/pythonlabtools/
Perl is great for making seemingly complex busy-work out of trivial problems, and constructing arcane rube goldberg devices that nobody can understand, so they can't fire you without throwing away the software.
Perl is the ultimate programming language for corporate leeches!
But if you actually want to solve problems and get work done, use another language than Perl.
With the slight re-grouping of the title phrases as above, I think we can all agree the answer is:
FBI's Carnivore.
(Well, that's the way the headline parsed out for me the first time I glanced at it...)
http://catalog.com/hopkins/text/head.html
Take a look and feel free: http://www.PieMenu.com
You seem to suggest that the specifics of the software used in the experiments themselves is too varied and engineered to respond to object management within the native environment.
Ok, you take the management piece into a meta-environment like web e-commerce. Each iteration produces a transaction, essentially a line in a table containing the common meta-elements and then you perform your management via linked queries on this data set ala Napster.
If all of your data engines are connected (Intranet), the only thing that needs to be centralized is the knowledge of what is where.
So, you build on the code from one of the open-source e-commerce engines and combine that with the code elements from one of the open peer-to-peer management Napster clones.
Since the code is OPEN, you can do this.
The best way to do is to be.
fuck, you're gross!
Score 1 informative?!?! This is a goatse link guys and grrls.. don't click it...
"Consider how lucky you are that life has been good to you so far. Alternatively, if life hasn't been good to you so far
It might not satisfy all your requirements out of the box, but could you put something together with tcltest?
How did a link go Goats.cx get modded informative? Should I infer that moderators don't follow links and just read buzzwords? A very successful troll, but gross, gross, disgusting. Don't click the link.
Everything I've ever learned the hard way was based on a statistically invalid sample.
It might take some work, but Eclipse from IBM has improved a great deal towards becoming a good environment for project management. It's geared towards projects written in Java, but there is a C/C++ Perspective plugin if you prefer...
It's a good platform for managing a collection of custom Ant build scripts if you decide to go that direction (assuming you're in Java of course...)
If you'd prefer something more specialized, the plugin architecture isn't bad and could save some time with interface work. Especially since any windows from other perspectives that you like can be dropped directly into your custom-built perspective.
Food for thought...
www.eclipse.org
The number you have dialed is imaginary, please rotate your phone 90 degrees and try again.
I was able to do almost everything on my thesis using open source tools, LaTeX on Linux, except when it came to data reduction - I was forced to use the crippled student version of SPSS. I would love to see a GNU clone of this functionality the way they have cloned Matlab.
My rights don't need management.
...it sounds like you don't know what the hell you are doing. Now would be a good time to start investigating a new career.
NEW ORLEANS (AP) Earl King, the prolific songwriter and guitarist responsible for some of the most enduring and idiosyncratic compositions in the history of R&B, died Thursday from diabetes-related complications. He was 69.
Over his 50-year career, King wrote and recorded hundreds of songs.
His best-known compositions include the Mardi Gras standards ''Big Chief'' and ''Street Parade''; the rollicking ''Come On (Let the Good Times Roll),'' which both Jimi Hendrix and Stevie Ray Vaughan recorded; and ''Trick Bag,'' the quintessential New Orleans R&B story-song.
'''Come On (Let the Good Times Roll)' might be the one that people know, but I wish the world would hear more of his songs,'' said Mac ''Dr. John'' Rebennack, a longtime friend, fan and collaborator of King. ''He approached songs from different angles, from different places in life.''
In his prime, he was an explosive performer, tearing sinewy solos from his Stratocaster guitar and wearing his hair in an elaborate, upraised coif.
King's songwriting was informed by syncopated New Orleans beats and his interest in a broad range of subjects, from medieval history to the vagaries of the human heart and his own so-called ''love syndromes.''
''Most people say, 'Well, Earl, you sing the blues,' or however they want to categorize it,'' King said in a 1993 interview. ''I just sing songs. I'm a writer, so whatever gymnastics jump through my head, I write about it.''
Born Earl Silas Johnson IV, King described himself as a ''nervous energy person'' who constantly needed to be engaged in some creative pursuit.
He cut his first singles in the early 1950s, taking on the stage name ''Earl King'' at the suggestion of a record promoter.
Scenes and acquaintances from his life often found their way into his lyrics with little editing. A story King's grandmother told about his father, a blues pianist who died when King was a boy, inspired ''Trick Bag.''
In the song, the protagonist sings to his wayward significant other, ''I saw you kissing Willie across the fence, I heard you telling Willie I don't have no sense/The way you been actin' is such a drag, you done put me in a trick bag.''
Funeral arrangements had not been finalized late Friday evening.
What you describe does indeed sound like High Energy Physics.
And the "middleware" you need are the GNU tools gluing together the specialized programs that do the specific things you want.
We have been using unix for a long time, and many of us prefer the combination of small targeted tools philosophy rather than a single monolithic package.
I will repeat, and you can stop reading now if you want. The GNU tools, unix, and specialized scriptable programs are already the "middleware" you seek.
If you are just missing some of the tools in the middle, here are the ones used in HEP. You might find more appropriate ones closer to whatever discipline you work in.
All the basic unix text processing tools and shells.
bash. csh. Perl. grep. sed. and so on.
Filename schemes ranging from appropriate to clever to bizarre.
(See other posts here)
Make it so that all the inputs you want to change can be done on the command line or with an input steering text file.
Same tools combined with some simple c-code to produce formats for spreadsheets or PAW or ROOT or whatever visualization or post-processing thing you need done. Has ntuple and histogram support automatically, which might be all you need.
Almost always I choose space delimited text for simple output to push into PAW, ROOT, or spreadsheets. I keep a directory of templates to help me out here.
Some people use full blown databases to manage output. For a long time there have been databases specific to the HEP needs. I recently have started using XML-style data formats to encapsulate such things in text files if the resulting output is more complicated than a single line. You mention XDF, sure, that sounds like the same idea.
CONDOR (U Wisconsin) has worked nicely for me for clustering and batch job submission when I need to tool through 100 data files or 100 different parameter lists on tens of computers. The standard unix "at" is good enough in a pinch if you play on only 5 computers or so.
HEP folks use things like PAW and ROOT (find them at CERN) which contain many statistical analysis things and monstrous computation algorithms. Or at least ntuples, histograms, averages, and standard deviations. You could go commercial or the gsl here if you prefer such things.
CVS or similar to take care of code versions.
Don't forget to comment your code.
We write our own code and compile from fortran or c or c++ for most everything else.
Output all plots to postscript or eps.
LaTeX is scriptable.
And use shells, grep, perl to glue it all together. Did I mention those already?
I get a good night's sleep more often than not.
And decide what to do next after coffee the following morning.
This is where you put your brain, and if you have done the above well enough, this is where you spend most of your time.
The answer I get each morning (as another post suggests) is always so surprising that I need to start from scratch anyway.
I bet that is what you are doing already. Probably no monolithic software will be as efficient as that in a dynamic research environment.
What did I miss from your question?
Oh, yes. Get a ten-pack of computation notebook with 11 3/4 x 9 1/4 inch pages (if you print things with standard US letter paper). And lots of pens. And scotch tape to tape plots into that notebook. Laser printer and photocopier. Post-it notes to remind yourself what you wanted to do next (or e-mail memos to yourself). Maybe I should have listed this first.
Good luck.
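The advice above about making every input changeable from the command line or an input steering text file, and about writing space-delimited output, can be sketched roughly as follows. The steering-file format ("name value" per line, `#` for comments), the `--steering` and `--set` flags, and the function names are assumptions made for illustration; they are not taken from any HEP tool.

```python
import argparse


def read_steering(path):
    """Parse 'name value' lines from a steering file; '#' starts a comment."""
    params = {}
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            name, value = line.split(None, 1)  # a line with no value raises ValueError
            params[name] = value
    return params


def load_params(argv=None):
    """Read the steering file first, then let command-line settings override it."""
    parser = argparse.ArgumentParser(description="hypothetical experiment driver")
    parser.add_argument("--steering", help="text file of 'name value' lines")
    parser.add_argument("--set", action="append", default=[], metavar="NAME=VALUE",
                        help="override one parameter; may be repeated")
    args = parser.parse_args(argv)
    params = read_steering(args.steering) if args.steering else {}
    for item in args.set:
        name, _, value = item.partition("=")
        params[name] = value
    return params


def result_line(params, *measurements):
    """Space-delimited line: parameter values in name order, then the measurements."""
    fields = [params[k] for k in sorted(params)] + [str(m) for m in measurements]
    return " ".join(fields)


if __name__ == "__main__":
    p = load_params()
    print(result_line(p))
```

Keeping the parameter values on the same line as the results means each output line is self-describing enough to push straight into a plotting tool or a spreadsheet.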
So, where is the smart in shopping for a solution, while the work piles up? What proof is there someone hasn't already tried that route? I don't see any evidence he's head-down, butt-up in the meantime. It looks to me like he's just whining for help.
easier to flag as a troll than it is to respond, right? Must be lazy sunday. I must have had a valid point after all.
You want the computer to run the experiment, catalog all the results and present them in a nice format. Maybe when it's done it can put your name on the results and publish it for you too.
;o)
Just Kidding
But if you're determined to let the computer do the work, perhaps some form of genetic algorithm could be applied here. If you can break your domain down well enough and define testable selection criteria, there are lots of tools and research available. If you have an API to work with like you said, it shouldn't be too hard.
Of course converting it to a GA may take longer than your original experiments to implement.
Draft relational schema:

Table: experiments
----
exprmntID
exprmntWhen     // date-time stamp
exprmntDescr    // description
outcome

Table: params
----
paramID         // auto-num
exprmntRef      // foreign key to experiments table
paramName
paramValue

Table: dataSet
----
dataSetID       // auto-num
filePath
datasetDescr
isGenerated     // "True" if from experiment
CRC             // ASCII check-sum to make sure not changed

Table: dataSetUsed
----
exprmntRef      // foreign key to experiments table
dataSetRef      // foreign key to dataSet table

Table: softwareVersion
----
svID
softwareTitle
svVer

Table: softwareVersionUsed
----
svRef           // foreign key to softwareVersion
exprmntRef      // foreign key to experiments table
Just use something like MySQL or MS-Access, and perhaps some kind of CRUD[1] tool to create front ends. You can expand from there based on new needs you encounter.
[1] CRUD = typical Create, Read (list), Update, Delete screens.
(Note: slashdot's filter scrambles certain variable names.)
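To make the draft schema concrete, here is a minimal sketch that creates the same six tables. It swaps in SQLite (via Python's standard `sqlite3` module) for the MySQL or MS-Access suggested above, purely so it runs with no server. The column types and the two helper functions `record_experiment` and `experiments_with` are assumptions; the post only names the tables and columns.

```python
import sqlite3

# Table and column names follow the draft schema; the types are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS experiments (
    exprmntID    INTEGER PRIMARY KEY,
    exprmntWhen  TEXT,                 -- date-time stamp
    exprmntDescr TEXT,
    outcome      TEXT
);
CREATE TABLE IF NOT EXISTS params (
    paramID    INTEGER PRIMARY KEY,    -- auto-num
    exprmntRef INTEGER REFERENCES experiments(exprmntID),
    paramName  TEXT,
    paramValue TEXT
);
CREATE TABLE IF NOT EXISTS dataSet (
    dataSetID    INTEGER PRIMARY KEY,  -- auto-num
    filePath     TEXT,
    datasetDescr TEXT,
    isGenerated  INTEGER,              -- 1 if produced by an experiment
    CRC          TEXT                  -- check-sum to detect changes
);
CREATE TABLE IF NOT EXISTS dataSetUsed (
    exprmntRef INTEGER REFERENCES experiments(exprmntID),
    dataSetRef INTEGER REFERENCES dataSet(dataSetID)
);
CREATE TABLE IF NOT EXISTS softwareVersion (
    svID          INTEGER PRIMARY KEY,
    softwareTitle TEXT,
    svVer         TEXT
);
CREATE TABLE IF NOT EXISTS softwareVersionUsed (
    svRef      INTEGER REFERENCES softwareVersion(svID),
    exprmntRef INTEGER REFERENCES experiments(exprmntID)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure all tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_experiment(conn, descr, params, outcome=None):
    """Insert one experiment row plus one params row per parameter."""
    cur = conn.execute(
        "INSERT INTO experiments (exprmntWhen, exprmntDescr, outcome) "
        "VALUES (datetime('now'), ?, ?)",
        (descr, outcome),
    )
    exp_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO params (exprmntRef, paramName, paramValue) VALUES (?, ?, ?)",
        [(exp_id, name, str(value)) for name, value in params.items()],
    )
    conn.commit()
    return exp_id


def experiments_with(conn, name, value):
    """Return the IDs of experiments that used the given parameter value."""
    rows = conn.execute(
        "SELECT exprmntRef FROM params "
        "WHERE paramName = ? AND paramValue = ? ORDER BY exprmntRef",
        (name, str(value)),
    )
    return [row[0] for row in rows]
```

Storing parameters as name/value rows rather than as columns matches the draft: new parameters can appear in later experiments without altering any table.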
Table-ized A.I.
It looks like you need - da da da da! - [b]EXTREME[/b] PROGRAMMING!
The features I would want would be:
management of all details of an experiment, including parameter sets, datasets, and the resulting data
This can be handled by an ad-hoc database, a flat file in most cases. If you were a Windows power user, you'd spend an hour or two putting together something in Access for it.
ability to "execute" experiments and report their status
make with a little scripting, or whatever you use as a build system.
an API for obtaining parameter values and writing out results (available to multiple languages)
additionally (alternately?) a standard format for transferring data (XDF might be good)
ability to extract selected results from experimental data
ability to add notes
Again, an ad-hoc database would be your friend.
ability to differentiate versions of software
This is conventionally handled with a configuration management system like CVS, Sourcesafe, or Clearcase.
I hate reinventing the wheel, too, and I'd love to see a good book on using standard free Unix tools like make, CVS, Postgres, perl or some other common scripting language, TeX, etc for cleanly and efficiently automating complex computing processes and producing nice reports from them.
PAW and ROOT look interesting though they look like overkill for many apps.
Also, get a copy of Writing The Laboratory Notebook, some hardbound buffered laboratory notebooks, and Sakura 05 Pigma Micron archival pigment pens to keep your paper records. You'll thank me.
Working smarter only works if you're actually smarter. While it might be good for a laugh, asking ./ hardly qualifies as smart. So, if smarter isn't an option for you, you only have hard work to fall back on.
agree with you. prefer to do everything myself. am entering this using switches with led grid screen on homebrew z80 with tcpip stack i finished yesterday. expect to finish entire system in 8 more years. will be well worth it, sure am glad didn't waste time buying off shelf system!
You know what they say, if you want a job done fast, give it to a lazy man.
Again, if such a solution existed, it would already be in place. This whiner complained about the amount of work, that clearly comes with the job. He just wants to go home earlier...don't we all. Nothing smart in that.
Distribution of jobs, running things with multiple parameter values, etc., all can be handled smoothly from the shell. This is really the sort of thing that UNIX was designed for, and the entire UNIX environment is your "experiment management software".
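As a rough illustration of running one program over multiple parameter values, here is a sketch in Python rather than shell, so it is a stand-in for the approach described above and not the poster's own method. It only runs jobs in parallel on a single machine; distributing them across hosts is left out. The `--name=value` argument convention and the function names `grid`, `run_one` and `sweep` are assumptions.

```python
import itertools
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def grid(**axes):
    """Every combination of the given parameter values, as a list of dicts."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in itertools.product(*axes.values())]


def run_one(command, params):
    """Run the command once with the parameters appended as --name=value."""
    argv = list(command) + [f"--{k}={v}" for k, v in params.items()]
    done = subprocess.run(argv, capture_output=True, text=True)
    return params, done.returncode, done.stdout.strip()


def sweep(command, points, workers=4):
    """Run all points, a few at a time; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: run_one(command, p), points))


if __name__ == "__main__":
    # Stand-in for a real experiment binary: a one-liner that echoes its arguments.
    demo = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
    for params, code, out in sweep(demo, grid(alpha=[0.1, 0.5], seed=[1, 2])):
        print(code, out)
```

Threads are enough here because each worker just waits on a child process; the real work happens in the program being run.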
I've been working on this, though it's not yet released as open source. If you'd like to try my system out it might speed up the release date. It's a web application written in python with a postgresql backend. Give me a ring at jmichelz at mail dot com for more details.
John
It's not strange, for example, for me to use python to generate the actual program runs, the shell to actually manage the run and move the input/output data files, then any of several graphics programs to handle the output (and often output graphs are done automatically as the programs run).
This gives me a pile of flexibility which is often useful. For instance, when doing stuff that might run for a while, I'll often sample things at widely spaced data points, then fill in the gaps. For things that are less long running, I'll just chomp through them all in order. It also allows me to rename input/output data files as things work, compress them when needed and so on.
It's also nice sometimes to be able to set things up to recompile the source with different numbers for efficiency instead of feeding the numbers to the executable on the command line or on stdin.
XML and XSLT are also becoming increasingly useful in describing input data, recording results and keeping track of things done.
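The trick mentioned above of sampling widely spaced data points first and then filling in the gaps can be expressed as a reordering of the parameter list. This is one possible way to do it (repeated bisection), sketched under the assumption that the points are already sorted; the function name `coarse_to_fine` is made up for the example.

```python
def coarse_to_fine(points):
    """Reorder points so the endpoints come first and later points fill the gaps."""
    points = list(points)
    if not points:
        return []
    order = [0]
    if len(points) > 1:
        order.append(len(points) - 1)
    spans = [(0, len(points) - 1)]
    while spans:
        next_spans = []
        for lo, hi in spans:
            if hi - lo < 2:
                continue  # no unvisited point left between lo and hi
            mid = (lo + hi) // 2
            order.append(mid)
            next_spans += [(lo, mid), (mid, hi)]
        spans = next_spans
    return [points[i] for i in order]


if __name__ == "__main__":
    print(coarse_to_fine(range(9)))  # [0, 8, 4, 2, 6, 1, 3, 5, 7]
```

If a long sweep is interrupted partway through, the runs that did finish are still spread over the whole range rather than bunched at one end.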
Regarding the more general request for software that manages data, beats me. I do computer science research and I have asked myself many times if such software exists.
What suitable proprietary solutions did you find? I could not find any software (open or closed) that would properly manage bulk data.
I agree that defining the schema is a good place to start, and that a db backend is the "right thing" to do. For this app though, Postgresql offers some attractive features such as inheritance, stored procedures, and an eclectic set of datatypes.
Have you considered using Python? Researchers at DIKU have created a benchmark tool for . It seems useful. Python is also excellent for gluing modules together. :)
Generally I would recommend against exporting directly to spreadsheet formats - I tend to export to flat files (with field separators). Postgres' COPY command is extremely handy from scripts. I also find gnuplot handy. Remember .. these tools might seem unintuitive at first .. but once you have used them for earlier experiments you can re-use code between experiments :)
I hope this helps
A well-thought-out example of how to set up your code for experimental work is the Lemur toolkit from CMU. This toolkit has a concept of "parameter" files that is very handy.
1. ... ... ...
2.
3.
conclusion: share your software, start a new project , see if other people are willing to help out.
Washington bullets will simply be known as the "Bulle
I'm one of the principal designers of a system called SMIRP.
It started out as a very simple system that didn't act as much more than a set of tables with some simple linking structures. On top of that is an alerting system (so you can track new experiments being done), a full text index, bots for automating certain procedures, and a system for transferring data to Excel.
What's surprising is that for the most part, the underlying structure stayed exactly the same even though we've been running all the operations in an inorganic chemistry lab on it for, oh, four years now. I've been chewing over ways of rewriting it because, honestly, it's still the same prototype. I'd love to go with an all Perl solution... but the damned thing just works and I have other stuff to do.
Some lessons I've learned, problems I've run into:
A general interface. You really need a flexible structure because scientists never know what parameters they're going to use until they do the experiment. Our big success has been such a simple structure that people can throw a SMIRPSpace together in minutes.
Browser based interface. It's great because it's ubiquitous, but it's painful because of the inflexibility of forms. One big win with it is that you can get a horde of workstudies to form a pipeline. For example, a grad student might put a request in the system for an article, a workstudy receives a notification of the change and hits the web to fill in details, another then gets notified and sends a request to the library, another gets notified and scans the result and finally the grad student sees a scanned copy of the article.
Excel based interface. It's great because people can play with data, but it's Excel...
XML is garbage. There's nothing you can do in XML that you can't do better with a flat file + regexes, or a SQL DBMS. XML is utterly, completely worthless.
Proprietary products. This won't be a huge surprise to /.'ers, but we got seriously screwed when the prototype we did in Cold Fusion became production code and we realised that Allaire (and later Macromedia) would not consider redistribution for less than 10,000 units. I could try to get it running on another CF implementation (I think there's some Blue Dragon or something) but honestly, I'd rather rewrite the whole thing.
Reporting. This is *hard* to do. We still don't have any serious system for handling reports beyond "import the data to Excel and do it manually."
Funding agencies in the USA (NSF, NIH) and Europe have recently decided to target the construction of such software, and many competing projects have been given grants, most of which involve the production of open source software.
Relevant keywords are "eScience", "Experimental Data Management", "Experimental Metadata", and to some extent "Grid Computing".
Here is a paper which lays out the program of research.
I work for one such NSF & NIH funded project at Dartmouth College. We're developing such a tool : Java-based, completely open, available at sourceforge, currently in alpha, to be released for fMRI use in July, but designed from the start to be generalizable for all of experimental science. This is built on top of a pre-existing framework for semantic data management and modeling from Stanford.
I'll try to list some of the features relevant to your needs:
Finally, I would like to stress that our project is one of many, and that if it doesn't meet your needs, within a year there will be many competing "eScience" toolkits.
You may contact me for more information by reversing the following string: "ude.htuomtrad@exj".
The Computer Aided Engineering (CAE) world has much the same problem you do.
They model their products with several different analysis codes, each with its own input and output format. This generates a gob of data, and is currently managed in ad hoc ways, is not easy to integrate with other results and wastes the time of lots of engineers.
The product we've come up with to manage both the models, the process for executing the models, and the data generated by running the models is a software framework called CoMeT (Computational Modeling Toolkit).
We are also capable of managing different versions of the model, parameter studies, and some basic data mining. The whole thing is scriptable with Scheme.
Unfortunately, we are a commercial software company, and the software is still under development, although everything I mentioned above can currently be done. We are mostly working on a front end now, although we still need to make a few improvements to the framework and add support for many analysis codes.
The reason I'm replying to this is that your list of requirements is a perfect subset of ours. We are aiming our product at CAE in the mechanical and electrical domains (Mechatronics).
I know, it's not free, but we feel we've done some very innovative things and it has taken several people many years of low pay to get this far. We really want to make some money off it eventually....
If you want more information check out the web-site or email me here. We're in need of proving this technology in a production environment so maybe we can work something out.
-Craig.
http://roofit.sourceforge.net/
more experienced programmers.
I think someone told all the computer scientists that there's a theoretical way to write a program that does everything. I'm a computer scientist and it's clearly impossible to generalize that greatly.
In fact it is so dangerous that the general-purpose OS is the number one cause of our downward spiral in software quality. It took BeOS a short time to write their general-purpose OS. I think it's silly to think it's so monolithic a project that it must be maintained over several decades of development. We should be developing a fresh OS for individual applications as a matter of principle, even if going against one of the current bastions of computing like Linux or MS Windows might be called arrogance today.
Postgresql offers some attractive features such as inheritance, stored procedures, and an eclectic set of datatypes.
I never was much of a fan of schema inheritance. Most examples I have seen were based on bad designs IMO. And funky datatypes hinder porting and sharing of the data to other DBs.
Plus, I think they wanted something "lite" in the DB department based on one comment, and Postgre has a bit more of a learning curve.
I'm in precisely the same situation as Alea, so I read the suggestions here with considerable interest.
I'd like to mention ExpLab.
Though I haven't used ExpLab yet, these folks have been associated with other very high quality work (CGAL) so I expect good things. Here are three goals they list for the project:
The high energy folks also have a similar set of packages (as other nuclear labs probably do).
Gallium Arsenide is the material of the future, and always will be.
Ralf
The trouble with the world is that the stupid are cocksure and the intelligent are full of doubt.
-Bertrand Russell
Stop! Don't re-invent that wheel! The work - has already been done... The system you desire was developed at U C Berkeley nearly ten years ago and has been in production at places like JPL and the Langley Research Center since 1997. This software contains all the features you desire and a lot more, including a fully distributed processing system, a collaborative distribution system, archive hooks and even has robust security features. And, conveniently, it doesn't contain any of the features you said you don't want! See http://ScienceTools.com/ (or send me an email) for more information. Regards, Richard
Richard Troy, Chief Scientist Science Tools Corporation rtroy@ScienceTools.com, 510-567-9957, http://ScienceTools.com/