Slashdot Mirror


Open Source Experiment Management Software?

Alea asks: "I do a lot of empirical computer science, running new algorithms on hundreds of datasets, trying many combinations of parameters, and with several versions of many pieces of software. Keeping track of these experiments is turning into a nightmare and I spend an unreasonable amount of time writing code to smooth the way. Rather than investing this effort over and over again, I have been toying with writing a framework to manage everything, but don't want to reinvent the wheel. I can find commercial solutions (often specific to a particular domain) but does anyone know of an open source effort? Failing that, does anyone have any thoughts on such a beast?"

"The features I would want would be:

  • management of all details of an experiment, including parameter sets, datasets, and the resulting data
  • ability to "execute" experiments and report their status
  • an API for obtaining parameter values and writing out results (available to multiple languages)
  • additionally (alternately?) a standard format for transferring data (XDF might be good)
  • ability to extract selected results from experimental data
  • ability to add notes
  • ability to differentiate versions of software
In my dreamworld, it would also (via plugin architecture?) provide these:
  • automatically run experiments over several parameter values
  • distribute jobs and data over a cluster
  • output to various formats (spreadsheets, Matlab, LaTeX tables, etc.)
Things I don't think it needs to do:
  • provide a fancy front-end (that can be done separately - I'm thinking mainly in terms of libraries)
  • visualize data
  • statistical analysis (although some basic stats would be handy)
The amount of output data I'm dealing with doesn't necessitate database software (some sort of structured markup is ok for me), but some people would probably like more powerful storage backends. I can see it as experiment management 'middleware'. There's no reason such software should be limited to computer science (nothing I'm contemplating is very domain specific). I can imagine many disciplines that would benefit."
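
The "API for obtaining parameter values and writing out results" and the "run experiments over several parameter values" items could look roughly like the following Python sketch. It only illustrates the idea and is not an existing tool: `parameter_grid`, `run_sweep`, `toy_experiment`, and the dataset and parameter names are all made up for the example.

```python
import itertools
import json


def parameter_grid(space):
    """Yield one dict per combination of the values listed in `space`."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def run_sweep(experiment, space, software_version, notes=""):
    """Run `experiment` once per parameter combination and collect records."""
    records = []
    for params in parameter_grid(space):
        records.append({
            "params": params,             # the parameter set used
            "version": software_version,  # which software version produced it
            "notes": notes,               # free-form annotation
            "result": experiment(**params),
        })
    return records


def toy_experiment(dataset, learning_rate):
    # Hypothetical stand-in for a real algorithm run.
    return {"score": len(dataset) * learning_rate}


if __name__ == "__main__":
    space = {"dataset": ["iris", "wine"], "learning_rate": [0.1, 0.5]}
    for record in run_sweep(toy_experiment, space, software_version="1.2"):
        # One JSON object per line is an easy "structured markup" to grep later.
        print(json.dumps(record, sort_keys=True))
```

Keeping each record self-describing (parameters, version, notes, result) is what makes it possible to extract selected results later without remembering how a run was launched.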

122 comments

  1. MAUS and GABE by fozzy(pro) · · Score: 0, Insightful

    MAUS roxors.

    In soviet russia MtnDew Buys Gabe.

  2. Experience by robbyjo · · Score: 4, Insightful

    I have also done a lot of empirical comp sci experiments. My experience is that the tools used for the experiments themselves are very ad hoc and not easily scriptable. Most of the time we have to tend the hour-long experiments to see what happened in the output and decide what to do next. And... the decision is often not clear-cut. Some sort of heuristic is needed. Not to mention the frustration when errors occur (especially when the tool is buggy, which is very often the case in research settings). So, considering this, what I would do is construct a script and do the experiments in phases. Run it and look at the results several days later.

    I have also noticed that one experiment is sometimes so radically different from the next that I doubt this is easily manageable.

    --
    Error 500: Internal sig error
    1. Re:Experience by jkauzlar · · Score: 3, Interesting
      I agree with the parent post after giving the problem a little thought. There may be tools available, but I think what you need is to set up scripts for your experiments.

      What comes to mind when I think about experiment management software is unit testing software. Correct me if I'm wrong, but when you run empirical software experiments, you are essentially unit testing the software.

      Something like Python, Perl, or TCL (probably Python-- powerful, easy to read) should suit you ideally. Other options include Make utilities like make or Ant (w/ JUnit would work great!).

      With any of these you could make use of any existing command-line or scriptable utilities for conversion or producing data files or database data.

      Just my 2 cents. Hope this helps.

    2. Re:Experience by Alea · · Score: 1

      Writing scripts is my current solution and my desired solution would probably require scripts as glue to bind various applications to the management system. However, this would lead to much less work dedicated to each project.

    3. Re:Experience by robbyjo · · Score: 3, Interesting

      Sorry, but I must disagree. Most of the time, research experiment != unit testing.

      To illustrate: take a data mining project, for example. The first phase is data preparation, which is easily scriptable. But how to prepare the data is a different story. We must examine the raw data case by case to decide how to treat it. For example: when to discretize and with what method (linear scale, log scale, etc.), when to reduce dimensionality, and so on. This requires human supervision.

      Even after we do the data prep, we look at the result. If the cooked data contains too much loss of information due to prep stage, we have to do it again using different parameters. This is painful.

      Then, next in the pipeline: what algorithm to use. This, again, depends on the characteristics of the cooked data. You know, some "experts" (read: grad students) will decide using some "random" heuristics of their own and give some reasonable explanation.

      If the result comes out and is not desirable, we might go back and pick a different algorithm or different data prep parameters, and so forth...

      Given this setting, I doubt that there is a silver bullet for this problem...

      --
      Error 500: Internal sig error
    4. Re:Experience by Anonymous Coward · · Score: 0

      You left Ruby off the scripting language list.

      Alea might be able to use RCS in some way, too.

    5. Re: Experience by Black+Parrot · · Score: 1


      > You left Ruby off the scripting language list.

      Oh, the Humanity!

      --
      Sheesh, evil *and* a jerk. -- Jade
  3. Object Modeling System by Anonymous Coward · · Score: 5, Informative

    Take a look at the object modeling system. It is currently being developed by Agricultural Research Service but many other agencies are cooperating.

    http://oms.ars.usda.gov/

  4. How about this by captainclever · · Score: 1

    You could look into providing some kind of web-services feel to the computation, and then use an open-source provenance server.

    A provenance server might handle the recording of queries, results etc. Not sure how many good open source ones there are.

    --
    Last.fm - join the social music revolution
  5. Piracy is Your Only Option by use_compress · · Score: 4, Funny

    1. You cannot (?) afford commercial software.
    2. It is impractical for you to continue writing your own software.
    3. You cannot find open source software.
    -------
    Conclusion: Steal commercial software! -)

    1. Re:Piracy is Your Only Option by Anonymous Coward · · Score: 0

      People like you are the reason excellent commercial software is expensive.

    2. Re:Piracy is Your Only Option by Anonymous Coward · · Score: 0

      AC, -) means tongue-in-cheek.

    3. Re:Piracy is Your Only Option by NewbieProgrammerMan · · Score: 2, Insightful
      must...resist....

      4. Profit!!!!

      Sorry, I've been reading slashdot too much and must append such an item to all lists I encounter. :P

      And it's not stealing, it's copyright infringement. ;)

      Seriously, though, I think using commercial software still won't cover all the bases. Alea said, "I can find commercial solutions (often specific to a particular domain)..." which I would assume means that there don't appear to be any general-purpose experiment packages.

      As some others have already posted, 'experiments' can cover a wide range of things, and I can imagine that making a general-purpose experiment harness would be a tall order. Having such a thing would be useful for some of the work I do, but I have not had the time (and probably don't have the ability) to try to put together something that can help manage and automate experiments (or sensor data processing jobs, in my case). This is one of those problems which I 'feel' has to have a solution, but I know it's currently beyond my capability to figure out how it should work.

      --
      [b.belong('us') for b in bases if b.owner() == 'you']
  6. Perl by Anonymous Coward · · Score: 1, Insightful

    Sounds like you need to use Perl.
    I find it to be an excellent language for maintaining data.

    1. Re:Perl by Anonymous Coward · · Score: 0

      The original question is asking for help in making the management easier. I don't think applying tools that will come with their own huge management and maintenance issues will help. That puts PERL out of the running.

  7. dependencies (but not make) by Anonymous Coward · · Score: 3, Interesting
    I'm also an empirical computer scientist, and another aspect I would look for is handling dependencies. Make is the standard tool for doing this, but it's not up to this task.

    Ideally, I'd type make paper and it would start from the beginning stages of the experiment and go all the way through creating the paper. Moreover, if anything died along the way, I could fix the problem, type make again, and it would more or less pick up where it left off, not re-running things it had already done (unless they were affected by my fix).

    But after playing with this for a few days, I became convinced that make wasn't up to snuff for what I wanted. I have these sort of `attribute-value' dependency constraints. From one raw initial dataset, I create several cross-validation folds, each of which contains one training set and a couple of varieties of test set. The filenames might look like this:

    base.fold0.testA.test base.fold0.testB.test base.fold0.train
    Now suppose that the way I actually run an experiment involves passing a test set and the corresponding training set to the model I'm testing, a command like:
    modelX base.fold0.testA.test base.fold0.train > base.modelX.fold0.testA.run
    Since, however, I have to run this over several folds (and other variations that I'm glossing over), I'd like to write an 'implicit rule' in the Makefile. This involves pattern-matching the filenames. But the pattern matching is very simple: you get to insert one .* (spelled %) in each string, and every occurrence must match the same text. Given that, there's no way I can specify the command I have above.

    You might be thinking, you could do

    %.modelX.testA.run : %.testA.test %.train
    but then I have to copy this rule several times for each sort of test set, even if the command they run is the same.

    The underlying problem, I think, is that the pattern-matching in make's implicit rules is too simple. What I would rather have is some kind of attribute-value thing, so I could say something like

    { fileid=$1 model=modelX test=$2 filetype=run } : {fileid=$1 test=$2 filetype=test } { fileid=$1 filetype=train }
    where fileid corresponds to 'base.fold0' and whatever other file identifying information is needed.

    This notation is sort of based on a natural language attribute-value grammar.

    Anyway, if anyone has any suggestions as to this aspect of the problem, I would be grateful
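
One way to get the attribute-value behaviour described above is to parse the target filename into named fields and build the dependencies from those fields. The Python sketch below assumes the naming scheme from the example filenames (`base.modelX.fold0.testA.run` built from `base.fold0.testA.test` and `base.fold0.train`); `RUN_PATTERN`, `parse_run_target`, and `rule_for` are hypothetical names, and this is not a feature of make itself.

```python
import re

# Assumed naming scheme, taken from the filenames in the comment above:
#   base.modelX.fold0.testA.run  <-  base.fold0.testA.test  base.fold0.train
RUN_PATTERN = re.compile(
    r"^(?P<base>[^.]+)\.(?P<model>model[^.]+)\.(?P<fold>fold\d+)"
    r"\.(?P<test>test[^.]+)\.run$"
)


def parse_run_target(filename):
    """Return the attributes encoded in a .run filename, or None."""
    match = RUN_PATTERN.match(filename)
    return match.groupdict() if match else None


def rule_for(filename):
    """Return (dependencies, command) for a .run target, or None otherwise."""
    attrs = parse_run_target(filename)
    if attrs is None:
        return None
    fileid = "{base}.{fold}".format(**attrs)  # e.g. "base.fold0"
    test_file = "{}.{}.test".format(fileid, attrs["test"])
    train_file = "{}.train".format(fileid)
    command = "{} {} {} > {}".format(
        attrs["model"], test_file, train_file, filename
    )
    return [test_file, train_file], command
```

A single rule then covers every model, fold, and test-set variety, which is what the `%` wildcard alone cannot express.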

    1. Re:dependencies (but not make) by hysterion · · Score: 1


      Have you tried automake? (autotut, autobook)

    2. Re:dependencies (but not make) by Anonymous Coward · · Score: 0

      Look into a make replacement called 'Jam'
      http://www.perforce.com/jam/jam.html
      CrystalSpace's latest CVS is moving over to it, and having looked over both Jamfiles and Makefiles, I can say you avoid a lot of cruft using Jam.

      It's still not as featureful as make (my pet peeve is lack of -f support), but its dependency generation is nice.

      -- vranash

    3. Re:dependencies (but not make) by Anonymous Coward · · Score: 4, Interesting

      I ran into this problem when I was in graduate school, too. What I eventually did was to abandon make because of the limitations you are running into, and construct a special-purpose experiment running utility that would know about all the predecessors, etc. It turned out not to be too hard, actually. However, if you don't know perl or another language that gives you good pattern matching and substring extraction capability, then this will be very hard to do.

      I just wrote two functions. (I wrote them in the shell, but if I were doing it again, I'd probably do it in perl.) construct() simply makes a file if it is out of date (see example below). construct() is where all of your rules go: it knows how to transform a target filename into a list of dependencies and a command.

      It uses a function called up_to_date() which simply calls construct() for each dependency, then returns false if the target is not up to date with respect to each dependency. If you don't do anything very sophisticated here, up_to_date will only be a few lines of code.

      "construct" will basically replace your makefile. For example, if you did it in perl, you could write it something like this:

      sub construct {
          local $_ = $_[0];  # Access the argument.

          # Target looks like base.modelX.fold0.testA.run
          if (/^base\.model(.)\.fold(\d+)\.test(.)\.run$/) {
              my @dependencies = ("base.fold$2.test$3.test",
                                  "base.fold$2.train");
              if (!up_to_date($_,             # Output file.
                              @dependencies,  # Input files.
                              "model$1")) {   # Rerun if prog changed, too.
                  system("model$1 @dependencies > $_");
              }
          }
          elsif (/^....$/) {  # ... Check other patterns. ...
          }
      }

      What you've gained from this is a much, much more powerful way of constructing the rule and the dependencies from the target filename. Of course, your file will be a little harder to read than a Makefile--that's what you pay for the extra power. But instead of having many duplicate rules in a makefile, you can use regular expressions or whatever kind of pattern matching capability you want to construct the rules.
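
The comment describes up_to_date() only in words. A minimal Python sketch of the same idea, comparing file modification times and building prerequisites first, might look like this. The split into `up_to_date` and `build`, and the `rules` and `run` callables, are assumptions made for the illustration and are not the commenter's actual code.

```python
import os


def up_to_date(target, dependencies):
    """True if `target` exists and is at least as new as every dependency."""
    if not os.path.exists(target):
        return False
    target_mtime = os.path.getmtime(target)
    return all(
        os.path.exists(dep) and os.path.getmtime(dep) <= target_mtime
        for dep in dependencies
    )


def build(target, rules, run):
    """Bring `target` up to date and return the commands that were executed.

    `rules(target)` returns (dependencies, command), or None for a source
    file that has no rule. `run(command)` executes one command.
    """
    executed = []
    rule = rules(target)
    if rule is None:              # A source file: nothing to construct.
        return executed
    dependencies, command = rule
    for dep in dependencies:      # Bring every prerequisite up to date first.
        executed.extend(build(dep, rules, run))
    if not up_to_date(target, dependencies):
        run(command)
        executed.append(command)
    return executed
```

Because finished targets are skipped, rerunning after a crash picks up roughly where the previous run stopped, which is the behaviour the parent comment wanted from `make paper`.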

    4. Re:dependencies (but not make) by Anonymous Coward · · Score: 0

      Have you tried Ant?

      http://ant.apache.org

  8. R? by Elektroschock · · Score: 4, Informative

    Did you consider R, a Splus clone? For scientific statistics it is a very flexible solution. http://www.r-project.org

    1. Re:R? by Anonymous Coward · · Score: 0
      Original Poster: I am trying to streamline an incredibly complicated process, with many varied subtleties, and I want to add tons of features that integrate with and manipulate aspects of the process.

      You: I like ice cream.

    2. Re:R? by Elektroschock · · Score: 1

      Troll! R is professional software. It does not compare with SPSS, which is the easy-to-use solution. As a command-line language R is easily scriptable, and there are many extensions.

    3. Re:R? by Anonymous Coward · · Score: 0

      I think the point of the retort is that your solution is not applicable to the problem. The original poster isn't running statistical experiments, and the experiment isn't really about running the same algorithm on different data sets--it's about comparing the performance of different algorithms. Thus S-plus and SPSS probably wouldn't be useful for this problem, seeing as how computing the statistics of a single sample point is trivial.

    4. Re:R? by Elektroschock · · Score: 1

      The unix philosophy is: use different command line tools and scripts. R may be one of those. There will be no tool that solves the whole problem.

  9. Ant, with some tweaking. by Xerithane · · Score: 4, Interesting
    We do something that almost parallels this, and we still haven't had the time to complete the Ant setup. The basic gist of it is that Ant has properties files that can contain any number of parameters, along with embedded XSLT functionality. This allows Ant to generate new build.xml files (The Ant build file) and run it, on the fly, given a set of user-entered commands, environment variables, or file parameters. The parameter files are easy to modify and update, and combined with CVS you can even do version control on the different experiments.

    What I would end up doing is set up an Ant build file for each experiment, under each algorithm.

    Algorithm/experiment_dataset1.properties
    Algorithm/experiment_dataset2.properties

    And then you can update property files, using a quick shell script, or something along those lines at the end of the data set, as well as having build/run times that Ant can retrieve for you. Good solution, and you aren't reinventing the wheel.

    Requires Java, which depending upon your ideology is either a good thing or a curse. :)
    --
    Dacels Jewelers can't be trusted.
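
The one-properties-file-per-experiment layout does not depend on Ant. As a rough Python sketch, a file of `key=value` lines can be read and turned into a command line for a run. This handles only the simple `key=value` subset of the Java properties format (no escapes or line continuations), and the keys, values, and the `modelX` program name are hypothetical.

```python
def parse_properties(text):
    """Parse the simple `key=value` subset of a Java-style properties file."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):  # blank line or comment
            continue
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


def command_line(program, properties):
    """Turn a properties mapping into an argument list, one --key=value each."""
    return [program] + [
        "--{}={}".format(key, properties[key]) for key in sorted(properties)
    ]


# Contents are hypothetical; only the file naming follows the comment above.
EXAMPLE = """\
# Algorithm/experiment_dataset1.properties
dataset=dataset1.csv
folds=10
learning.rate=0.05
"""

if __name__ == "__main__":
    print(command_line("modelX", parse_properties(EXAMPLE)))
```

Since each experiment is fully described by a small text file, putting those files under version control gives a history of every configuration that was run.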
  10. Idea by dubbayu_d_40 · · Score: 1
    A testing tool like JUnit might be a good place to start. I suspect you'll be writing most of your solution. Put it on sourceforge if you do, this sounds useful.

    Good Luck

    1. Re:Idea by dubbayu_d_40 · · Score: 1
      The ant idea above is awesome. Maybe ant + junit...

      To easily store arbitrary datastructures, try xl2 serialization (java).

  11. AppLeS? by kst · · Score: 2, Informative

    Something like the AppLeS Parameter Sweep Template software might suit your needs. I've never used it myself, but it looks like it might be close to what you're looking for.

    See here for other projects from the GRAIL lab at SDSC and UCSD.

  12. Uh-huh by Ryvar · · Score: 4, Funny

    I don't mean to sound cynical, but this seems to come across to me as a very nicely written:

    Ne3D H3lp WIt M4H H4x0RiN!!!!!

    I mean, let's face it, much of what modern hacking closed-sourced software consists of is throwing a variety of shit against a variety of programs in a variety of configurations and seeing what breaks and then following up to make an exploit out of it.

    While this probably isn't the case here, it's very hard to read that note and not snicker just a tiny, tiny bit . . .

    1. Re:Uh-huh by BitHive · · Score: 1

      What? Re-read the original story. You are wayyyyy off the mark.

    2. Re:Uh-huh by DarkFyre · · Score: 1

      I don't think so...

      Or at least, I read the story and immediately thought of applications to my own projects (I'm a Research Assistant at my University, and I'm a little tired of writing Perl scripts to batch long jobs with combinatorial arguments).

      If such a tool exists, I too would be interested in it. I think it rash to assume that the poster is looking for exploit automation.

    3. Re:Uh-huh by Anonymous Coward · · Score: 0
      Wow, you're quite the scientist aren't you. if
      running new algorithms on hundreds of datasets, trying many combinations of parameters, and with several versions of many pieces of software.
      implies h4x0rin to ya before anything else, then you've got a pretty limited range of experience. Get back behind that unix box and finetune mah mail server biatch.
    4. Re:Uh-huh by Anonymous Coward · · Score: 0

      Why yes. Of course. You've come to the obvious conclusion, which may be summed up as:

      Technology cuts both ways.

      You haven't been here long. Obvious references in the past include: farming, guns, knives, open source code, code in general, security and exploit release times, bouncing radio waves off of distant objects for message delay but absolute delivery.

      Radar--you can use it find stuff to blow up, do a bombing run, or to prevent accidents. Ever hear of the Nobel Prize(s)? Practical explosives testing. Gee, use it to mine, move, or remove, whether that be dam building or doing something insane like McVeigh.

      OF COURSE something like this could be converted, construed, etc. and be used to hack. Hacking itself can be good or bad. But given the nature of the inquiry and its delivery, I highly doubt this was the asker's underhanded purpose. And those already in the know don't need an open source project to hammer against code (closed or open[1]) to test it; they just do it and probably like their own crafted methods.

      [1] Theo remarked on a little code snippet that appeared on bugtraq 3 years or so back. Basically, it threw garbage at code. Not entirely useful when you think about it, except that it worked wonders on closed source material. The snippet, if I recall, was submitted as a sort of benchmark one could use to perhaps compare how good the code was, of both open and closed source programs (you test it against the binary or what have you, obviously). Sorta like an md5 checksum but instead of data, it flexed the internals.

      Despite audited code, Theo came across 3 or so bugs, which he openly commented to the list on (which sold me on OBSD doing things more right re security and testing, including discussion).

  13. Oh that's easy.... by Anonymous Coward · · Score: 5, Funny
    The Academic Community, especially those strange AI people, have long sought complicated programs and machinery that could automate all of their work and projects, keep track of complicated "parameter sets, datasets, etc....".

    But what you are looking for, sir, is the cheap labor commonly known as a Graduate Student
    • Many of these "grads" [as they are commonly known] have INDEED been able to " 'execute' experiments and report their status", as well as "writing out results (available in multiple languages)".
    • The Graduate Student is often known for their abilities to create and distribute notes in lieu of bringing that onerous burden upon more high-ranking academic officials
    • ...you don't even have to dream about doing "clustered work" or "outputting results to spreadsheets, Matlab, LaTeX tables, etc....". These fancy machines can definitely do that...
    • Of course, there are several "graduate students" that provide a fancy front end (and rear end, for that matter). I think that I would agree with your assessment that they do not need to have that feature, although it might make your days a bit more... ermm... *pleasant* :-)
    • As well, most graduate students have the capability of performing "basic stats", although most don't have an extensive faculty for performing such calculations...
    • And don't you even worry about the price -- you'll see that they're quite affordable.
    To conclude, you say that "There's no reason such software should be limited to computer science (nothing I'm contemplating is very domain specific). I can imagine many disciplines that would benefit". I would wholeheartedly have to agree with you: just about every discipline can do more and see farther by standing on the backs of their graduate students.
    In fact, I'm afraid to report that you are a bit behind the times in this department as these "Graduate Student" devices are quite common at universities and research labs.
    1. Re:Oh that's easy.... by Alea · · Score: 2, Funny

      Ah, you see... there's the problem... I am, in fact, a cheap alternative to the much vaunted "Graduate Student". I'm a "Lazy Graduate Student" (TM), with slow update rates, poor accuracy, and long downtimes. Eventually, I'll probably break down completely into a "Professor", in which case someone will have to find some "Graduate Students" to get the work done...

    2. Re:Oh that's easy.... by quantaman · · Score: 4, Funny

      Of course, there are several "graduate students" that provide a fancy front end (and rear end, for that matter). I think that I would agree with your assessment that they do not need to have that feature, although it might make your days a bit more... ermm... *pleasant* :-)

      That does have its advantages, though you should be cautious. In my experience those models often have a large number of bugs in their systems and tend to be a lot more likely to pick up viruses as well.
      This shouldn't be a problem for most operations, but occasionally if you try to interface them with your other components you may find your other systems becoming infected as well. In extreme cases you may also find that interfacing with these systems causes additional child processes to be created. These child processes are extremely hard to get rid of; early on you may be able to simply kill them, but this command becomes extremely impractical after a few months of operation. These processes are known to take up huge amounts of resources and maintenance and often take the better part of 2 decades to subside (they're still present but resource demands drop considerably). Of course many of these risks can be alleviated by using a proper wrapper class while working with these "graduate student" systems.

      --
      I stole this Sig
    3. Re:Oh that's easy.... by Anonymous Coward · · Score: 0

      Use the new RU486 processor.

    4. Re:Oh that's easy.... by drauh · · Score: 1

      You forget the primary advantage of graduate students: they can be programmed in natural language.

      --
      This is a tautology.
  14. MS ha by mikeclark · · Score: 0, Troll

    Use Excel, sorry I'm a jerk....

  15. And when you ask them the answer will be... by Anonymous Coward · · Score: 0

    YOU FAIL IT

  16. Re:Welcome by skraps · · Score: 1

    The rumor is, it's something called **work**.

    From the OP: I have been toying with writing a framework to manage everything, but don't want to reinvent the wheel.

    Seems to me that the OP is more than capable of doing the work, but he is smart for trying to find an existing solution. The rumor is, it's something called **working smarter**, not **working harder**. :-)

    --
    Karma: -2147483648 (Mostly affected by integer overflow)
  17. ROOT by kenthorvath · · Score: 4, Informative
    http://root.cern.ch/

    We experimental high-energy physics folk have been using it (and PAW) for some time. It offers scripting and histogramming and analysis and a bunch of other features. And it's open source. Check it out.

    1. Re:ROOT by gowdy · · Score: 1

      BTW, I posted a link to an addition to ROOT that probably makes it a little more suitable (http://roofit.sourceforge.net/).

  18. suggest jdb for managing individual experiments by john_heidemann · · Score: 4, Informative

    I've been very happy using jdb (see below) to handle individual experiments, and directories and shell scripts to handle sets of experiments.

    JDB is a package of commands for manipulating flat-ASCII databases from shell scripts. JDB is useful to process medium amounts of data (with very little data you'd do it by hand, with megabytes you might want a real database). JDB is very good at doing things like:

    • extracting measurements from experimental output
    • re-examining data to address different hypotheses
    • joining data from different experiments
    • eliminating/detecting outliers
    • computing statistics on data (mean, confidence intervals, histograms, correlations)
    • reformatting data for graphing programs

    For more details, see http://www.isi.edu/~johnh/SOFTWARE/JDB/.
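
For readers who want the flavour of "computing statistics on data" over a flat ASCII table without installing anything, here is a small Python sketch. It is not JDB and does not use JDB's file format or commands. It assumes a whitespace-separated table whose first non-comment line names the columns, and it uses a normal-approximation 95% confidence interval (z = 1.96). The data are invented.

```python
import math
import statistics


def read_table(text):
    """Parse whitespace-separated rows; the first data line names the columns."""
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def mean_with_ci(values, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval."""
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, float("nan")  # no spread estimate from a single value
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Invented experiment output, one row per cross-validation fold.
DATA = """\
# hypothetical experiment output
fold accuracy
0 0.90
1 0.80
2 0.85
3 0.85
"""

if __name__ == "__main__":
    accuracies = [float(row["accuracy"]) for row in read_table(DATA)]
    mean, half_width = mean_with_ci(accuracies)
    print("accuracy = {:.3f} +/- {:.3f}".format(mean, half_width))
```

With only a handful of folds, a t-distribution interval would be more appropriate than the z value used here; the constant is kept for brevity.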

  19. A project with similar goals by Anonymous Coward · · Score: 1, Informative


    http://sourceforge.net/projects/pythonlabtools/

  20. Perl is only useful for maintaining your job by Anonymous Coward · · Score: 0
    Why would you want to maintain your data with unmaintainable code, unless simply maintaining your job was more important than what you're doing with the data?

    Perl is great for making seemingly complex busy-work out of trivial problems, and constructing arcane rube goldberg devices that nobody can understand, so they can't fire you without throwing away the software.

    Perl is the ultimate programming language for corporate leeches!

    But if you actually want to solve problems and get work done, use a language other than Perl.

    1. Re:Perl is only useful for maintaining your job by asciirock · · Score: 5, Funny

      Just admit it. Perl slept with your wife. That's what this is really about, isn't it?

    2. Re:Perl is only useful for maintaining your job by Anonymous Coward · · Score: 0

      kicked my dog, too

    3. Re:Perl is only useful for maintaining your job by Starky · · Score: 1
      Your take on Perl smacks of uninformed biases. No offence.


      I have used it extensively for research projects (most of my work involves nonlinear optimization models), gluing together disparate applications and sources of data, and it has worked splendidly.


      I also use C when it is appropriate and Java when it is appropriate. Frankly, Perl has time and again proven its worth and has been (for me) more often than not the right choice.


      As you say, Perl syntax is looser than more strongly typed languages, but most spaghetti is a result of a poor programmer, not the language. Perl gives you enough rope to hang yourself if you're a hack, but in the hands of an experienced programmer, it is a wonderful tool.


      As a scripting language, it seems to suffer pedants such as yourself as the language that "serious" programmers would not use.
      But my experience is that it has been the best choice for many of my projects and has saved me countless hours and has allowed me to focus more on my research than I would have been able to with another language such as Java, C, or C++.

      --
      -- My choice of computing platform is a symbol of my individuality and belief in personal freedom.
    4. Re:Perl is only useful for maintaining your job by PugMajere · · Score: 1

      Perl gives you enough rope to hang yourself if you're a hack

      Funny, C gives you enough rope to shoot yourself.

    5. Re:Perl is only useful for maintaining your job by CreatorOfSmallTruths · · Score: 1

      so, saying "constructing arcane rube goldberg devices" is the latest buzz word, huh ?

      admit it, you don't know what all of those "$"s and "@"s mean, and you are afraid of them... they might sneak up on you in the night and make you think....
      Perl is great for quick and dirty hacks, bugzilla was written using perl, which says a lot, I know of corporate projects written (and work!) in perl.
      So you heard someone say "this is an unmaintainable language" and from there on you chatter the mantra..
      And what other languages?? I write C and C++ (the fastest languages out there, as far as I know, except assembly) - are they more maintainable than perl? no. why not? because for each line in perl I need to write 10-15 lines of code in C and about 10 lines in C++... which means the complexity level is much higher, therefore more maintenance is needed. why do people use C/C++? because it's fast.
      So, what other languages? JAVA - first let them implement templates. Scheme/Prolog/Lisp - find an expert in one of those and then try to replace HIM... Visual Basic? unmaintainable, not to mention proprietary. Delphi? A language used by too few, not widely accepted. PHP/ASP/whatever? yeah, try maintaining those...
      A wise man once said, there is no silver bullet. there is no bug free computer language.

  21. Open-Source-Experiment Management-Software? by MMHere · · Score: 1

    With the slight re-grouping of the title phrases as above, I think we can all agree the answer is:

    FBI's Carnivore.

    (Well, that's the way the headline parsed out for me the first time I glanced at it...)

  22. Fun with your new head! by SimHacker · · Score: 1
    Be sure to get a fun new head to go with your grad student:

    http://catalog.com/hopkins/text/head.html

    --
    Take a look and feel free: http://www.PieMenu.com
  23. Define a common meta-data set. by rdewald · · Score: 1

    You seem to suggest that the specifics of the software used in the experiments themselves is too varied and engineered to respond to object management within the native environment.

    Ok, you take the management piece into a meta-environment like web e-commerce. Each iteration produces a transaction, essentially a line in a table containing the common meta-elements and then you perform your management via linked queries on this data set ala Napster.

    If all of your data engines are connected (Intranet), the only thing that needs to be centralized is the knowledge of what is where.

    So, you build on the code from one of the open-source e-commerce engines and combine that with the code elements from one of the open peer-to-peer management Napster clones.

    Since the code is OPEN, you can do this.

    --
    The best way to do is to be.
  24. Re:Satania is a good choice. by missing_boy · · Score: 0, Troll

    fuck, you're gross!

  25. Re:Satania is a good choice. by marcushnk · · Score: 1

    Score 1 informative?!?! This is a goatse link guys and grrls.. don't click it...

    --
    "Consider how lucky you are that life has been good to you so far. Alternatively, if life hasn't been good to you so far
  26. tcltest by trb · · Score: 2, Informative

    It might not satisfy all your requirements out of the box, but could you put something together with tcltest?

  27. Re:Satania is a good choice. by abirdman · · Score: 1

    How did a link to Goats.cx get modded informative? Should I infer that moderators don't follow links and just read buzzwords? A very successful troll, but gross, gross, disgusting. Don't click the link.

    --
    Everything I've ever learned the hard way was based on a statistically invalid sample.
  28. Eclipse as testing platform by Daniel_ · · Score: 1

    It might take some work, but Eclipse from IBM has improved a great deal towards becoming a good environment for project management. It's geared towards projects written in Java, but there is a C/C++ Perspective plugin if you prefer...

    It's a good platform for managing a collection of custom ant build scripts if you decide to go that direction (assuming you're in Java of course...)

    If you'd prefer something more specialized, the plugin architecture isn't bad and could save some time with interface work. Especially since any windows from other perspectives that you like can be dropped directly into your custom-built perspective.

    Food for thought...

    www.eclipse.org

    --
    The number you have dialed is imaginary, please rotate your phone 90 degrees and try again.
  29. Need Open Source data reduction too... by earthforce_1 · · Score: 1

    I was able to do almost everything on my thesis using open source tools, LaTeX on Linux, except when it came to data reduction - I was forced to use the crippled student version of SPSS. I would love to see a GNU clone of this functionality the way they have cloned Matlab.

    --
    My rights don't need management.
    1. Re:Need Open Source data reduction too... by matfud · · Score: 1

      Try R, an open source SPSS clone. I found it very useful for my research, especially for drawing custom graphs.

      Matfud

    2. Re:Need Open Source data reduction too... by foog · · Score: 1

      R is a clone of S, not SPSS. The FSF does have PSPP, but it doesn't do very much yet.

  30. Advice... by Anonymous Coward · · Score: 0

    ...it sounds like you don't know what the hell you are doing. Now would be a good time to start investigating a new career.

  31. Re:Sad news, Earl King dead at 69 by Anonymous Coward · · Score: 0

    NEW ORLEANS (AP) Earl King, the prolific songwriter and guitarist responsible for some of the most enduring and idiosyncratic compositions in the history of R&B, died Thursday from diabetes-related complications. He was 69.

    Over his 50-year career, King wrote and recorded hundreds of songs.

    His best-known compositions include the Mardi Gras standards ''Big Chief'' and ''Street Parade''; the rollicking ''Come On (Let the Good Times Roll),'' which both Jimi Hendrix and Stevie Ray Vaughan recorded; and ''Trick Bag,'' the quintessential New Orleans R&B story-song.

    '''Come On (Let the Good Times Roll)' might be the one that people know, but I wish the world would hear more of his songs,'' said Mac ''Dr. John'' Rebennack, a longtime friend, fan and collaborator of King. ''He approached songs from different angles, from different places in life.''

    In his prime, he was an explosive performer, tearing sinewy solos from his Stratocaster guitar and wearing his hair in an elaborate, upraised coif.

    King's songwriting was informed by syncopated New Orleans beats and his interest in a broad range of subjects, from medieval history to the vagaries of the human heart and his own so-called ''love syndromes.''

    ''Most people say, 'Well, Earl, you sing the blues,' or however they want to categorize it,'' King said in a 1993 interview. ''I just sing songs. I'm a writer, so whatever gymnastics jump through my head, I write about it.''

    Born Earl Silas Johnson IV, King described himself as a ''nervous energy person'' who constantly needed to be engaged in some creative pursuit.

    He cut his first singles in the early 1950s, taking on the stage name ''Earl King'' at the suggestion of a record promoter.

    Scenes and acquaintances from his life often found their way into his lyrics with little editing. A story King's grandmother told about his father, a blues pianist who died when King was a boy, inspired ''Trick Bag.''

    In the song, the protagonist sings to his wayward significant other, ''I saw you kissing Willie across the fence, I heard you telling Willie I don't have no sense/The way you been actin' is such a drag, you done put me in a trick bag.''

    Funeral arrangements had not been finalized late Friday evening.

  32. Sounds like High Energy Physics by Anonymous Coward · · Score: 4, Informative

    What you describe does indeed sound like High Energy Physics.

    And the "middleware" you need are the GNU tools gluing together the specialized programs that do the specific things you want.

    We have been using unix for a long time, and many of us prefer the combination of small targeted tools philosophy rather than a single monolithic package.

    I will repeat, and you can stop reading now if you want. The GNU tools, unix, and specialized scriptable programs are already the "middleware" you seek.

    If you are just missing some of the tools in the middle, here are the ones used in HEP. You might find more appropriate ones closer to whatever discipline you work in.

    All the basic unix text processing tools and shells.
    bash. csh. Perl. grep. sed. and so on.

    Filename schemes ranging from appropriate to clever to bizarre.
    (See other posts here)

    Make it so that all the inputs you want to change can be done on the command line or with an input steering text file.
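
    A minimal sketch of the command-line-or-steering-file idea, in Python rather than anything the poster named. The option names (--dataset, --alpha, --iterations) are made up for illustration. argparse can expand an "@file" argument into the options listed in that file, one per line, so the same parser serves both routes:

```python
import argparse
import tempfile
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # fromfile_prefix_chars="@" makes "@steering.txt" expand to the
    # arguments listed in that file, one argument per line.
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    # Hypothetical experiment inputs; replace with your own.
    parser.add_argument("--dataset", default="default.dat")
    parser.add_argument("--alpha", type=float, default=0.1)
    parser.add_argument("--iterations", type=int, default=100)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    with tempfile.TemporaryDirectory() as tmp:
        steering = Path(tmp) / "steering.txt"
        steering.write_text("--dataset=run42.dat\n--alpha=0.5\n")
        # Arguments given later win, so the command line can override the file.
        args = parser.parse_args([f"@{steering}", "--iterations", "250"])
        print(args.dataset, args.alpha, args.iterations)
```

    Keeping the steering file next to the output gives you a record of what was run.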

    Same tools combined with some simple c-code to produce formats for spreadsheets or PAW or ROOT or whatever visualization or post-processing thing you need done. Has ntuple and histogram support automatically, which might be all you need.

    Almost always I choose space delimited text for simple output to push into PAW, ROOT, or spreadsheets. I keep a directory of templates to help me out here.
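
    For the space-delimited output, something like the following Python sketch might do. The "#" comment header carrying the parameters is an assumption of mine, not part of the original advice, and whether a given downstream tool skips "#" lines is something to check for that tool. The column and parameter names are invented for the example.

```python
import io
import sys
from datetime import datetime, timezone


def write_results(stream, rows, columns, params):
    """Write rows as space-delimited text, with provenance in '#' comments."""
    stream.write(f"# generated: {datetime.now(timezone.utc).isoformat()}\n")
    stream.write(f"# command: {' '.join(sys.argv)}\n")
    for name, value in sorted(params.items()):
        stream.write(f"# param {name} = {value}\n")
    stream.write("# columns: " + " ".join(columns) + "\n")
    for row in rows:
        stream.write(" ".join(str(value) for value in row) + "\n")


def read_results(stream):
    """Return the data rows as lists of floats, skipping comment lines."""
    return [
        [float(field) for field in line.split()]
        for line in stream
        if line.strip() and not line.startswith("#")
    ]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_results(
        buffer,
        rows=[(1, 0.25), (2, 0.5)],
        columns=["trial", "error"],
        params={"alpha": 0.5},
    )
    print(buffer.getvalue(), end="")
```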

    Some people use full blown databases to manage output. For a long time there have been databases specific to the HEP needs. I recently have started using XML-style data formats to encapsulate such things in text files if the resulting output is more complicated than a single line. You mention XDF, sure, that sounds like the same idea.

    CONDOR (U Wisconsin) has worked nicely for me for clustering and batch job submission when I need to tool through 100 data files or 100 different parameter lists on tens of computers. The standard unix "at" is good enough in a pinch if you play on only 5 computers or so.

    HEP folks use things like PAW and ROOT (find them at CERN) which contain many statistical analysis things and monstrous computation algorithms. Or at least ntuples, histograms, averages, and standard deviations. You could go commercial or the gsl here if you prefer such things.

    CVS or similar to take care of code versions.
    Don't forget to comment your code.

    We write our own code and compile from fortran or c or c++ for most everything else.

    Output all plots to postscript or eps.

    LaTeX is scriptable.

    And use shells, grep, perl to glue it all together. Did I mention those already?
    I get a good night's sleep more often than not.

    And decide what to do next after coffee the following morning.
    This is where you put your brain, and if you have done the above well enough, this is where you spend most of your time.
    The answer I get each morning (as another post suggests) is always so surprising that I need to start from scratch anyway.

    I bet that is what you are doing already. Probably no monolithic software will be as efficient as that in a dynamic research environment.

    What did I miss from your question?

    Oh, yes. Get a ten-pack of computation notebook with 11 3/4 x 9 1/4 inch pages (if you print things with standard US letter paper). And lots of pens. And scotch tape to tape plots into that notebook. Laser printer and photocopier. Post-it notes to remind yourself what you wanted to do next (or e-mail memos to yourself). Maybe I should have listed this first.

    Good luck.

    1. Re:Sounds like High Energy Physics by foog · · Score: 1

      I wish I could mod the above up, great advice, I might not have posted below if I'd seen it.

      I'd emphasize that using a scriptable graphing/postprocessing program (I used to use gnuplot and octave, there are many interesting options more widely documented now) is really key.

      Nothing like starting a script and being able to walk away from it for the afternoon, or the night, or the weekend...

    2. Re:Sounds like High Energy Physics by Alea · · Score: 1

      Heh... you've pretty much described what I'm using now. I'm most interested in what you say about using XML for output data. Are you using any particular DTD/Schema or just cooking up something for each occasion?

      I agree that a "monolithic" solution isn't going to do much for you. That's why I'm thinking more in terms of middleware, something that will help me bind these tools together. I envision that some scripting would be necessary to bind in new tools and experimental software for each new project, but I'd like the rest of the framework to remain the same.

      I think my own conclusions are pretty much that I want an XML format that can encode all the details: the parameters, what datasets to use, what algorithms to try, and also to store the output data.

      Then a small collection of tools (likely written in a scripting language) that will read these XML files and put it in motion. These tools would probably invoke other software to do the actual work (e.g. gnuplot for plotting, AppLeS for distributing jobs, etc...).

      I think these tools could see repeat use (especially if they have a plugin interface so new software and capabilities can be added in) and could be shared with others. I certainly don't have any illusions that I'm not going to have to adapt or that it will all run itself, but why are we all constructing this framework individually and from scratch?

      I don't see the tools you mentioned as the middleware. I see the "glue", that you mentioned only in passing, as the real issue, and that glue could be a standard file format (or formats) and some tools that work with that.
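
      To make the XML idea concrete, here is a rough Python sketch using only the standard library. The element and attribute names (experiment, algorithm, dataset, param, result, metric) are invented for this example; they are not XDF and not any existing schema. One file describes what to run, and results get appended to the same tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical experiment description; every tag name here is an assumption.
EXAMPLE = """\
<experiment name="demo">
  <algorithm name="knn" version="1.2"/>
  <dataset path="data/iris.txt"/>
  <dataset path="data/wine.txt"/>
  <param name="k" values="1 3 5"/>
  <param name="metric" values="euclid"/>
</experiment>
"""


def load_experiment(xml_text):
    """Extract the algorithm, datasets and parameter values to try."""
    root = ET.fromstring(xml_text)
    algo = root.find("algorithm")
    return {
        "name": root.get("name"),
        "algorithm": (algo.get("name"), algo.get("version")),
        "datasets": [d.get("path") for d in root.findall("dataset")],
        "params": {
            p.get("name"): p.get("values").split() for p in root.findall("param")
        },
    }


def append_result(root, dataset, params, metrics):
    """Store one run's output next to the description that produced it."""
    result = ET.SubElement(root, "result", dataset=dataset)
    for name, value in params.items():
        ET.SubElement(result, "param", name=name, value=str(value))
    for name, value in metrics.items():
        ET.SubElement(result, "metric", name=name, value=str(value))
    return result


if __name__ == "__main__":
    print(load_experiment(EXAMPLE))
    tree_root = ET.fromstring(EXAMPLE)
    append_result(tree_root, "data/iris.txt", {"k": 3}, {"accuracy": 0.95})
    print(ET.tostring(tree_root, encoding="unicode"))
```

      The separate tools (plotting, job distribution) would then only need to agree on this one file format.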

    3. Re:Sounds like High Energy Physics by foog · · Score: 1

      The really killer advantage to good middleware for this kind of thing would be improvements in the user experience and relief from the drudgery of learning yet another arcane sub-language to get the results you want.

      I think you're asking for a very powerful, very well-designed IDE with good integration with configuration management, software instrumentation, etc.

    4. Re:Sounds like High Energy Physics by gowdy · · Score: 1

      I wonder if I should post this here again... oh well, I will. http://roofit.sourceforge.net/ is based on ROOT and adds some of what is wished for.

    5. Re:Sounds like High Energy Physics by Anonymous Coward · · Score: 0

      I have been waiting for a rainy day to learn XML and Schema and DTD and to start using the full power of those tools, but I have not gotten there yet. I have been re-distracted by FORTRAN and PAW.

      I am really just cooking something up for each occasion. Whenever possible I have used single printlines with keywords because it is easier to parse and shove into the next analysis tools.

      But sometimes I needed to look at data that was too long for a single line. I needed to encapsulate things like events or runs that had data with multi-parameter sub-structures, but I wanted these things to remain human readable, with grep-able or perl-able keywords, so that I could quickly look for suspicious or interesting patterns.

      So, I have yet to move to a full blown XML format but have stepped part of the way there, and have only copied part of the syntax, the encapsulation, and the self-description aspects, intending to learn the rest of it soon.

      I think you could approach this from the opposite direction as me. You could insist on true proper XML format, DTD, Schema, and put standard XML parsing tools into your programs AND then still grep keywords for rapid analysis and development and debugging.

      The hang-up I would worry about with true XML is that XML files are not actually intended to be "human understandable" from a data storage point of view. Self-descriptive, yes, but not human readable. But since human-readable is a feature I think is useful and necessary in my work, my hack versions always have this quality.
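
      A guess at what such a half-way format could look like, sketched in Python. The BEGIN/END keywords and the key=value fields are my own invention for illustration, not the poster's actual format. Every line stays grep-able, and nesting gives the encapsulation for runs and events.

```python
# Hypothetical data file: one keyword per line, key=value fields,
# BEGIN/END lines for encapsulation.
SAMPLE = """\
BEGIN run id=12
  beam energy=3.5 units=GeV
  BEGIN event id=1
    track pt=0.91 charge=-1
    track pt=1.40 charge=1
  END event
END run
"""


def parse_fields(tokens):
    """Turn ['pt=0.91', 'charge=-1'] into {'pt': '0.91', 'charge': '-1'}."""
    return dict(token.split("=", 1) for token in tokens)


def parse_blocks(lines):
    """Parse BEGIN/END-delimited keyword lines into nested dictionaries."""
    root = {"kind": "root", "fields": {}, "children": []}
    stack = [root]
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue
        keyword, rest = tokens[0], tokens[1:]
        if keyword == "BEGIN":
            if not rest:
                raise ValueError("BEGIN needs a block name")
            node = {"kind": rest[0], "fields": parse_fields(rest[1:]), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif keyword == "END":
            if len(stack) == 1 or rest[:1] != [stack[-1]["kind"]]:
                raise ValueError(f"unmatched END: {raw.strip()}")
            stack.pop()
        else:
            stack[-1]["children"].append(
                {"kind": keyword, "fields": parse_fields(rest), "children": []}
            )
    if len(stack) != 1:
        raise ValueError(f"missing END for {stack[-1]['kind']}")
    return root


def find_all(node, kind):
    """Yield every nested record of the given kind, depth first."""
    for child in node["children"]:
        if child["kind"] == kind:
            yield child
        yield from find_all(child, kind)


if __name__ == "__main__":
    tree = parse_blocks(SAMPLE.splitlines())
    for track in find_all(tree, "track"):
        print(track["fields"])
```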

  33. Re:Welcome by djupedal · · Score: 1

    So, where is the smart in shopping for a solution, while the work piles up? What proof is there someone hasn't already tried that route? I don't see any evidence he's head-down, butt-up in the meantime. It looks to me like he's just whining for help.

  34. Re:Welcome by djupedal · · Score: 1

    easier to flag as a troll than it is to respond, right? Must be lazy sunday. I must have had a valid point after all.

  35. But... Isn't that YOUR Job? by PetoskeyGuy · · Score: 1

    You want the computer to run the experiment, catalog all the results and present them in a nice format. Maybe when it's done it can put your name on the results and publish it for you too.

    Just Kidding ;o)

    But if you're determined to let the computer do the work, perhaps some form of Genetic Algorithm could be applied here. If you can define your domain as something that can be broken down well enough and tested for selection criteria, there are lots of tools and research available. If you have an API to work with like you said, it shouldn't be too hard.

    Of course converting it to a GA may take longer than your original experiments to implement.

  36. schema by Tablizer · · Score: 2, Informative

    Draft relational schema:

    Table: experiments
    ----
    exprmntID
    exprmntWhen // date-time stamp
    exprmntDescr // description
    outcome

    Table: params
    ----
    paramID // auto-num
    exprmntRef // foreign key to experiments table
    paramName
    paramValue

    Table: dataSet
    ----
    dataSetID // auto-num
    filePath
    datasetDescr
    isGenerated // "True" if from experiment
    CRC // ASCII check-sum to make sure not changed

    Table: dataSetUsed
    ----
    exprmntRef // foreign key to experiments table
    dataSetRef // foreign key to dataSet table

    Table: softwareVersion
    ----
    svID
    softwareTitle
    svVersion

    Table: softwareVersionUsed
    ----
    svRef // foreign key to softwareVersion
    exprmntRef // foreign key to experiments table

    Just use something like MySQL or MS-Access, and perhaps some kind of CRUD[1] tool to create front ends. You can expand from there based on new needs you encounter.

    [1] CRUD = typical Create, Read (list), Update, Delete screens.

    (Note: slashdot's filter scrambles certain variable names.)
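
    For anyone who would rather not set up MySQL or Access, the draft above can be tried out with SQLite from Python's standard library. This is only a sketch: the column types, the use of CRC-32 for the check-sum, and the helper function names are my assumptions, and the column names are taken as isGenerated and svVersion.

```python
import sqlite3
import zlib

SCHEMA = """
CREATE TABLE experiments (
    exprmntID    INTEGER PRIMARY KEY,
    exprmntWhen  TEXT NOT NULL,      -- date-time stamp
    exprmntDescr TEXT,
    outcome      TEXT
);
CREATE TABLE params (
    paramID    INTEGER PRIMARY KEY AUTOINCREMENT,
    exprmntRef INTEGER NOT NULL REFERENCES experiments(exprmntID),
    paramName  TEXT NOT NULL,
    paramValue TEXT
);
CREATE TABLE dataSet (
    dataSetID    INTEGER PRIMARY KEY AUTOINCREMENT,
    filePath     TEXT NOT NULL,
    datasetDescr TEXT,
    isGenerated  INTEGER NOT NULL DEFAULT 0,  -- 1 if from experiment
    CRC          TEXT
);
CREATE TABLE dataSetUsed (
    exprmntRef INTEGER NOT NULL REFERENCES experiments(exprmntID),
    dataSetRef INTEGER NOT NULL REFERENCES dataSet(dataSetID)
);
CREATE TABLE softwareVersion (
    svID          INTEGER PRIMARY KEY,
    softwareTitle TEXT NOT NULL,
    svVersion     TEXT NOT NULL
);
CREATE TABLE softwareVersionUsed (
    svRef      INTEGER NOT NULL REFERENCES softwareVersion(svID),
    exprmntRef INTEGER NOT NULL REFERENCES experiments(exprmntID)
);
"""


def crc_of(data: bytes) -> str:
    """CRC-32 of the data as 8 hex digits, for the CRC column."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def open_db(path=":memory:"):
    """Create the tables in a new database and enforce foreign keys."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record_experiment(conn, when, descr, params):
    """Insert one experiment plus its parameters; return the new exprmntID."""
    cur = conn.execute(
        "INSERT INTO experiments (exprmntWhen, exprmntDescr) VALUES (?, ?)",
        (when, descr),
    )
    exprmnt_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO params (exprmntRef, paramName, paramValue) VALUES (?, ?, ?)",
        [(exprmnt_id, name, str(value)) for name, value in params.items()],
    )
    conn.commit()
    return exprmnt_id


def experiments_with(conn, name, value):
    """Return the IDs of experiments that used the given parameter value."""
    rows = conn.execute(
        "SELECT e.exprmntID FROM experiments e "
        "JOIN params p ON p.exprmntRef = e.exprmntID "
        "WHERE p.paramName = ? AND p.paramValue = ? ORDER BY e.exprmntID",
        (name, str(value)),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_db()
    record_experiment(db, "2003-04-20T18:30", "baseline", {"k": 3})
    record_experiment(db, "2003-04-21T09:00", "larger k", {"k": 5})
    print(experiments_with(db, "k", 5))
```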

  37. you need... by Alpha_Nerd · · Score: 2, Funny

    It looks like you need - da da da da! - [b]EXTREME[/b] PROGRAMMING!

  38. configuration management, build scripts, etc... by foog · · Score: 2, Informative

    The features I would want would be:

    management of all details of an experiment, including parameter sets, datasets, and the resulting data


    This can be handled by an ad-hoc database, a flat file in most cases. If you were a Windows power user, you'd spend an hour or two putting together something in Access for it.

    ability to "execute" experiments and report their status

    make with a little scripting, or whatever you use as a build system.

    an API for obtaining parameter values and writing out results (available to multiple languages)
    additionally (alternately?) a standard format for transferring data (XDF might be good)
    ability to extract selected results from experimental data
    ability to add notes


    Again, an ad-hoc database would be your friend.

    ability to differentiate versions of software

    This is conventionally handled with a configuration management system like CVS, Sourcesafe, or Clearcase.

    I hate reinventing the wheel, too, and I'd love to see a good book on using standard free Unix tools like make, CVS, Postgres, perl or some other common scripting language, TeX, etc. for cleanly and efficiently automating complex computing processes and producing nice reports from them.

    PAW and ROOT look interesting though they look like overkill for many apps.

    Also, get a copy of Writing The Laboratory Notebook, some hardbound buffered laboratory notebooks, and Sakura 05 Pigma Micron archival pigment pens to keep your paper records. You'll thank me.

  39. Smarter? that's funny by djupedal · · Score: 1

    Working smarter only works if you're actually smarter. While it might be good for a laugh, asking ./ hardly qualifies as smart. So, if smarter isn't an option for you, you only have hard work to fall back on.

  40. hello by Anonymous Coward · · Score: 0

    agree with you. prefer to do everything myself. am entering this using switches with led grid screen on homebrew z80 with tcpip stack i finished yesterday. expect to finish entire system in 8 more years. will be well worth it, sure am glad didn't waste time buying off shelf system!

  41. rubbish by djupedal · · Score: 1

    You know what they say, if you want a job done fast, give it to a lazy man.

    Again, if such a solution existed, it would already be in place. This whiner complained about the amount of work, that clearly comes with the job. He just wants to go home earlier...don't we all. Nothing smart in that.

  42. that's what UNIX is there for by g4dget · · Score: 4, Informative
    Managing and organizing really huge amounts of data is one of the big strengths of UNIX--you just have to learn how to use it well:
    • Consider using "make" or "mk" for automating complex processing steps. "make" also lets you parallelize complex experiments (by figuring out which jobs can be run safely in parallel), and some versions of "make" are capable of dealing with compute clusters. If you need to try something with multiple parameter values, write make rules and put the parameter values in there as dependencies.
    • Organize your data into directory hierarchies; pick meaningful and self-explanatory names. Don't let directories become too big. Keep related data files and results together in the same directory, and keep different data files in different directories.
    • Keep scripts and programs along with the data, not in completely separate source trees.
    • Write scripts that summarize the data and give them obvious names; you can figure out later from that what needs looking at and what it means.
    • Use textual data files as much as possible and have your programs add information to those files as comments that document what they did.
    • If you generated an important result, keep a snapshot of the sources that generated it along with it.
    • Leave copious README files everywhere, containing notes to yourself, so that you can figure out what you did.
    • If you generate junk during some trial runs, delete it, or at least rename it to something like "results.junk", otherwise you'll trip over it later.
    • Back things up.
    • Learn the core UNIX command line tools, tools like "sort", "uniq", "awk", "cut", "paste", "find", "xargs", etc.; they are really powerful. You probably also want to learn Perl, but don't get into the habit of trying to do everything in Perl--the traditional UNIX tools are often simpler.
    • If you are using Windows, switch to UNIX. Windows may be good for starting up MS Office, but it is no good for this sort of thing. If you absolutely must use Windows for data analysis, stick your data into a relational database or Excel spreadsheets.
    • Learn to use environment variables.
    • Learn to use the Bourne/Korn/Bash shell; the C-shell is no good for this sort of thing.
    • For certain kinds of automation, expect is also very handy.
    • For visualizing data, write scripts that analyze your data and automatically generate the plots/graphs--you will run them again and again.

    Distribution of jobs, running things with multiple parameter values, etc., all can be handled smoothly from the shell. This is really the sort of thing that UNIX was designed for, and the entire UNIX environment is your "experiment management software".
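
    The same pattern (one self-explanatory directory per parameter combination, and only rerunning what is out of date) can be mimicked outside of make. A small Python sketch under my own assumptions: the function names, the "result.txt" file name and the name=value directory labels are all made up here, and the freshness test is a simplified version of make's timestamp comparison.

```python
import itertools
import tempfile
from pathlib import Path


def sweep(grid):
    """Yield one dict per combination of parameter values."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def result_dir(base, params):
    """Self-explanatory directory name, e.g. base/alpha=0.1_k=3."""
    label = "_".join(f"{name}={params[name]}" for name in sorted(params))
    return Path(base) / label


def is_stale(target, sources):
    """Rebuild if the target is missing or older than any of its sources."""
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(Path(src).stat().st_mtime > target_mtime for src in sources)


def run_sweep(base, grid, dataset, experiment):
    """Run experiment(params, dataset) for each stale combination."""
    ran = []
    for params in sweep(grid):
        out_dir = result_dir(base, params)
        out_file = out_dir / "result.txt"
        if not is_stale(out_file, [dataset]):
            continue  # result is newer than the data; nothing to do
        out_dir.mkdir(parents=True, exist_ok=True)
        out_file.write_text(f"{experiment(params, dataset)}\n")
        ran.append(out_dir.name)
    return ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.txt"
        data.write_text("1 2 3\n")
        grid = {"alpha": [0.1, 0.5], "k": [1, 3]}
        # Stand-in experiment: a real one would read the dataset.
        toy = lambda params, dataset: params["alpha"] * params["k"]
        print(run_sweep(Path(tmp) / "results", grid, data, toy))
        print(run_sweep(Path(tmp) / "results", grid, data, toy))
```

    The second call should find nothing to do, which is the behaviour you get for free from make rules.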

    1. Re:that's what UNIX is there for by ExoticMandibles · · Score: 2, Interesting
      If you are using Windows, switch to UNIX. Windows may be good for starting up MS Office, but it is no good for this sort of thing. If you absolutely must use Windows for data analysis, stick your data into a relational database or Excel spreadsheets.

      What is it intrinsically about Windows that makes it "no good for this sort of thing"? Windows provides all the system services you need to do these tasks, and all the tools you mention are available natively for Windows. Come to think of it, they're available for OS/2, QNX, Mac OS X, and nearly every other desktop operating system out there. One could erase every mention of UNIX-specificness from your post, and not only would your post still hold true, it would be an improvement. Your knee-jerk UNIX advocacy, nestled in and disguised as helpful advice, is a disservice to the original poster.

      Suggesting that the original poster must be using UNIX in order to get their work done is wrong in several senses of the word; it is not factual, and it is irresponsible. On the contrary--I am certain that their current choice of operating system is entirely up to the task. He or she should feel absolutely no onus to switch.

    2. Re:that's what UNIX is there for by Cyno · · Score: 1

      all the tools you mention are available natively for Windows

      Yes, but are they built-in?

      GNU just works out-of-the-box. Windows, UNIX and any other commercial solution just seems to miss the whole point of having a computer. Processing data. Which is why they forget to include a spreadsheet and a database and a compiler and scripting languages and various other tools that are a requirement for this data processing we all love to do so much.

    3. Re:that's what UNIX is there for by hyperturbopete · · Score: 1

      Word!

      Make is fantastic for organizational purposes. makefiles are basically a language for describing dependencies. If you have a directory-based structure, it works wonderfully. You just have to make sure your stuff is orthogonal :-)

    4. Re:that's what UNIX is there for by g4dget · · Score: 1
      What is it intrinsically about Windows that makes it "no good for this sort of thing"? Windows provides all the system services you need to do these tasks, and all the tools you mention are available natively for Windows.

      So, you are saying that you have never actually run large scale computational experiments, and you don't actually have any recommendations for how to run them on Windows, but based on a list of Windows "system services" you think it should be pretty good for that.

      Well, I have run large scale computational experiments on Windows, UNIX, and other platforms. Based on my experience, my recommendation was and is: if you can, do them on UNIX because it's so much easier. If you have to use Windows, stick to relational databases, because that's how large amounts of data are handled on Windows.

      Now, if you have anything technical to contribute, please do.

    5. Re:that's what UNIX is there for by Paul+Komarek · · Score: 1

      In my experience, GUIs apply fundamental restrictions to data manipulation due to the graphically-dominated input and output systems. There are lots of neat programs like SAS for doing data manipulation through a GUI, but every worthwhile tool I've ever used on Windows also offered a CLI. When I've encountered experts with these applications, they've always preferred the CLI. Perhaps they prefer the CLI because that's what they learned with, but I expect it is because it works better for them. Matlab, SAS, Mathematica, and IDL are some examples.

      Once you are working with a CLI, things are easier if the other apps and the OS are dominated by CLI mentality, in my opinion. For instance, cutting and pasting between text apps is more difficult in the Windows Presentation Manager (or Explorer, or whatever it is now) than in X. Focus issues in Windows (can these be changed yet?) don't allow separation of focus and stacking order (focused window is always on top). Furthermore, CLI apps generally use stdin and stdout, while GUI apps don't really have the same standardized I/O mechanisms. Anyone who has written GUI apps in Windows has asked themselves at least once, "so where in heck does fprintf(stdout,"foo") show up?".

      Standard CLI tools on a GNU-ish system (which means most UNIX-ish installations, since GNU text and file tools are nearly standard) include many apps for parsing and rebuilding text files and streams. There are many regex tools available by default, whether CLI, programming languages, or GUIs. You can get a lot of work done with just bash+grep. Furthermore, modern UNIX shells have a mostly-reasonable programming language.

      None of these things are part of Windows as MS configures it. Even the standard libs that come with Windows don't support regex (I guess the new MSDEV .NET has some sort of regex support now). Heck, Windows doesn't even come with programming support of any kind, except for pathetic batch files and, er, is there anything else with default Windows? My point is that MS built Windows with saleability in mind, and makes it attractive to people who are in charge of purchasing software. Contrast this with UNIX, which was created without permission on company time by researchers for themselves. UNIX is meant for research, and that is why an arbitrary precision stack-based RPN CLI calculator (dc) has been part of UNIX distributions since the mid seventies (just guessing).

      These are the reasons that I believe support my opinion. Graphical communication systems are a pain in the butt, which is why we ditched hieroglyphics thousands of years ago. Graphical Interfaces may be useful for communicating with Chimpanzees, but they are a lousy way to do data manipulation tasks in detail.

      -Paul Komarek

    6. Re:that's what UNIX is there for by ExoticMandibles · · Score: 1
      I didn't say one way or the other, but no, I have never actually run large scale computational experiments. And, no, I didn't make any recommendations on how to run them on Windows. That's because your original posting did a good job of describing one approach to solving this problem. My point was to correct your error in saying Windows was "no good for this sort of thing". It is entirely capable of doing "this sort of thing".

      What I meant by "system services" was services offered to user programs by an operating system. Things like: files, directories, networking, and job control. ("System services" was the term we used in my operating systems class, lo these many years ago; is there some newer term I should have used instead?) UNIXes provide them, and yes indeed so does Windows, and so do many other desktop operating systems, which makes them perfectly workable tools for this sort of analysis.

      My technical contribution seems to have passed you by, so I will state it once again: everything you mention in your original posting can be done, in the same way and using the same tools you mention, under Windows. Your assertion that a user must switch to UNIX in order to do these things is wrong.

      In your reply dated 4/20 @6:30pm, you say:

      If you have to use Windows, stick to relational databases, because that's how large amounts of data are handled on Windows.
      I don't know how you came to this conclusion. Perhaps it was the convention in environments you have encountered. But do not mistake convention for the limits of capability. Indeed, let me assure you that Windows operating systems are more than capable of storing large amounts of data as files in a directory hierarchy, with meaningful filenames and in plain text, and processed with shell text processing tools. Just as you suggest to the intrepid original poster.

      If it would help, I could reply with a point-by-point analysis of your original posting saying "yep, you can do that under Windows", "yep, you can do that under Windows too, and here's how", but I felt it would be redundant. Would you like to see it anyway?

    7. Re:that's what UNIX is there for by ExoticMandibles · · Score: 1
      As for working out-of-the-box, I will go ahead and answer your rhetorical question: no. Windows comes with nearly none of those tools installed. But you're changing the subject. We were not discussing convenience, we were discussing ability. They said (or certainly strongly implied) that Windows was not capable of managing data in the method he suggested. I completely disagree.

      Yes, it's inconvenient for a Windows user to download and install the extra applications (a better shell, all the command-line tools you'd want, perhaps Python). But this inconvenience is a trifle compared to switching operating systems, as g4dget had suggested would be necessary before Windows users could employ his data-management methodology.

      p.s. I feel it's up to the user to decide for themselves what the point is of having a computer. My nephew doesn't use his computer for processing data, unless you feel "Red Alert 2" is an elaborate GUI for data processing. One could argue that it is. But I haven't seen it pre-installed on a computer yet.

      p.p.s. I wasn't aware of "GNU" being a name or nickname for a particular operating system. Were you referring to "GNU/Linux", "GNU Hurd", or something else?

    8. Re:that's what UNIX is there for by ExoticMandibles · · Score: 1
      For what it's worth:
      • Windows never had anything called "Presentation Manager"--that was the GUI shell for OS/2. Windows Explorer is the default GUI file/directory navigation tool, equivalent to GMC or Nautilus under GNOME, or Konqueror under KDE. The Windows GUI is simply referred to as USER, after the implementing library (originally user.dll, these days user32.dll).

      • Focus is the same in Windows as it ever was. There is the Microsoft Power Tool "Xmouse", which makes the mouse behave in a vaguely X-like manner. But it confuses some too-clever-for-their-own-good applications. Some UNIX-y friends who were new to UNIX initially turned on Xmouse, and later switched it off, as it caused more problems than it solved. It doesn't appear to have been ported to XP, and I doubt the older versions still work (not that they ever worked that well). As for personal taste, I prefer the Windows method for focus control, but then I'm used to it.

      • Sadly, text copy and paste between "console" (glass TTY) applications in Windows is still inconvenient. The default is bad, and "Quick Edit" is worse. There is third-party software to correct this deficiency (for instance, "Take Command" by JP Software), but it is difficult to use it with some interactive glass TTY apps as they scan the keyboard directly instead of reading from stdin. ("Take Command" addresses this with what it calls "Caveman mode", but years ago when I tried it I found it both unreliable and too CPU-expensive. Thankfully such apps are more and more rare.)

      • I haven't played with the .NET environment, but outside of that regex system libraries are still not standard equipment with Windows. That's an aftermarket part.
      I'm not going to wade into the whole GUI / CLI debate, as it's going a bit far afield from my original point.

      One final bit that might amuse you. Dave Cutler, the head of the original NT development project (and I assume still the head of its ongoing development), wanted to ship NT as a glass-TTY OS. The GUI would be a distinct application, as X/GNOME/KDE is distinct from Linux/Solaris/etc. Management overrode him, and NT shipped always using a GUI.

    9. Re:that's what UNIX is there for by Paul+Komarek · · Score: 1

      Are you sure that Windows never had a Presentation Manager? I'm glad you knew what I was getting at. But I thought Windows (maybe 3.0, 3.1, or 3.11) had a pm process running somewhere.

      I played with Xmouse when I was using Windows, and I agree with your assessment. My wife has messed with various virtual-screen switchers in Windows, and all of them seem to have problems with some "too clever" apps, just like Xmouse. I've seen a lot of folks at my university play with various X "servers" (well, mostly Hummingbird) for Windows, but that doesn't seem worth the effort in the long run given the troubles they've had.

      I was once told that a productive user accepts the default settings. Messing with Windows to make it like UNIX, and vice-versa, seems a silly waste of time. Especially since you can have GNU/Linux for free on a virtually zero-cost PC.

      -Paul Komarek

    10. Re:that's what UNIX is there for by Cyno · · Score: 1

      I agree with you, but I'm still playing devil's advocate here :)

      Windows standard edition alone cannot do what you're talking about. You have to have their professional edition with SQL, Excel, etc. This is several hundred dollars in additional software on top of an expensive "professional" version so you have a chance at getting stability. You wouldn't dare attempt to manage important data on a home version of a Microsoft OS. Meanwhile, your standard RedHat 9 download includes EVERYTHING you need and is often more stable than the professional versions of Windows.

      That's the difference in quality and capability I think we are attempting to argue. Not that the NT kernel is inferior or incapable of performing the task, just that all the tools and the right kernel are not included by default. When you buy a professional version of Linux you get a kernel designed to scale up to at least 16p, not additional stability.

      GNU systems are designed for data processing. What you are talking about is playing games, which is exactly what the average home version of Windows was designed for. So I don't understand your comments. Aren't we talking about data processing?

      GNU refers to all GPLed software. More than enough to build a pure GNU OS, I believe.

    11. Re:that's what UNIX is there for by ExoticMandibles · · Score: 1
      I bet you're thinking of "Program Manager", the Windows 3.x (and before) program launcher application... precursor to the Start Menu.

      As for messing with Windows to make it like UNIX, that depends on what you mean. Trying to make the GUI behave more like X is a fruitless endeavor. But I use zsh and Python every day. Windows' glass-TTY sucks for cut and paste, but it works just fine for everything else.

  43. I've been working on this by jmichelz · · Score: 1

    I've been working on this, though it's not yet released as open source. If you'd like to try my system out it might speed up the release date. It's a web application written in python with a postgresql backend. Give me a ring at jmichelz at mail dot com for more details.

    John

  44. ossa ools! by jefu · · Score: 1
    I do the same kind of thing with computational experiments and I'd like to agree with those who've suggested using a variety of tools.

    It's not strange, for example, for me to use python to generate the actual program runs, the shell to actually manage the run and move the input/output data files, then any of several graphics programs to handle the output (and often output graphs are done automatically as the programs run).

    This gives me a pile of flexibility which is often useful. For instance, when doing stuff that might run for a while, I'll often sample things at widely spaced data points, then fill in the gaps. For things that are less long running, I'll just chomp through them all in order. It also allows me to rename input/output data files as things work, compress them when needed and so on.
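
    The "sample widely spaced points, then fill in the gaps" ordering can be sketched in Python. This is only an illustration, not the poster's actual script; the function name `coarse_to_fine` and the halving-stride scheme are assumptions chosen for the example.

```python
def coarse_to_fine(points):
    """Return points reordered so widely spaced samples come first.

    The first pass takes points a large stride apart; each later pass
    halves the stride and skips points that are already scheduled.
    """
    n = len(points)
    if n == 0:
        return []
    # Largest power of two not exceeding n - 1 (at least 1).
    stride = 1
    while stride * 2 <= n - 1:
        stride *= 2
    seen = set()
    order = []
    while stride >= 1:
        for i in range(0, n, stride):
            if i not in seen:
                seen.add(i)
                order.append(points[i])
        stride //= 2
    return order


if __name__ == "__main__":
    # Hypothetical sweep over nine values of some parameter.
    print(coarse_to_fine(list(range(9))))  # [0, 8, 4, 2, 6, 1, 3, 5, 7]
```

    A driver can walk this list and launch one run per value, so a long job that gets interrupted still leaves a usable coarse picture of the whole range.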

    It's also nice sometimes to be able to set things up to recompile the source with different numbers for efficiency instead of feeding the numbers to the executable on the command line or on stdin.

    XML and XSLT are also becoming increasingly useful in describing input data, recording results and keeping track of things done.

    1. Re:ossa ools! by foog · · Score: 1

      XML and XSLT are also becoming increasingly useful in describing input data, recording results and keeping track of things done.

      Those of you that are finding XML useful for this sort of thing, what tools and ideas are you using?

    2. Re:ossa ools! by jefu · · Score: 1
      I'm increasingly using xml to save information for a specific run of the program. Usually this is ad-hoc and the markup tends to be simple enough so that the program itself, or the driver will handle building it.

      Then I use xslt (again, ad hoc) to digest this and produce output (often in html these days) that I can look at.

      I could do it all in other ways - the advantage of xml is that I can describe the markup somewhere so later on I'll not forget what the data actually was, and that I can use XSLT with some extra stuff to build html pretty directly.

      So the answer is, no tools in particular. The ideas are only those of trying to keep my poor befuddled brain from being overwhelmed by the output from this stuff.
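
      As a rough illustration of the ad hoc approach described above, here is a minimal Python sketch that records one run as XML and then digests it into HTML rows. The element names (`run`, `parameters`, `param`, `results`, `result`) are invented for the example, and the plain-Python digest step merely stands in for XSLT, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET
from html import escape


def build_run_record(run_id, params, results):
    """Build an XML element describing one program run."""
    run = ET.Element("run", attrib={"id": str(run_id)})
    params_el = ET.SubElement(run, "parameters")
    for name, value in params.items():
        ET.SubElement(params_el, "param", attrib={"name": name}).text = str(value)
    results_el = ET.SubElement(run, "results")
    for name, value in results.items():
        ET.SubElement(results_el, "result", attrib={"name": name}).text = str(value)
    return run


def results_to_html_rows(run):
    """Turn the <result> elements into HTML table rows (a stand-in for XSLT)."""
    rows = []
    for res in run.find("results"):
        rows.append(
            "<tr><td>%s</td><td>%s</td></tr>"
            % (escape(res.get("name")), escape(res.text or ""))
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical run: one parameter in, one measurement out.
    record = build_run_record(1, {"alpha": 0.5}, {"error": 0.12})
    print(ET.tostring(record, encoding="unicode"))
    print(results_to_html_rows(record))
```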

  45. Condor by Anonymous Coward · · Score: 0
    Condor is a robust open-source tool for distributed process management and intelligent control of clusters.



    Regarding the more general request for software that manages data, beats me. I do computer science research and I have asked myself many times if such software exists.



    What suitable proprietary solutions did you find? I could not find any software (open or closed) that would properly manage bulk data.

  46. Re:Clear TOS by Alien+Being · · Score: 1

    I agree that defining the schema is a good place to start, and that a db backend is the "right thing" to do. For this app though, Postgresql offers some attractive features such as inheritance, stored procedures, and an eclectic set of datatypes.

  47. Sharing is your best option by fymidos · · Score: 1

    1. ...
    2. ...
    3. ...

    conclusion: share your software, start a new project , see if other people are willing to help out.

    --
    Washington bullets will simply be known as the "Bulle
  48. SMIRP by sco08y · · Score: 2, Informative

    I'm one of the principal designers of a system called SMIRP.

    It started out as a very simple system that didn't act as much more than a set of tables with some simple linking structures. On top of that is an alerting system, (so you can track new experiments being done) a full text index, bots for automating certain procedures, and a system for transferring data to Excel.

    What's surprising is that for the most part, the underlying structure stayed exactly the same even though we've been running all the operations in an inorganic chemistry lab on it for, oh, four years now. I've been chewing over ways of rewriting it because, honestly, it's still the same prototype. I'd love to go with an all Perl solution... but the damned thing just works and I have other stuff to do.

    Some lessons I've learned, problems I've run into:

    A general interface. You really need a flexible structure because scientists never know what parameters they're going to use until they do the experiment. Our big success has been such a simple structure that people can throw a SMIRPSpace together in minutes.

    Browser based interface. It's great because it's ubiquitous, but it's painful because of the inflexibility of forms. One big win with it is that you can get a horde of workstudies to form a pipeline. For example, a grad student might put a request in the system for an article, a workstudy receives a notification of the change and hits the web to fill in details, another then gets notified and sends a request to the library, another gets notified and scans the result and finally the grad student sees a scanned copy of the article.

    Excel based interface. It's great because people can play with data, but it's Excel...

    XML is garbage. There's nothing you can do in XML that you can't do better with a flat file + regexes, or a SQL DBMS. XML is utterly, completely worthless.

    Proprietary products. This won't be a huge surprise to /.'ers, but we got seriously screwed when the prototype we did in Cold Fusion became production code and we realised that Allaire (and later Macromedia) would not consider redistribution for less than 10,000 units. I could try to get it running on another CF implementation (I think there's some Blue Dragon or something) but honestly, I'd rather rewrite the whole thing.

    Reporting. This is *hard* to do. We still don't have any serious system for handling reports beyond "import the data to Excel and do it manually."
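
    The "general interface" lesson above (parameters aren't known until the experiment is done) can be sketched with a small SQL store that keeps parameters as name/value rows instead of fixed columns. This is not SMIRP's actual schema, which isn't described here; the table and function names are made up for illustration, using Python's built-in `sqlite3` module.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one fixed table and one free-form table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS experiment ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parameter ("
        " experiment_id INTEGER NOT NULL REFERENCES experiment(id),"
        " name TEXT NOT NULL,"
        " value TEXT NOT NULL)"
    )
    return conn


def add_experiment(conn, title, **params):
    """Insert an experiment with whatever parameters the caller supplies."""
    cur = conn.execute("INSERT INTO experiment (title) VALUES (?)", (title,))
    exp_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO parameter (experiment_id, name, value) VALUES (?, ?, ?)",
        [(exp_id, name, str(value)) for name, value in params.items()],
    )
    conn.commit()
    return exp_id


def get_parameters(conn, exp_id):
    """Return the parameters of one experiment as a dict of strings."""
    rows = conn.execute(
        "SELECT name, value FROM parameter WHERE experiment_id = ?", (exp_id,)
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = open_store()
    exp_id = add_experiment(conn, "run B", solvent="water", ph=7)
    print(get_parameters(conn, exp_id))
```

    The trade-off is that every value is stored as text, so typed queries and reporting take extra work, which matches the reporting pain described above.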

  49. there are many projects developing such software! by edeljoe · · Score: 4, Informative

    Funding agencies in the USA (NSF, NIH) and Europe have recently decided to target the construction of such software, and many competing projects have been given grants, most of which involve the production of open source software.

    Relevant keywords are "eScience", "Experimental Data Management", "Experimental Metadata", and to some extent "Grid Computing".

    Here is a paper which lays out the program of research.

    I work for one such NSF & NIH funded project at Dartmouth College. We're developing such a tool : Java-based, completely open, available at sourceforge, currently in alpha, to be released for fMRI use in July, but designed from the start to be generalizable for all of experimental science. This is built on top of a pre-existing framework for semantic data management and modeling from Stanford.

    I'll try to list some of the features relevant to your needs:

    • the thing will organize all your data across all experiments and sports a nice Java API, annotations, a set of interchangeable & sophisticated query engines, and java plugins for supporting, among other things, application specific tasks, application specific rendering widgets for data, and new backend data formats.
    • currently supported backend formats include: RDF, DAML+OIL, XML, text files, and SQL databases.
    • we should have cluster job submission support integrated in by July, but it depends on your cluster set-up. currently this is presented to the user by way of executing "processing pipelines" for data. If this metaphor doesn't work for you, you may have to write some additional code for us!
    • since the experimental designs are represented in a prolog-style knowledge-base, it would be very simple to put some intelligence in about how to "run" or "execute" a given class of experimental designs and do a lot of automatic reasoning or planning re: dependencies. In fact, I think that someone at Stanford has already done this, but I'd have to look into it.

    Finally, I would like to stress that our project is one of many, and that if it doesn't meet your needs, within a year there will be many competing "eScience" toolkits.

    You may contact me for more information by reversing the following string: "ude.htuomtrad@exj".

  50. I Develop This Kind of Software by spirality · · Score: 3, Informative


    The Computer Aided Engineering (CAE) world has much the same problem you do.

    They model their products with several different analysis codes, each with its own input and output format. This generates a gob of data, and is currently managed in ad hoc ways, is not easy to integrate with other results and wastes the time of lots of engineers.

    The product we've come up with to manage both the models, the process for executing the models, and the data generated by running the models is a software framework called CoMeT (Computational Modeling Toolkit).

    We are also capable of managing different versions of the model, parameter studies, and some basic data mining. The whole thing is scriptable with Scheme.

    Unfortunately, we are a commercial software company, and the software is still under development, although everything I mentioned above can currently be done. We are mostly working on a front end now, although we still need to make a few improvements to the framework and add support for many analysis codes.

    The reason I'm replying to this is that your list of requirements is a perfect subset of ours. We are aiming our product at CAE in the mechanical and electrical domains (Mechatronics).

    I know, it's not free, but we feel we've done some very innovative things and it has taken several people many years of low pay to get this far. We really want to make some money off it eventually.... :)

    If you want more information check out the web-site or email me here. We're in need of proving this technology in a production environment so maybe we can work something out.

    -Craig.

  51. Might be suitable? by gowdy · · Score: 2, Informative

    http://roofit.sourceforge.net/

  52. less tools by Anonymous Coward · · Score: 1

    more experienced programmers.

    I think someone told all the computer scientists that there's a theoretical way to write a program that does everything. I'm a computer scientist and it's clearly impossible to generalize that greatly.

    In fact it is so dangerous that the general purpose OS is the number one cause of our downward spiral in software quality. It took BeOS a short time to write their general purpose OS. I think it's silly to think it's so monolithic a project that it must be maintained over several decades of development. We should be developing a fresh OS for individual applications more as a matter of principle, even if going against one of the current bastions of computing like Linux or MS Windows might be called arrogance today.

  53. Re:Clear TOS by Tablizer · · Score: 1

    Postgresql offers some attractive features such as inheritance, stored procedures, and an eclectic set of datatypes.

    I never was much of a fan of schema inheritance. Most examples I have seen were based on bad designs IMO. And funky datatypes make it harder to port and share the data with other DB's.

    Plus, I think they wanted something "lite" in the DB department based on one comment, and Postgre has a bit more of a learning curve.

  54. ExpLab by The+Visiting+Priest · · Score: 2, Informative

    I'm in precisely the same situation as Alea, so I read the suggestions here with considerable interest.

    I'd like to mention ExpLab.

    Though I haven't used ExpLab yet, these folks have been associated with other very high quality work (CGAL) so I expect good things. Here are three goals they list for the project:

    • to provide a simple way to set up and run computational experiments;
    • to provide a means of automatically documenting the environment in which an experiment is run so the experiment can be easily rerun (provided the same environment is still available) and the results can be more accurately compared to the results of other computational experiments;
    • to eliminate some of the tedium involved in collecting and analyzing output by providing basic text output processing tools.
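
    The second goal, automatically documenting the environment an experiment runs in, can be illustrated with a few lines of Python. This is not ExpLab's code or file format; the function names and the choice of fields are assumptions for the sketch, using only the standard `platform`, `sys`, `time`, and `json` modules.

```python
import json
import platform
import sys
import time


def capture_environment(argv=None):
    """Collect a small description of the machine and invocation."""
    return {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "executable": sys.executable,
        # Command line of the experiment; defaults to this process's own.
        "argv": list(sys.argv if argv is None else argv),
    }


def write_environment(path, env):
    """Save the description next to the experiment's output as JSON."""
    with open(path, "w") as fh:
        json.dump(env, fh, indent=2, sort_keys=True)


if __name__ == "__main__":
    env = capture_environment()
    print(json.dumps(env, indent=2, sort_keys=True))
```

    Writing such a record alongside every result file makes it possible to check later whether two runs were really made under comparable conditions.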
  55. SpecTCL by luzrek · · Score: 1
    SpecTCL was developed for use at the National Superconducting Cyclotron Laboratory. It is apparently available through sourceforge. While I'm not intimately familiar with it (I wrote my own code from scratch, but it is hardly general purpose), SpecTCL allows objects (1D & 2D histograms, gates, etc) and procedures to be defined using the TCL scripting language or C++ while providing a consistent display.

    The high energy folks also have a similar set of packages (as other nuclear labs probably do).

    --

    Gallium Arsenide is the material of the future, and always will be.

  56. workflow by RalfM · · Score: 1
    You seem to be talking about a process automation tool, aka workflow management. If you're doing research you might have a look at DSTC's Breeze, free for non-commercial/academic use.

    Ralf

    --
    The trouble with the world is that the stupid are cocksure and the intelligent are full of doubt.
    -Bertrand Russell
  57. Experiment Management Software by rtroy · · Score: 1

    Stop! Don't re-invent that wheel! The work - has already been done... The system you desire was developed at U C Berkeley nearly ten years ago and has been in production at places like JPL and the Langley Research Center since 1997. This software contains all the features you desire and a lot more, including a fully distributed processing system, a collaborative distribution system, archive hooks and even has robust security features. And, conveniently, it doesn't contain any of the features you said you don't want! See http://ScienceTools.com/ (or send me an email) for more information. Regards, Richard

    --
    Richard Troy, Chief Scientist Science Tools Corporation rtroy@ScienceTools.com, 510-567-9957, http://ScienceTools.com/