How Do You Organize Your Experimental Data?

← Back to Stories (view on slashdot.org)

How Do You Organize Your Experimental Data?

Posted by timothy on Sunday August 15, 2010 @04:10AM from the can't-remember-where-I-put-my-memory dept.

digitalderbs writes "As a researcher in the physical sciences, I have generated thousands of experimental datasets that need to be sorted and organized — a problem which many of you have had to deal with as well, no doubt. I've sorted my data with an elaborate system of directories and symbolic links to directories that sort by sample, pH, experimental type, and other qualifiers, but I've found that through the years, I've needed to move, rename, and reorganize these directories and links, which have left me with thousands of dangling links and a heterogeneous naming scheme. What have you done to organize, tag and add metadata to your data, and how have you dealt with redirecting thousands of symbolic links at a time?"

3 of 235 comments (clear)

Min score:

Reason:

Sort:

Separate data from presentation by mangu · 2010-08-15 04:16 · Score: 4, Informative

In my experience, the best thing is to let the structure stand as it was the first time you stored the data.
Later, when you discover more and more relationships around that data, you may create, change, and destroy those symbolic links as you wish.
I usually refrain from moving the data itself. Raw data should stand untouched, or you may delete it by mistake. Organize the links, not the data.
Re:Interns. by spacefight · 2010-08-15 04:37 · Score: 3, Informative

Yeah right, let the interns do the job. Not. Interns use new tools no one understands, then finish the project during their term, then move on and let the most probably buggy or unfinished project behind. Pitty for the person who has to cleanup the mess. Better do the job on your own, know the tools or hire someone permanently for the whole deptartment.
Relational Databases won't do! by gmueckl · 2010-08-15 04:49 · Score: 3, Informative

To everybody here suggesting relational databases: you are on the wrong track here, I'm afraid to tell you. Relational databases handle large sets of completely homogenious data well if you can be bothered to write software for all the I/O around them. This is where it all falls apart:
1. Many lab experiments don't give you the exact same data every time. You often don't do the same experiment over and over. You vary it and the data handling tools have to be flexible enough to cope with that. Relational databases aren't the answer to that.
2. Storing and fetching data through database interfaces is vastly more difficult than just using the standard input/ouput or plain text files. I've written hundreds of programs for processing experimental data and I can tell you: nothing beats plain old ASCII text files! The biggest boon is that they are compatible to almost any scientific software you can get your hands on. Your custom database likely is not. Or how would you load the contents your database table into gnuplot, Xmgrace or Origin, just to name a few tools that come to my mind right now?
I wish I had a good answer to the problem. At times I wished for one myself, but I fear the best reply might still be "shut up and cope with it".

--
http://www.moonlight3d.eu/