Scientific Data Disappears At Alarming Rate, 80% Lost In Two Decades
cold fjord writes "UPI reports, 'Eighty percent of scientific data are lost within two decades, disappearing into old email addresses and obsolete storage devices, a Canadian study (abstract, article paywalled) indicated. The finding comes from a study tracking the accessibility of scientific data over time, conducted at the University of British Columbia. Researchers attempted to collect original research data from a random set of 516 studies published between 1991 and 2011. While all data sets were available two years after publication, the odds of obtaining the underlying data dropped by 17 per cent per year after that, they reported. "Publicly funded science generates an extraordinary amount of data each year," UBC visiting scholar Tim Vines said. "Much of these data are unique to a time and place, and is thus irreplaceable, and many other data sets are expensive to regenerate.' — More at The Vancouver Sun and Smithsonian."
Whichever side of the "data is" vs. "data are" argument one falls on, I hope we can all agree that mixing both forms within the same sentence is definitely wrong.
I designed and built the equipment for scientific experiments that will never be repeated: cochlear implant stimulation of one ear, done in an MRI. This was safe because the older implant technology had a jack that stuck out of the subject's head, and which we could connect to electronics outside the MRI itself. But the old "Ineraid" implants have been replaced, clinically, with implants using embedded electronics and usually magnets. Those are hideously unsafe to to even bring in the same *room* as an MRI, much less actually scan the brain of a person wearing one.
So that experiment is unlikely to ever be repeated. Losing the data, and losing the extensive clinical records of those subjects, would be an immense loss to science. There is especially historical data from decades of testing on these subjects that show the long term effects of their implants, or of different types of redesigned external stimulators. That data is scientifically priceless. When I started that work, we used mag-tape for data, and scientific notebooks for recording measurements. I helped reformat and transfer that data to increasingly modern storage devices several times. We went through 3 different types of storage media in 10 years, and I remember having to write software to allow Exabyte drives to find the end of the tape and add data. (Exabytes had no End-Of-Tape marker.) Preserving that data.... was a lot of work.
No but it is amazing what NEW science you can do with OLD data. I've worked with the Transportable Array project for example http://www.usarray.org/researchers/obs/transportable it's over a decade old and scientists are still discovering new ways to take advantage of the data and will likely be doing so for decades to come. On the other hand a lot of data is just junk due to poor quality metadata; when was that instrument calibrated? I dunno. Damn. At leat in geophysics we have the National Geophysical Data Center to curate this stuff http://www.ngdc.noaa.gov/ at least until Congress cuts it's funding.
-73, de n1ywb
www.n1ywb.com