Slashdot Mirror


Scientific Data Disappears At Alarming Rate, 80% Lost In Two Decades

cold fjord writes "UPI reports, 'Eighty percent of scientific data are lost within two decades, disappearing into old email addresses and obsolete storage devices, a Canadian study (abstract, article paywalled) indicated. The finding comes from a study tracking the accessibility of scientific data over time, conducted at the University of British Columbia. Researchers attempted to collect original research data from a random set of 516 studies published between 1991 and 2011. While all data sets were available two years after publication, the odds of obtaining the underlying data dropped by 17 per cent per year after that, they reported. "Publicly funded science generates an extraordinary amount of data each year," UBC visiting scholar Tim Vines said. "Much of these data are unique to a time and place, and is thus irreplaceable, and many other data sets are expensive to regenerate."' More at The Vancouver Sun and Smithsonian."
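
A note on the arithmetic: a 17 per cent yearly drop in the *odds* is not the same as a 17 per cent yearly drop in the probability. The sketch below is only an illustration of that difference, not the study's model. The function name and the baseline probability of 0.9 at year two are assumptions made up for the example (the summary says all data sets were available at that point, which would make the odds undefined).

```python
def availability_probability(years_since_publication,
                             baseline_probability=0.9,  # assumed, NOT from the study
                             baseline_year=2,
                             annual_odds_drop=0.17):
    """Probability that a data set is still obtainable, given that the odds
    of obtaining it shrink by a fixed fraction every year after baseline_year."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    # No decay is applied before the baseline year.
    elapsed = max(0, years_since_publication - baseline_year)
    # Convert probability to odds, decay the odds, then convert back.
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    odds = baseline_odds * (1.0 - annual_odds_drop) ** elapsed
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for year in (2, 5, 10, 15, 20):
        print(f"year {year:2d}: p = {availability_probability(year):.3f}")
```

Changing the assumed baseline shifts every number, so the output should be read as the shape of the decline rather than as the study's figures.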

8 of 189 comments

  1. And in 20 years... by Anonymous Coward · · Score: 5, Insightful

    And in 20 years, these results too shall be lost.

    1. Re:And in 20 years... by queazocotal · · Score: 5, Insightful

      That's not the point.
      The actual published results, even if published in an obscure journal, tend to stick around _much_ longer.

      Even old journals which go out of publication get their archives and the rights to distribute them bought - as there is some small amount of value there, in addition to the copies in the various reference libraries around the world.

      The problem is that if you are wondering about that graph on page 14 of the paper that the whole paper rests on, you can't get the original data to recreate that graph.

      This is a major problem because the only way to check that graph is now to redo the whole experiment.

  2. Concerning... by AdamColley · · Score: 5, Insightful

    Trying to ignore that a paper about the unavailability of scientific data is locked behind a paywall.

    This is nothing new, though. I do occasional conversions from ancient data formats, and people need to pay better attention: imagine trying to read an 8" CP/M floppy today.

    As libraries move to digital storage rather than the dead tree that's been fine for thousands of years, they are inviting a catastrophe, possibly only one well-aimed coronal mass ejection away from massive data loss.

    1. Re:Concerning... by Eunuchswear · · Score: 5, Insightful

      Digital data can be easily copied and archived

      Can be. But mostly isn't.

      --
      Watch this Heartland Institute video
    2. Re:Concerning... by Anonymous Coward · · Score: 5, Interesting

      I designed and built the equipment for scientific experiments that will never be repeated: cochlear implant stimulation of one ear, done in an MRI. This was safe because the older implant technology had a jack that stuck out of the subject's head, and which we could connect to electronics outside the MRI itself. But the old "Ineraid" implants have been replaced, clinically, with implants using embedded electronics and usually magnets. Those are hideously unsafe to even bring into the same *room* as an MRI, much less actually scan the brain of a person wearing one.

      So that experiment is unlikely to ever be repeated. Losing the data, and losing the extensive clinical records of those subjects, would be an immense loss to science. In particular, there is historical data from decades of testing on these subjects that shows the long-term effects of their implants, or of different types of redesigned external stimulators. That data is scientifically priceless. When I started that work, we used mag-tape for data, and scientific notebooks for recording measurements. I helped reformat and transfer that data to increasingly modern storage devices several times. We went through 3 different types of storage media in 10 years, and I remember having to write software to allow Exabyte drives to find the end of the tape and add data. (Exabytes had no End-Of-Tape marker.) Preserving that data... was a lot of work.

    3. Re:Concerning... by Lisias · · Score: 5, Insightful

      Wishful thinking.

      Let's make a deal: *first*, the gene therapy works. *THEN* we assume we can afford to lose the data the grandparent talks about.

      --
      Lisias@Earth.SolarSystem.OrionArm.MilkyWay.Local.Virgo.Universe.org
  3. is/are by LMariachi · · Score: 5, Interesting

    Much of these data are unique to a time and place, and is thus irreplaceable, and many other data sets are expensive to regenerate.

    Whichever side of the "data is" vs. "data are" argument one falls on, I hope we can all agree that mixing both forms within the same sentence is definitely wrong.

  4. Re:Why must you have their data? by n1ywb · · Score: 5, Interesting

    No, but it is amazing what NEW science you can do with OLD data. I've worked with the Transportable Array project, for example (http://www.usarray.org/researchers/obs/transportable); it's over a decade old, and scientists are still discovering new ways to take advantage of the data and will likely be doing so for decades to come. On the other hand, a lot of data is just junk due to poor-quality metadata: when was that instrument calibrated? I dunno. Damn. At least in geophysics we have the National Geophysical Data Center (http://www.ngdc.noaa.gov/) to curate this stuff, at least until Congress cuts its funding.

    --
    -73, de n1ywb
    www.n1ywb.com