Slashdot Mirror


Computer Science Tools Flood Astronomers With Data

purkinje writes "Astronomy is getting a major data-gathering boost from computer science, as new tools like real-time telescopic observations and digital sky surveys provide astronomers with an unprecedented amount of information — the Large Synoptic Survey Telescope, for instance, generates 30 terabytes of data each night. Using informatics and other data-crunching approaches, astronomers — with the help of computer science — may be able to get at some of the biggest, as-yet-unanswerable cosmological questions."

8 of 60 comments

  1. too much? by danbuter · Score: 2

    My biggest issue would be if there is too much information. What if the scientists are using the wrong search queries and missing something important? Or maybe something important is just buried on page 931 of a 2,000-page data report. Still, it's better than the opposite problem of just not having the data to search.

    1. Re:too much? by SoCalChris · Score: 4, Insightful

      There's no such thing as too much data in a case like this, assuming that they can store it all. Even if it's too much to parse now, it won't be in a few years. Get as much data as we can now, while there's funding for it.

    2. Re:too much? by DigiShaman · Score: 2

      Disk I/O and the ability to back up that data can be a bitch. Especially if the delta changes overlap within a 24-hour period. Of course, there are ways of addressing this problem with multiple servers, but that comes at a financial cost. Also, SAN and DAS technology still lags behind in I/O compared to the explosive growth in storage capacity.

      Personally, I have clients that deal with 30+ TB worth of science data. Data retention is a major headache for me because as recently as four years ago, they needed only 2TB of storage. I can't keep up with their needs without going the EqualLogic (or similar enterprise solution) route.

      Google. Please throw us a bone here. We could use a software solution that's manageable, non-proprietary, and scales with off-the-shelf hardware. Ya, I know. I'm asking for a lot here. :(

      --
      Life is not for the lazy.
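      The growth described in this comment (2TB four years ago, 30+ TB now) can be turned into an implied compound annual growth factor and doubling time. This is a rough sketch that assumes smooth exponential growth and treats "30+ TB" as exactly 30 TB; the function names are illustrative only.

```python
import math


def annual_growth_factor(start_tb: float, end_tb: float, years: float) -> float:
    """Compound annual growth factor implied by going from start_tb to end_tb."""
    return (end_tb / start_tb) ** (1.0 / years)


def doubling_time_years(factor: float) -> float:
    """Years needed to double at a constant annual growth factor (must be > 1)."""
    return math.log(2) / math.log(factor)


factor = annual_growth_factor(2.0, 30.0, 4.0)
print(f"growth: {factor:.2f}x per year")                          # growth: 1.97x per year
print(f"doubling time: {doubling_time_years(factor):.2f} years")  # doubling time: 1.02 years
```

      If the trend holds, that works out to the storage requirement roughly doubling every year.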
    3. Re:too much? by DerekLyons · Score: 2

      What if the scientists are using the wrong search queries and missing something important? Or maybe something important is just buried on page 931 of a 2,000 page data report?

      Which is pretty much the same problem astronomy has had since roughly forever... Looking in the wrong place. Looking at the wrong time. Looking in the wrong wavelength. Looking for the wrong search terms. Looking on the wrong page... It's all pretty much the same.
       
      The sky and the data will be there tomorrow and they'll try again. Just like they always have.

  2. True in all fields by eparker05 · Score: 3, Interesting

    Many sciences are experiencing this trend. A branch of biochemistry known as metabolomics is a growing field right now (in which I happen to be participating). Using tools like liquid chromatography coupled to mass spectrometry, we can get hundreds of megabytes of data per hour. Even worse is the fact that a large percentage of that data is explicitly relevant to a metabolomic profile. The only practical way of analyzing all of this information is through computational analysis, either through statistical techniques used to condense and compare the data, or through searches on painstakingly generated metabolomic libraries.

    That is just my corner of the world, but I imagine that much of the low-hanging fruit of scientific endeavor has already been picked. Going forward, I believe that the largest innovations will come from the people willing to tackle data sets that a generation ago would have been seen as insurmountable.

  3. Re:If you'd like to help with all that data... by NoNonAlphaCharsHere · Score: 3, Interesting

    Annnnd... we have a winner. Galaxy Zoo uses tens of thousands of underutilized, superfluous, non-specialized 'carbon units' for pattern recognition, which they're really really really good at, that is, 800 ms after looking at an image -> elliptical, spiral, irregular... "Hmmm, hey, that's funny... wait... WTF. Let's post this to the forum, where hundreds of other random carbon units will weigh in on it, and a For Really Astronomer(TM) will be checking it out inside 24 hours if it creates enough buzz..." See Hanny's Voorwerp for the quintessential example.

    Software that could 'be surprised' would be nice, but it's a long, long way off.
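    The crowd step described above boils down to aggregating many independent volunteer labels per image. A minimal sketch, assuming a simple plurality vote and a hypothetical 80% agreement cut-off (this is not Galaxy Zoo's actual data-reduction pipeline): images that fail to reach consensus are the ones worth a second look, a crude stand-in for the "hey, that's funny..." moment.

```python
from collections import Counter
from typing import List, Optional, Tuple


def consensus(votes: List[str], min_agreement: float = 0.8) -> Tuple[Optional[str], float]:
    """Return (label, agreement) for one image's volunteer votes.

    label is the most common vote, or None when fewer than min_agreement
    of the volunteers chose it (a hypothetical "needs review" cut-off).
    """
    if not votes:
        raise ValueError("no votes recorded for this image")
    label, top_count = Counter(votes).most_common(1)[0]
    agreement = top_count / len(votes)
    return (label if agreement >= min_agreement else None), agreement


clear_case = ["spiral"] * 9 + ["elliptical"]
odd_case = ["spiral", "elliptical", "irregular", "spiral", "other"]
print(consensus(clear_case))  # ('spiral', 0.9)
print(consensus(odd_case))    # (None, 0.4)
```

    Returning the agreement fraction alongside the label keeps the "how sure is the crowd" signal, which is what you would sort on to pick candidates for a human expert.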

  4. Generates? Wrong tense. by oneiros27 · Score: 5, Informative

    *WILL* generate. LSST isn't operating yet.

    And yes, 30TB is a lot of data now, but we have some time before they finally have first light.

    Operations isn't supposed to start 'til 2019: http://www.lsst.org/lsst/science/timeline

    We just need network speeds and disk drive sizes to keep doubling at the rate they have, and we'll be laughing about how we thought 30TB/night was going to be a problem.

    SDO finally launched last year with a data rate of over 1TB/day ... and all through planning, people were complaining about the data rates ... it's a lot, but it's not as insurmountable as it might've been 8 years ago, when we were looking at 80 to 120GB disks.
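    The "keep doubling" argument can be checked against the numbers in this comment. A rough sketch, assuming (hypothetically) that drive capacity doubles every two years and that SDO produces about 1 TB (1000 GB) per day: starting from the 80 to 120GB drives of eight years ago, it compares how many drives one day of SDO data would have filled then with the projected figure now.

```python
def projected_capacity_gb(start_gb: float, years: float, doubling_period_years: float = 2.0) -> float:
    """Drive capacity after `years`, assuming it doubles every doubling_period_years."""
    return start_gb * 2 ** (years / doubling_period_years)


def drives_per_day(data_gb_per_day: float, drive_gb: float) -> float:
    """How many drives one day of data would fill."""
    return data_gb_per_day / drive_gb


SDO_GB_PER_DAY = 1000.0  # ~1 TB/day from the comment, in decimal units

for then_gb in (80.0, 120.0):
    now_gb = projected_capacity_gb(then_gb, years=8)
    print(
        f"{then_gb:.0f} GB drives -> {now_gb:.0f} GB projected; "
        f"drives/day: {drives_per_day(SDO_GB_PER_DAY, then_gb):.1f} then, "
        f"{drives_per_day(SDO_GB_PER_DAY, now_gb):.2f} now"
    )
```

    Under that assumed doubling period, a day of data goes from filling roughly 8 to 12 drives to filling about half of one, which is the sense in which the same data rate stops looking insurmountable.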

    Although, it'd be nice if monitor resolutions had kept growing ... if anything, they've gotten worse the last couple of years.

    (Disclaimer: I work in science informatics; I've run into Kirk Bourne at a lot of meetings, and we used to work in the same building, but we deal with different science disciplines)

    --
    Build it, and they will come^Hplain.
  5. Re:Generates? Wrong tense. by Carnivore · Score: 4, Informative

    In fact, they just started blasting the site. I actually live next door to the LSST's architect, which is pretty cool.

    Astronomers generate a tremendous amount of data, bested only by particle physicists. Storing it all is a challenge, to put it mildly. Backup is basically impossible.

    The real problem is that the data lines that go from the summit to the outside world are still not fast. The summits here are pretty remote, and even when you get to a major road, it's still in farm country. And then getting it out of the country is tough: all of our network traffic to North America hits a major bottleneck in Panama, so if you're trying to mirror the database or access the one in Chile, it can be frustratingly slow.
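    To see why slow links off the summit matter, it helps to estimate how long one night of LSST data would take to move. A back-of-the-envelope sketch, assuming the 30 TB/night figure from the story (decimal terabytes) and a fully utilized link with no protocol overhead; the link speeds are hypothetical examples, since the thread gives no figures for the actual links out of Chile.

```python
SECONDS_PER_DAY = 86_400


def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move data_tb decimal terabytes over a link_gbps link.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 = ideal, no protocol overhead or competing traffic).
    """
    bits = data_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600


def required_gbps(data_tb_per_day: float) -> float:
    """Sustained rate needed just to keep up with data_tb_per_day."""
    return data_tb_per_day * 1e12 * 8 / SECONDS_PER_DAY / 1e9


NIGHTLY_TB = 30.0
print(f"needed to keep up: {required_gbps(NIGHTLY_TB):.2f} Gbps")
for gbps in (0.1, 1.0, 10.0):
    print(f"{gbps:>5} Gbps link: {transfer_hours(NIGHTLY_TB, gbps):7.1f} hours per night of data")
```

    Even in this idealized model, any link slower than about 2.8 Gbps falls further behind every night, before counting any mirroring traffic to North America.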