The Astronomical Event Search Engine

Posted by kdawson on Tuesday January 9, 2007 @04:05PM from the cataloging-a-firehose dept.

eldavojohn writes "Google has signed on with the Large Synoptic Survey Telescope project that will construct a powerful telescope in Chile by 2013. Google's part will be to 'develop a search engine that can process, organize, and analyze the voluminous amounts of data coming from the instrument's data streams in real time. The engine will create "movie-like windows" for scientists to view significant space events.' Google's been successful in turning its search technology to several different media and realms. Will they be successful in helping scientists tag and catalog events in our universe?" The telescope will generate 30 TB of data a night, for 10 years, from a 3-gigapixel CCD array.
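For a sense of scale, the figures in the summary can be multiplied out. This is only a back-of-the-envelope sketch: the 30 TB per night and the 10 years come from the summary, while the 365 observing nights per year is an assumption (weather and maintenance would lower it), so the result is best read as an upper bound.

```python
# Back-of-the-envelope total for the figures quoted in the summary.
TB_PER_NIGHT = 30        # from the summary
YEARS = 10               # from the summary
NIGHTS_PER_YEAR = 365    # assumption: observing every single night, no downtime


def total_survey_terabytes(tb_per_night=TB_PER_NIGHT,
                           years=YEARS,
                           nights_per_year=NIGHTS_PER_YEAR):
    """Raw data volume over the whole survey, in terabytes."""
    return tb_per_night * years * nights_per_year


total_tb = total_survey_terabytes()
# Decimal units: 1 PB = 1000 TB.
print(f"{total_tb} TB, about {total_tb / 1000:.1f} PB")
```

Under those assumptions the raw total is on the order of 100 PB.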

4 of 93 comments

Re:30 TB a night... by Capt'n+Hector · 2007-01-09 17:26 · Score: 4, Informative

You can't compress this stuff unless you do it losslessly. Compression artifacts mess up photometry - if you're trying to compute apparent brightness, you need to factor in things like how bright the ambient sky is, and how much point sources get spread out (FWHM, seeing). That is, a point source that passes through the atmosphere looks like a normal probability distribution because of atmospheric distortions. So to get an apparent brightness, you have to correct for this effect. If compression artifacts are introduced, FWHM is thrown off, and you have no idea how "crisp" your image really is. That's why these data sets are so large. Quite literally, they're doing a pixel dump from their massive CCD all night. But hey, somehow I doubt they'll be using this telescope for anything but object detection. There's no reason to store it all except to compare a current picture to one in a base set, kinda like KAIT on steroids.
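To make the FWHM point concrete, here is a minimal sketch, not anyone's actual pipeline. It assumes an idealized star whose profile is a 1-D Gaussian, for which FWHM = 2*sqrt(2 ln 2)*sigma, and it uses coarse rounding of pixel values as a crude stand-in for lossy compression. The function names, the peak value and the rounding step are all made up for illustration.

```python
import math

# For a Gaussian profile, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def gaussian_profile(sigma, half_width=20, peak=1000.0):
    """1-D slice through an idealized star, sampled at integer pixel positions."""
    return [peak * math.exp(-0.5 * (x / sigma) ** 2)
            for x in range(-half_width, half_width + 1)]


def measured_fwhm(profile):
    """Estimate FWHM from the flux-weighted second moment of the profile."""
    total = sum(profile)
    mean = sum(x * f for x, f in enumerate(profile)) / total
    variance = sum(f * (x - mean) ** 2 for x, f in enumerate(profile)) / total
    return FWHM_PER_SIGMA * math.sqrt(variance)


def quantize(profile, step):
    """Hypothetical lossy step: round every pixel down to a multiple of `step`."""
    return [step * math.floor(f / step) for f in profile]


clean = gaussian_profile(sigma=2.0)
lossy = quantize(clean, step=100.0)

clean_fwhm = measured_fwhm(clean)
lossy_fwhm = measured_fwhm(lossy)
print(f"clean FWHM: {clean_fwhm:.3f} px, after quantization: {lossy_fwhm:.3f} px")
```

The rounding wipes out the faint wings of the profile, so the measured FWHM comes out narrower than the true one, which is the kind of bias described above.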

--
Quid festinatio swallonis est aetherfuga inonusti?
Africus aut Europaeus?
Lots of data, but not as much as the LHC by Phat_Tony · 2007-01-09 17:34 · Score: 4, Informative

That's a lot of data, but it's less than 1/10 as much data as the Large Hadron Collider will put out, and the LHC is supposed to be coming online within a year, not in six years. By the time the Large Synoptic Survey Telescope comes online, the LHC may have produced more data than the Large Synoptic Survey Telescope will over the life of the project.

I'd be interested to know more about the data handling methods they have in place for the LHC. I don't think they'll be using Excel.

*Note the correct, non-Freudian-slip spelling of "hadron"

--
Can anyone tell me how to set my sig on Slashdot?
1. Re:Lots of data, but not as much as the LHC by dido · 2007-01-09 19:14 · Score: 4, Funny
  
  Funny, but CERN itself makes that same misspelling of 'hadron' here. "This is the underground tunnel of the Large Hardon (sic) Collider (LHC)..."
  
  --
  Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.
2. Re:Lots of data, but not as much as the LHC by mcelrath · 2007-01-09 19:30 · Score: 4, Interesting
  
  The LHC will produce more data, but we also don't care about most of it. The vast majority of it is junk. The "interesting" physics (particles like W and Z bosons, top quarks, Higgs, etc.) is about 10^-9 of the events. It is a huge needle-in-a-haystack problem and we throw out most data. We have many experts and professors who design "triggers" which, based on a subset of information that can be delivered to them in a reasonable time, decide whether a given proton-proton collision contains new physics. Many theorists these days are making dents in walls with their heads trying to think of ways these triggers might be missing important information, so that we can suggest changes before it's too late. This is a lot of dedicated silicon, FPGAs, VME crates, etc. Slashdotters should drool. Anyway, we throw out the vast majority of information.
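The trigger idea can be sketched in a few lines. This is a toy illustration, not how the real hardware triggers work: the events are simulated, the "energy" field is a hypothetical quick-to-read summary quantity, the threshold is arbitrary, and the interesting-event rate is set to 10^-3 rather than the 10^-9 quoted above so that the example runs in a moment.

```python
import random


def simulated_events(n, interesting_rate, seed=0):
    """Yield fake collision events; 'energy' is a made-up summary quantity."""
    rng = random.Random(seed)
    for event_id in range(n):
        interesting = rng.random() < interesting_rate
        # Assumption: interesting events sit well above the background in energy.
        centre = 500.0 if interesting else 100.0
        yield {"id": event_id,
               "energy": rng.gauss(centre, 50.0),
               "interesting": interesting}


def trigger(events, threshold):
    """Keep events whose quick summary passes the cut; the rest are never stored."""
    for event in events:
        if event["energy"] > threshold:
            yield event


N_EVENTS = 100_000
kept = list(trigger(simulated_events(N_EVENTS, interesting_rate=1e-3),
                    threshold=350.0))
print(f"kept {len(kept)} of {N_EVENTS} events")
```

Because both functions are generators, rejected events are dropped as they stream past and never accumulate in memory, which mirrors the "throw out the vast majority" decision being made before anything is written to storage.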
  By comparison, LSST is trying to store everything. Scroll up for an interesting comment about calibrating ambient brightness and seeing. I can't answer which will deliver more information, but both are incredibly interesting challenges.
  Data challenges abound. We have designed the LHC Grid to distribute this information. There will be several data warehouses located around the world at national labs and universities. Even after the triggers decide what is "interesting", more sophisticated algorithms, with access to all the data in a single proton-proton collision, are applied. Then humans are applied to the data, and we will try to dig new signals out of it.
  In all this we expect to find (among other things) the origin of mass and Dark Matter, and we're working hard to prepare for the onslaught of data. :)
  -- Bob
  
  --
  1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.