The Astronomical Event Search Engine

Posted by kdawson on Tuesday January 9, 2007 @04:05PM from the cataloging-a-firehose dept.

eldavojohn writes "Google has signed on with the Large Synoptic Survey Telescope project that will construct a powerful telescope in Chile by 2013. Google's part will be to 'develop a search engine that can process, organize, and analyze the voluminous amounts of data coming from the instrument's data streams in real time. The engine will create "movie-like windows" for scientists to view significant space events.' Google has been successful at applying its search technology to several different media and realms. Will they be successful with helping scientists tag and catalog events in our universe?" The telescope will generate 30 TB of data a night, for 10 years, from a 3-gigapixel CCD array.

10 of 93 comments

3/4 LoC a night by Oddscurity · 2007-01-09 16:09 · Score: 2, Interesting

Will they be successful with helping scientists tag and catalog events in our universe? Will they defeat the monster and get the girl? And will they be home in time for tea? Find out next on GoogleTrek.

Seriously though, processing the equivalent of three-quarters of the LoC every night is nothing to be sneezed at. Over the course of those 10 years, that's about 110 petabytes (30 TB * 365.25 * 10) of unprocessed data.

--
Indeed!
1. Re:3/4 LoC a night by Wavicle · 2007-01-09 16:23 · Score: 3, Interesting
  
  I actually did a small, insignificant portion of LSST's computational feasibility study at LLNL during my internship there a couple of summers ago. And yeah, the computational requirements were nothing to sneeze at. I'm not sure where things stand now, since the specs changed seemingly every month, but when I left the CCD array was up to 3 gigapixels of 16-bit greyscale. I believe the observing cadence (at that time; again, everything was changing on a regular basis) was two of those for the same piece of sky every 30 seconds. Wish I could have stayed... ahh well. I did get a really nice full-color research poster (that I had to design) out of it, though!
  
  --
  Education is a better safeguard of liberty than a standing army.
  Edward Everett (1794 - 1865)
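For a sense of scale, the detector figures Wavicle recalls (3 gigapixels of 16-bit greyscale, two exposures of the same field every 30 seconds) can be turned into a raw data rate. This is a back-of-the-envelope sketch that assumes uncompressed pixels and no overheads; the constants come from the comment, not from an official LSST specification, and the function names are made up for illustration.

```python
# Raw data-rate arithmetic for the detector specs quoted in the comment.
# Assumptions (not confirmed LSST figures): 3e9 pixels, 16 bits (2 bytes)
# per pixel, no compression, two exposures of the same field every 30 s.

PIXELS = 3_000_000_000        # 3 gigapixels
BYTES_PER_PIXEL = 2           # 16-bit greyscale
EXPOSURES_PER_CADENCE = 2     # two images of the same patch of sky...
CADENCE_SECONDS = 30          # ...every 30 seconds


def bytes_per_exposure() -> int:
    """Size of one raw, uncompressed exposure in bytes."""
    return PIXELS * BYTES_PER_PIXEL


def sustained_rate_bytes_per_s() -> float:
    """Average raw data rate while the telescope is observing."""
    return bytes_per_exposure() * EXPOSURES_PER_CADENCE / CADENCE_SECONDS


print(f"per exposure: {bytes_per_exposure() / 1e9:.1f} GB")
print(f"sustained rate: {sustained_rate_bytes_per_s() / 1e9:.2f} GB/s")
```

Under those assumptions each exposure is about 6 GB and the camera produces roughly 0.4 GB/s while observing. Calibration frames, metadata, and compression would all change the real figure.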
2. Re:3/4 LoC a night by Nuroman · 2007-01-09 20:23 · Score: 2, Interesting
  
  According to Google (how appropriate), 30 terabytes * 365.25 * 10 = 107.006836 petabytes.
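The roughly 110 PB in the parent comment and Google's 107.006836 differ only in unit convention: dividing 109,575 TB by 1000 gives decimal petabytes, while dividing by 1024 reproduces exactly the figure Google returned. A short Python check (variable names are arbitrary):

```python
# Ten years of 30 TB nights, expressed in decimal and binary petabytes.
TB_PER_NIGHT = 30
NIGHTS = 365.25 * 10    # one night per calendar day, leap years averaged in

total_tb = TB_PER_NIGHT * NIGHTS          # 109,575 TB

# Decimal (SI) prefixes: 1 PB = 1000 TB.
decimal_pb = total_tb / 1000
# Binary prefixes: 1 PiB = 1024 TiB. Dividing by 1024 reproduces the
# 107.006836 figure quoted above.
binary_pb = total_tb / 1024

print(f"{total_tb:,.0f} TB")
print(f"{decimal_pb:.3f} PB (decimal)")
print(f"{binary_pb:.6f} PiB (binary)")
```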
Why Google? by Anonymous Coward · 2007-01-09 16:15 · Score: 1, Interesting

Just wondering if Google can provide the right tool. Yeah, they can design a front end. Yep, they can provide content. But can they really deliver the information you need without a whole pile of eBay ads?
1. Re:Why Google? by Ingolfke · 2007-01-09 17:20 · Score: 2, Interesting
  
  It says they're smart enough to take on challenging, related problems that they can learn from and use to enhance their information business. This is a real-time application. Imagine if Google could, based on all of the data it is collecting and indexing, provide a real-time view of current trends and patterns of consumers on the web. An immediate zeitgeist, presented in a way that a business can use to make sure it's selling its products at the right time to the right people. Cool stuff.
30 TB will be nothing in a few years. by Anonymous Coward · 2007-01-09 16:23 · Score: 0, Interesting

At the rate at which our storage capabilities are growing, 30 TB will be considered nothing. We're approaching consumer-grade hard drives with a capacity of 1 TB. We'll likely see 2 TB consumer-grade hard drives by the end of 2008. Remember, that's consumer-grade. Google will no doubt be able to afford far higher-quality drives with larger storage capacities. And by 2017, 30 TB drives will be considered minuscule.

In 1997, 1 GB hard drives were the largest available for the average consumer. Now it's 2007, and we have 850 GB hard drives available in most tech retail stores. That's an 850x increase over a decade. It's likely we will see that trend continue over the next decade. So it's more than likely by the time this project is nearing its end, we'll be dealing with 700 TB hard drives, and that's at the low end of the market.
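The projection above is a straight compound-growth extrapolation. The sketch below makes that explicit using only the two data points in the comment. The 1997 and 2007 capacities are the commenter's figures, and the assumption that the same rate holds for another decade is the commenter's premise, not a forecast; `project` is a hypothetical helper name.

```python
# Figures from the comment (the commenter's recollection, not verified
# market data): largest consumer drive 1 GB in 1997, 850 GB in 2007.
START_GB = 1
END_GB = 850
YEARS = 10

growth_factor = END_GB / START_GB            # 850x over the decade
annual_rate = growth_factor ** (1 / YEARS)   # implied compound growth per year


def project(capacity_gb: float, years_ahead: float) -> float:
    """Extrapolate capacity assuming the same compound growth rate continues."""
    return capacity_gb * annual_rate ** years_ahead


print(f"implied growth: {annual_rate:.3f}x per year")
print(f"2017 projection: {project(END_GB, YEARS) / 1000:.1f} TB")
```

Applying the same 850x to an 850 GB drive gives about 722.5 TB, presumably where the "700 TB" estimate comes from; the implied rate is just under 2x per year.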
Near Earth Objects by Oddscurity · 2007-01-09 16:29 · Score: 2, Interesting

I saw a documentary not long ago about doing just this: photographing the same piece of sky, only with longer intervals than 30 seconds. Anything moving would automagically be flagged by the software and its vector computed. Correct me if I'm wrong, but from what I can tell of this project, it's going to do exactly that (and more), but on a larger scale and with better accuracy?

--
Indeed!
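The technique Oddscurity describes is usually called image differencing: subtract one exposure of a field from a later one, and anything that moved shows up as a pair of residuals (dimmer where it was, brighter where it went). Below is a minimal, pure-Python sketch of that idea for a single moving source. It assumes the two frames are already aligned to the same pixel grid and have identical background levels; real survey pipelines also have to register the images, match the point-spread function, reject cosmic rays, and link detections across several visits. The function names are hypothetical and not taken from LSST or any other project's software.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]
Pixel = Tuple[int, int]


def changed_pixels(before: Image, after: Image, threshold: float) -> List[Pixel]:
    """(row, col) of every pixel whose brightness changed by more than threshold."""
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(before, after))
        for c, (a, b) in enumerate(zip(row_a, row_b))
        if abs(b - a) > threshold
    ]


def centroid(points: List[Pixel]) -> Tuple[float, float]:
    """Mean (row, col) position of a set of pixels."""
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


def motion_vector(before: Image, after: Image, threshold: float,
                  dt_seconds: float) -> Optional[Tuple[float, float]]:
    """Estimate (d_row, d_col) in pixels per second for one moving source.

    Pixels that dimmed mark where the object was; pixels that brightened
    mark where it went. Returns None if nothing moved.
    """
    diff = changed_pixels(before, after, threshold)
    vanished = [(r, c) for r, c in diff if after[r][c] < before[r][c]]
    appeared = [(r, c) for r, c in diff if after[r][c] > before[r][c]]
    if not vanished or not appeared:
        return None
    r0, c0 = centroid(vanished)
    r1, c1 = centroid(appeared)
    return ((r1 - r0) / dt_seconds, (c1 - c0) / dt_seconds)


# Usage: a 5x5 field with a flat background of 10 and one bright source.
sky_t0 = [[10.0] * 5 for _ in range(5)]
sky_t1 = [[10.0] * 5 for _ in range(5)]
sky_t0[1][1] = 100.0   # object in the first exposure
sky_t1[3][2] = 100.0   # same object 30 s later
print(motion_vector(sky_t0, sky_t1, threshold=5.0, dt_seconds=30.0))
```

Here the source moved two pixels down and one across in 30 seconds, so the estimate is 2/30 and 1/30 pixels per second. With many sources in the field, the residuals would first have to be grouped and paired before any vector could be computed.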
Re:Lots of data, but not as much as the LHC by mcelrath · 2007-01-09 19:30 · Score: 4, Interesting

The LHC will produce more data, but we also don't care about most of it. The vast majority of it is junk. The "interesting" physics events (particles like W and Z bosons, top quarks, Higgs, etc.) are about 10^-9 of the total. It is a huge needle-in-a-haystack problem, and we throw out most of the data. We have many experts and professors who design "triggers" which, based on a subset of information that can be delivered to them in a reasonable time, decide whether a given proton-proton collision contains new physics. Many theorists these days are making dents in walls with their heads trying to think of ways these triggers might be missing important information, so that we can suggest changes before it's too late. This is a lot of dedicated silicon, FPGAs, VME crates, etc. Slashdotters should drool. Anyway, we throw out the vast majority of the information.
By comparison, LSST is trying to store everything. Scroll up for an interesting comment about calibrating ambient brightness and seeing. I can't answer which will deliver more information, but both are incredibly interesting challenges.
Data challenges abound. We have designed the LHC Grid to distribute this information. There will be several data warehouses located around the world at national labs and universities. Even after the triggers decide what is "interesting", more sophisticated algorithms, with access to all the data in a single proton-proton collision, are applied. Then humans are applied to the data, and we will try to dig new signals out of it.
In all this we expect to find (among other things) the origin of mass and Dark Matter, and we're working hard to prepare for the onslaught of data. :)
-- Bob

--
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
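As a toy illustration of the "trigger" idea in mcelrath's comment, here is a streaming filter that looks at a couple of cheap summary quantities per collision, keeps only events that pass a threshold, and discards the rest. The field names and GeV cuts are invented for illustration and do not correspond to any real ATLAS or CMS trigger menu; real first-level triggers run in dedicated hardware such as the FPGAs mentioned above, on microsecond timescales, not in Python.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Event:
    """Hypothetical per-collision summary available to a fast trigger."""
    event_id: int
    leading_muon_pt_gev: float   # transverse momentum of the hardest muon candidate
    missing_et_gev: float        # missing transverse energy


# Illustrative thresholds only; not a real trigger menu.
MUON_PT_CUT_GEV = 20.0
MISSING_ET_CUT_GEV = 50.0


def accept(e: Event) -> bool:
    """Keep the event if any cheap signature exceeds its threshold."""
    return (e.leading_muon_pt_gev > MUON_PT_CUT_GEV
            or e.missing_et_gev > MISSING_ET_CUT_GEV)


def trigger(events: Iterable[Event]) -> Iterator[Event]:
    """Stream collisions through the filter; rejected events are never stored."""
    for e in events:
        if accept(e):
            yield e


collisions = [
    Event(1, leading_muon_pt_gev=3.2, missing_et_gev=12.0),
    Event(2, leading_muon_pt_gev=41.0, missing_et_gev=8.0),
    Event(3, leading_muon_pt_gev=1.1, missing_et_gev=75.0),
]
kept = [e.event_id for e in trigger(collisions)]
print(kept)
```

Only events 2 and 3 survive. Event 1 is gone for good, which is exactly the risk the theorists worry about when they ask whether a trigger might be missing something.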
Search this! by xebecv · 2007-01-09 20:20 · Score: 2, Interesting

Hm, Google searching space... I'm waiting for the day Google will search inside people's bodies and catalog their illnesses.
LSST v. PanSTARRS Approach by cmholm · 2007-01-09 20:32 · Score: 3, Interesting

The shop I'm at has been working the image processing and data storage problem for PanSTARRS, another sky survey project that is a bit further along (they have a test scope up and running on Maui). It's interesting to me that both projects are at once using conventional solutions and thinking outside of the box.

Conventional: LSST will use a single large telescope and detector; PanSTARRS (as it stands) intends to use a dedicated compute cluster for data reduction.
Novel: LSST is leaning towards distributing its data reduction task over Google's huge server farm; PanSTARRS will use four off-the-shelf 1.8 m telescopes, each with a 1.4-gigapixel detector, mounted together to image the same piece of sky, and will merge the overlapping images in post-processing.

When I was working on the project, one of PanSTARRS's requirements was to finish analyzing one night's viewing before the following sunset. Early on, the principal investigators decided to solve the image storage issue by not storing the images permanently. Instead, once the science for a night's imaging had been extracted (astrometry, LEO or supernova detection, etc.), the original images would hit the bit bucket. Whether they've stuck with that, I don't know.

--
Luke, help me take this mask off ... Just for once, let me butterfly kiss you with my own eyes.
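cmholm notes that PanSTARRS planned to image the same patch of sky with four telescopes and merge the overlapping images in post-processing. One common way to merge registered exposures is a pixel-by-pixel combine. The sketch below assumes the images have already been resampled onto a common pixel grid (the hard part, skipped here) and are all the same size; the `coadd` helper is hypothetical and is not PanSTARRS's actual pipeline.

```python
from statistics import mean, median
from typing import Callable, List, Sequence

Image = List[List[float]]


def coadd(images: Sequence[Image],
          combine: Callable[[List[float]], float] = median) -> Image:
    """Combine registered, same-sized images pixel by pixel."""
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [combine([img[r][c] for img in images]) for c in range(cols)]
        for r in range(rows)
    ]


# Four 2x2 frames of the same flat patch of sky; one camera caught a cosmic ray.
frames = [[[10.0, 10.0], [10.0, 10.0]] for _ in range(4)]
frames[0][0][0] = 500.0

print(coadd(frames))                # median combine
print(coadd(frames, combine=mean))  # mean combine
```

With a median combine, a cosmic-ray hit that appears in only one of the four frames is rejected, whereas a plain mean smears it into the result (132.5 instead of 10.0 at that pixel in this example). That robustness to single-frame artifacts is one reason a median, or a clipped mean, is often preferred when only a handful of frames are stacked.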