Computer Science Tools Flood Astronomers With Data

Blah blah blah. by Anonymous Coward · 2011-07-19 13:13 · Score: 0

All the data storage capacity in the world cannot change the fundamental laws of optics, the speed of light, or the government's coming cuts to scientific research spending.

Re:Blah blah blah. by GumphMaster · 2011-07-19 16:49 · Score: 1

Actually, the speed at which science funding migrates from one flavour-of-the-month to the next clearly exceeds the speed of light. If we could turn that speed violation into workable time travel we could start processing the data mountain (astronomy data is not alone here) about 3000 years ago so that it is complete by lunch-time Sunday.

--
Patent litigation: A doctrine of Mutually Assured Destruction... in which everyone seems willing to push the button
Re:Blah blah blah. by RockDoctor · 2011-07-24 01:15 · Score: 1

Actually, the speed at which science funding migrates from one flavour-of-the-month to the next clearly exceeds the speed of light.
That's the speed of illumination you're talking about, not the speed of light.

--
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"

I gots to nitpick by Anonymous Coward · 2011-07-19 13:31 · Score: 1, Insightful

Methinks it's the sensors that are doing the flooding, not "computer science tools".

Re:I gots to nitpick by Fourier404 · 2011-07-19 17:43 · Score: 1

The sensors wouldn't be picking up anything interesting if they weren't automatically being pointed at interesting things. There aren't enough astronomers to do the pointing manually.
Re:I gots to nitpick by mwvdlee · 2011-07-20 01:03 · Score: 1

Yeah, but the "computer science tools" wouldn't know what the interesting things were unless the astronomers told them. So basically this is just the astronomers flooding themselves.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:I gots to nitpick by uninformedLuddite · 2011-07-21 16:50 · Score: 1

Yeah, but the "computer science tools"
all sit together at the back of the class

--
The new right fascists are bilingual. They speak English and Bullshit.

They might be annoying, by Anonymous Coward · 2011-07-19 13:33 · Score: 1, Funny

But I think "tools" is a bit offensive. They're trying to help the astronomers in a meaningful way.

Re:They might be annoying, by Anonymous Coward · 2011-07-20 00:05 · Score: 0

C'mon mods! That was freaking hilarious! Long time since I literally laughed out loud on /. :D

too much? by danbuter · 2011-07-19 13:37 · Score: 2

My biggest issue would be if there is too much information. What if the scientists are using the wrong search queries and missing something important? Or maybe something important is just buried on page 931 of a 2,000 page data report. Still, it's better than the opposite problem, of just not having the data to search.

Re:too much? by SoCalChris · 2011-07-19 14:14 · Score: 4, Insightful

There's no such thing as too much data in a case like this, assuming that they can store it all. Even if it's too much to parse now, it won't be in a few years. Get as much data as we can now, while there's funding for it.
Re:too much? by DigiShaman · 2011-07-19 15:21 · Score: 2

Disk I/O and the ability to back up that data can be a bitch. Especially if the delta changes overlap within a 24-hour period. Of course, there are ways of addressing this problem with multiple servers, but that comes at a financial cost. Also, SAN and DAS technology still lags behind in I/O compared to the explosive growth in storage capacity.
Personally, I have clients that deal with 30+ TB worth of science data. Data retention is a major headache for me because as of four years ago, they only needed 2TB of storage. I can't keep up with their needs without the EqualLogic or similar enterprise solution route.
Google. Please throw us a bone here. We could use a software solution that's manageable, non-proprietary, and scales with off-the-shelf hardware. Ya, I know. I'm asking for a lot here. :(

--
Life is not for the lazy.
Re:too much? by DerekLyons · 2011-07-19 17:27 · Score: 2

What if the scientists are using the wrong search queries and missing something important? Or maybe something important is just buried on page 931 of a 2,000 page data report?
Which is pretty much the same problem astronomy has had since roughly forever... Looking in the wrong place. Looking at the wrong time. Looking in the wrong wavelength. Looking for the wrong search terms. Looking on the wrong page... It's all pretty much the same.

The sky and the data will be there tomorrow and they'll try again. Just like they always have.
Re:too much? by Anonymous Coward · 2011-07-19 20:48 · Score: 0

Google? They could also look to their colleagues doing particle physics - we also have humongous amounts of data (which is why we use the grid)
Re:too much? by mwvdlee · 2011-07-20 01:05 · Score: 1

30TB a night, for a single telescope. The cost of storing such amounts of data would be astronomical *wink*.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:too much? by ginbot462 · 2011-07-20 04:33 · Score: 1

For a lot of things .. but obviously not all. But, the concept that you will catch that one transient that will help you is astronomical as well.

--
Atlas Shrugged : Thematic Story :: Battlefield Earth : Organized Religion

True in all fields by eparker05 · 2011-07-19 13:38 · Score: 3, Interesting

Many sciences are experiencing this trend. A branch of biochemistry known as metabolomics is a growing field right now (in which I happen to be participating). Using tools like liquid chromatography coupled to mass spectrometry we can get hundreds of megabytes of data per hour. Even worse is the fact that a large percentage of that data is explicitly relevant to a metabolomic profile. The only practical way of analyzing all of this information is through computational analysis, either through statistical techniques used to condense and compare the data, or through searches on painstakingly generated metabolomic libraries.

That is just my corner of the world, but I imagine that many of the low-hanging fruits of scientific endeavor have already been picked. Going forward, I believe that the largest innovations will come from the people willing to tackle data sets that a generation ago would have been seen as insurmountable.

Re:True in all fields by Jah-Wren+Ryel · 2011-07-19 14:09 · Score: 1

Many sciences are experiencing this trend.
Yes, the piracy sciences have been particularly hard hit. Modern piracy engineering can easily generate the equivalent of 10 Blu-rays, or 500 gigabytes, per day. Modern data reduction tools such as x264 have been developed to deal with this data overload, and can frequently reduce a 50GB Blu-ray by more than 10:1 down to 8GB or less without a significant loss of information in the processed data.

--
When information is power, privacy is freedom.
Re:True in all fields by Anonymous Coward · 2011-07-19 14:39 · Score: 0

How is that a bad thing? Sounds like you're trying to tackle the problem of compression without significant loss in quality.
Re:True in all fields by geekatech · 2011-07-19 14:49 · Score: 1

We got Bioinformatics, now what would this field be called? Astroinformatics? The Square Kilometre Array project is another example of this.
Re:True in all fields by etherelithic · 2011-07-19 15:20 · Score: 1

Hm, small world--I'm also in metabolomics (more on the computational end than the biological side of things, what I like to call computational metabolomics). I was going to write a post similar to your own, but more generalized for those who aren't familiar with the biology behind it. The issue now is that well established informatics/statistical/computer science approaches are used as general tools in biology/astronomy/biochemistry, and there is a great need to formulate novel algorithms to take advantage of the particular idiosyncrasies of their respective data sets. Otherwise you end up losing a lot of valuable information. The word "interdisciplinary" is fairly abused in academia, but it really does apply in the case of these emerging computational/informatics approaches to classical fields of biology/astronomy/etc. We need people who are equally trained in both biology/astronomy/etc and computer science/informatics to really make the revolutionary leaps in their respective fields.
Re:True in all fields by Anonymous Coward · 2011-07-19 18:00 · Score: 0

Astroinformatics?
Exactly. I went to a series of seminars earlier this year that was described as an astroinformatics school.
Re:True in all fields by Anonymous Coward · 2011-07-19 18:37 · Score: 0

The beauty of convergence... well put sir.
Re:True in all fields by mwvdlee · 2011-07-20 01:10 · Score: 1

I download Linux distro torrents faster than "hundreds of megabytes per hour".
At that speed, a full day's worth of data is only a few GB, or roughly 10,000 times less than discussed in TFA.
Still, analysing even a few GB of data a day is no task for mere men.
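For anyone who wants to check that ratio, here is a rough back-of-the-envelope sketch. It assumes "hundreds of megabytes per hour" means 200 MB/hour (a made-up midpoint, not a figure from the parent post) and takes the 30 TB/night number quoted from TFA at face value.

```python
# Back-of-the-envelope comparison of the two data rates in this thread.
# Assumption: "hundreds of megabytes per hour" is taken as 200 MB/hour.
MB = 10**6
GB = 10**9
TB = 10**12


def bytes_per_day(rate_bytes_per_hour: float, hours: float = 24.0) -> float:
    """Scale an hourly data rate up to a full day of continuous acquisition."""
    return rate_bytes_per_hour * hours


metabolomics_per_day = bytes_per_day(200 * MB)  # assumed figure
lsst_per_night = 30 * TB                        # figure quoted from TFA

ratio = lsst_per_night / metabolomics_per_day

print(f"metabolomics: {metabolomics_per_day / GB:.1f} GB/day")
print(f"the telescope produces roughly {ratio:,.0f}x more per day")
```

With that assumed rate the gap is a few thousand times, so "roughly 10,000" is the right order of magnitude but depends heavily on what "hundreds" means.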

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:True in all fields by Xest · 2011-07-20 01:51 · Score: 1

"Still, analysing even a few GB of data a day is no task for mere men."
Unless it's a word document or power point presentation in which someone has embedded an uncompressed video or bunch of uncompressed images. Then you can get through it in about 5 minutes flat, not counting the half hour it takes Word/Powerpoint to load.
No, in all seriousness though, it really depends what the data is. That's why I'm not keen on this arbitrary "many gigabytes of data" metric which articles like this are supposed to wow us with. Really, what is the data? How is it stored? Is it quick and easy for computers to process and produce intelligible results? Is it easily and quickly human decipherable? A few GB of data may well just be a high definition video of, say, an hour-long event somewhere in the universe that can be sped up 10 fold and watched in no time at all, to gather everything we need to know from it for all we know. If a supernova takes a day to occur in realtime, but can be compressed down to a 10 minute video or even a number of important stills for scientists to gather all they need from it, then even a terabyte of data is unimpressive, compared to other data that really does require trawling through bit by bit.
So really, it all depends what the data is, and how easy it is to represent and digest, rather than the inherent amount of data.

If you'd like to help with all that data... by Anonymous Coward · 2011-07-19 13:40 · Score: 0

Try GalaxyZoo.com. It uses the data from the Sloan Digital Sky Survey mentioned in the article to supply information to any user who signs up to view it. They give you a windowsill position, so to speak: an open window into the world of astronomy that lets you supply your own inferences or generally process the data that they have bottlenecked for the terms of a white paper.

Plenty of items in the array, from planet hunting to spectroscopic analysis. Granted, those fit in the same overlap on the Venn diagram, of course.

Re:If you'd like to help with all that data... by NoNonAlphaCharsHere · 2011-07-19 14:16 · Score: 3, Interesting

Annnnd... we have a winner. GalaxyZoo uses tens of thousands of underutilized, superfluous, non-specialized 'carbon units' for pattern recognition, which they're really really really good at, that is, 800 ms after looking at an image -> elliptical, spiral, irregular... "Hmmm, hey, that's funny... wait... WTF --- let's post this to the forum, where hundreds of other random carbon units will weigh in, and a For Really Astronomer(TM) will be checking it out inside 24 hours if it creates enough buzz..." See Hanny's Voorwerp for the quintessential example.

Software that could 'be surprised' would be nice, but it's a long, long way off.

Never Too much Data by Sinthet · 2011-07-19 13:53 · Score: 1

I'm not an expert in Astronomy, but in general, I don't think you can collect too much data, as long as it's stored in an at least somewhat intelligible format. This way, even if professional astronomers miss something today, amateurs and/or future astronomers will have tons of data to pick apart and scavenge tomorrow.

Plus, more data should make it easier to test hypotheses with more certainty. Hopefully, the data will be made publicly available after the gatherers have had a shot or two at it.

Re:Never Too much Data by Anonymous Coward · 2011-07-19 18:27 · Score: 0

Hopefully, the data will be made publicly available after the gatherers have had a shot or two at it.
I don't know about optical astronomy, but in radio astronomy this is pretty standard. For the telescopes I currently use, you've got the data to yourself for 18 months before it's publicly released. For next-generation telescopes, it should be public as soon as it's been processed and verified (a few days).

30 TB? Oh my! by Anonymous Coward · 2011-07-19 13:55 · Score: 0

That's a tremendous amount of data to get from a telescope that hasn't even been built yet.

FIFO by Anonymous Coward · 2011-07-19 13:56 · Score: 0

"the Large Synoptic Survey Telescope, for instance, generates 30 terabytes of data each night."

You know what they say, garbage in, garbage out...

I have no idea if that telescope is garbage or not, but I do know that if we keep canceling new telescope development then we will quickly be left with just the garbage.

Google writ small... by Byrel · 2011-07-19 14:00 · Score: 1

30TB per day works out to about 10 petabytes per year. If you compare this to the total amount of data produced in a year (from all human sources), around a zettabyte, it's not that huge. In fact, IIRC, the yearly transfer rate of the internet is around 250 exabytes. The people with the really hard job of data processing are internet search engines. Not only do they have to go through several orders of magnitude more data, they have to do it faster, and with much less clearly defined queries.

I sometimes wonder how generally useful something like Google's PageRank system is. It might be possible (if Google ever runs out of other things to do :) to apply this to arbitrary scientific datasets. This could tremendously speed up such calculations. Unfortunately, it may be a while before it is possible for any of the major search engines to release significant parts of their algorithm without being at a serious disadvantage competitively.

However, there is another obstacle as well; one dataset doesn't cross reference itself anywhere near as much as the internet, and it is fairly certain that Google (at least) uses this for a good part of its ranking. So we would also want to incorporate the opinions of scientists (both amateur and professional), and many datasets, to give the pagerank system the detail it would need.
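The scaling above is easy to reproduce. This is only a sketch of the arithmetic: it assumes the telescope observes all 365 nights (an upper bound, since weather and maintenance will cost nights), and the 250 exabyte internet figure is my recollection, not a verified number.

```python
# Rough yearly data volume for a 30 TB/night survey telescope.
TB = 10**12
PB = 10**15
EB = 10**18

nightly = 30 * TB
yearly = nightly * 365          # assumes every night is usable (upper bound)
internet_yearly = 250 * EB      # recalled figure from the post, unverified

fraction = yearly / internet_yearly

print(f"yearly volume: {yearly / PB:.2f} PB")
print(f"share of assumed yearly internet traffic: {fraction:.2e}")
```

So "about 10 petabytes" is really a shade under 11, and it comes to a few thousandths of a percent of the assumed internet total.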

Re:Google writ small... by Anonymous Coward · 2011-07-19 14:06 · Score: 0

However, there is another obstacle as well; one dataset doesn't cross reference itself anywhere near as much as the internet, and it is fairly certain that Google (at least) uses this for a good part of its ranking.
Taking advantage of the fact that pages link to each other is the entire idea behind the PageRank algorithm.
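To make that concrete, here is a minimal sketch of the textbook power-iteration version of PageRank on a toy four-page graph. It is the simplified published form of the idea, not whatever Google runs in production; the graph, the damping factor of 0.85, and the function name are all illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outgoing in links.items():
            if outgoing:
                # A page splits its current rank evenly among its links.
                share = damping * rank[node] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", nothing links to "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The page with the most incoming links ends up on top and the page nobody links to ends up at the bottom, which is exactly why a dataset with few internal cross-references gives the algorithm little to work with.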
Re:Google writ small... by Anonymous Coward · 2011-07-19 14:21 · Score: 0

So who else is producing 10 petabytes of *original* *unique* data annually if this is such small potatoes?
Re:Google writ small... by NoNonAlphaCharsHere · 2011-07-19 14:29 · Score: 1

So who else is producing 10 petabytes of *original* *unique* data annually if this is such small potatoes?

What're you, Dan fucking Quayle???

OK, OK, It's totally a "get off my lawn joke"...
Re:Google writ small... by blueAt0m · 2011-07-19 15:32 · Score: 1

Sounds like another task for IBM's Watson. The way I understand the problem, most scientists must be in cahoots with skilled CS folk to generate the types of answers from such large datasets, or they must be half CS folk themselves in order to traverse such scales of data. Quite an undertaking when professionals should be focused in one area, let alone conveying the ideas of either field to the other the way they themselves see and understand them. However, the dawn of asking Watson or Enterprise to figure something out using some NLP fun should manifest some discoveries faster, if this were the case. We H. sapiens are great with abstract stuff... leave the crunching to the comps. The goal is to close the gap from pidgin language to full-blown comprehension with our binary buddies. Then data, "schmeta".
Re:Google writ small... by arkane1234 · 2011-07-19 15:48 · Score: 1

Except... that's the right way to spell it... it's not "potatos".

--
-- This space for lease, low setup fee, inquire within!
Re:Google writ small... by Anonymous Coward · 2011-07-20 00:50 · Score: 0

The spelling is correct. You're the one who's wrong. So the next time you try to act so smug or high and mighty just remember that you fouled up the spelling of a word most 3rd graders can spell. And you couldn't even do that with a touch of class. That makes Dan Quayle a friggin' brain trust compared to you.

Slashtards always love to strut their stuff but more and more often they're proving that they're just dumb asses in the guise of the wise and learned. What's even worse is that so few of you know how to hold your head up high and admit your mistakes like an honest man would.

Oh, if you think you're some kind of village elder for knowing who Dan Quayle is it just shows me even more how much of your persona is a pretense. By most people's standards you're probably still wet behind the ears.

informatics? by countertrolling · 2011-07-19 14:11 · Score: 1

For some reason, that word scares me..

--
For justice, we must go to Don Corleone

Generates? Wrong tense. by oneiros27 · 2011-07-19 14:24 · Score: 5, Informative

*WILL* generate. LSST isn't operating yet.

And yes, 30TB is a lot of data now, but we have some time before they finally have first light.

Operations isn't supposed to start 'til 2019 : http://www.lsst.org/lsst/science/timeline

We just need network and disk drive sizes to keep doubling at the rate they have, and we'll be laughing about how we thought 30TB/night was going to be a problem.

SDO finally launched last year with a data rate of over 1TB/day ... and all through planning, people were complaining about the data rates ... it's a lot, but it's not insurmountable as it might've been 8 years ago, when we were looking at 80 to 120GB disks.

Although, it'd be nice if monitor resolutions had kept growing ... if anything, they've gotten worse the last couple of years.

(Disclaimer : I work in science informatics; I've run into Kirk Bourne at a lot of meetings, and we used to work in the same building, but we deal with different science disciplines)

--
Build it, and they will come^Hplain.

Re:Generates? Wrong tense. by Carnivore · 2011-07-19 15:09 · Score: 4, Informative

In fact, they just started blasting the site. I actually live next door to the LSST's architect, which is pretty cool.

Astronomers generate a tremendous amount of data, bested only by particle physicists. Storing it all is a challenge, to put it mildly. Backup is basically impossible.
The real problem is that the data lines that go from the summit to the outside world are still not fast. The summits here are pretty remote and even when you get to a major road, it's still in farm country. And then getting it out of the country is tough--all of our network traffic to North America hits a major bottleneck in Panama, so if you're trying to mirror the database or access the one in Chile, it can be frustratingly slow.

Useless information by Anonymous Coward · 2011-07-19 16:01 · Score: 0

Astronomers are used to taking pictures and storing them, but that doesn't mean that it is the best way to operate.

The fact that you can store the data, doesn't mean that you have to or should do so. Why not capture it again when it is needed again, the way other monitoring systems do?

Re:Useless information by slackbheep · 2011-07-19 19:10 · Score: 1

Because time travel is a bigger issue than storage space? Why don't you try taking a picture of yesterday's sunset, and get back to me?
Re:Useless information by Anonymous Coward · 2011-07-19 20:40 · Score: 0

Yesterday's sunset, or indeed the sunset from a few billion years ago, is not significantly different from tomorrow's sunset or indeed the sunset from a few billion years hence. So there is no sense in saving all that data. Just gather it when you need it, do your analysis, keep the resulting information and discard the data. Data != Information.
Re:Useless information by Anonymous Coward · 2011-07-19 21:18 · Score: 0

Astronomers frequently precover asteroids in old plates. And having a big arc of observations is very valuable for computing a reliable orbit.
Re:Useless information by csrster · 2011-07-19 21:26 · Score: 1
This is a really poor argument for several reasons:
- i) telescope time is a scarce resource. If I need an image of a galaxy X I might have to wait years to get telescope time for it. If galaxy X has already been observed once and the data stored then I can do my new research (e.g. datamining) on the existing data. Nobody knows in advance which data is going to be interesting to future researchers so triage is almost impossible.
- ii) telescopes have finite lifetimes. Once the telescope/instrument ceases to exist the data cannot be reproduced.
- iii) Most of the interesting things in the universe are dynamic. You need to be able to compare observations of stuff over time.

Re:Generates? Wrong tense. by Anonymous Coward · 2011-07-19 16:16 · Score: 0

As far as I understand it, the data will be available also to the general public. I assume that means they will need to have a global network of caches?

Re:Generates? Wrong tense. by dkf · 2011-07-19 20:59 · Score: 1

Astronomers generate a tremendous amount of data, bested only by particle physicists.

Earth scientists will merrily generate far more — they're purely limited by what they can store and process, since deploying more sensors is always possible — but they're mostly industrially funded, so physicists and astronomers pretend to not notice.

--
"Little does he know, but there is no 'I' in 'Idiot'!"

Re:Generates? Wrong tense. by dkf · 2011-07-19 21:01 · Score: 1

As far as I understand it, the data will be available also to the general public. I assume that means they will need to have a global network of caches?

Possibly. It depends on how much the general public actually wants to download the data; if it is just selected images instead of the bulk (most of which will be boring "not much happening here" stuff) then serving it from a single site will be quite practical.

--
"Little does he know, but there is no 'I' in 'Idiot'!"

Re:Generates? Wrong tense. by Shag · 2011-07-19 21:14 · Score: 1

*WILL* generate. LSST isn't operating yet.

This, unless they have a time machine. ;)

The first Pan-STARRS scope with its 1.3-gigapixel camera has been doing science for a little while now, and I think it might do something like 2.5TB a night. That's still a lot of disk (and keep in mind that they originally planned to have 4 of those scopes), but I think their pipeline reduces it all to coordinates for each bright thingy in the frame and then throws away the actual image (though I could be wrong).

Where I work, our highest-resolution toy is 80 megapixels right now, but we're supposed to get a shiny new one next year with a FOV three times wider and close to a gigapixel of resolution... that'll chew through disk and bandwidth like crazy.

--
Village idiot in some extremely smart villages.

Re:Generates? Wrong tense. by csrster · 2011-07-19 21:18 · Score: 1

Theoreticians surely generate the most, because they're only limited by how fast a CPU can churn out floating-point numbers.

Do astronomers compress? by wisebabo · 2011-07-19 21:44 · Score: 1

Ok, I know this doesn't solve the problem of actually ANALYZING the data, but for storing and moving the data around, what's the best compression algorithm for astronomical (I mean the discipline, not the size!) data?

I used to work for a company that developed a really good compression algorithm using wavelets. At the time it was the only one to be accepted by A-list movie directors (the people with the real power in Hollywood); they refused to go with any of the JPEG or MPEG variants (this was before JPEG 2000 which I understand also uses wavelets). We pitched to JPL the idea that they use some of this technology for some of their mission imaging requirements but they said the data was almost priceless and they couldn't risk losing any data to an admittedly "lossy" compression algorithm. Of course they were forced to break this policy with Galileo because the main antenna never opened, so they only had a tiny fraction of the bandwidth that they originally planned. (Interestingly enough after the Columbia disaster, our equipment was later heavily used by NASA for imaging requirements related to observing the space shuttle. It helped make the conversion to a digital workflow practical which really sped up the time needed to distribute Hi-Res launch videos to all the NASA engineering sites around the country.)

So do astronomers use some lossless compression algorithm? In the case of space based data collection, do they have the computer power to compress it on board? Do they "clean up" the images first to make it easier to compress?

Re:Do astronomers compress? by dargaud · 2011-07-20 00:53 · Score: 1

Do they "clean up" the images first to make it easier to compress?
Normally they don't. Compression algorithms, almost by definition, create artifacts that are difficult if not impossible to distinguish from potentially interesting data. So science imagery is almost always saved in 'raw' format, unless you have no other option like with your Galileo example. Imagine applying a dead pixel detection to an astronomy image: 'poof!', all the stars magically disappear!

--
Non-Linux Penguins ?
Re:Do astronomers compress? by mwvdlee · 2011-07-20 01:32 · Score: 1

Not all compression algorithms are lossy, though the lossless ones aren't nearly as space-efficient.
But some form of lossy compression might work too; it would be easy to filter the images so, for instance, any "nearly-black" pixel is set to black. Add some RLE and you have compression.
The key to lossy compression is having a way to determine what type of data isn't as important and approximating that data.
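A minimal sketch of the "clip near-black, then RLE" idea, just to show the mechanics. The threshold value, the sample pixel row, and the function names are all made up for illustration; real astronomical pipelines would need a far more careful noise model before throwing anything away.

```python
def clip_near_black(pixels, threshold):
    """Lossy step: set every pixel at or below the threshold to pure black."""
    return [0 if p <= threshold else p for p in pixels]


def rle_encode(pixels):
    """Lossless step: collapse runs of equal values into (value, count) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, count) pairs back into the clipped pixel row."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Hypothetical 8-bit scanline: mostly dark sky with one bright object.
row = [0, 1, 2, 0, 3, 250, 251, 2, 1, 0, 0, 1]
clipped = clip_near_black(row, threshold=3)
runs = rle_encode(clipped)

print(runs)  # [(0, 5), (250, 1), (251, 1), (0, 5)]
```

Twelve pixels shrink to four runs, but only because the clipping step flattened the faint values. Decoding gives back the clipped row, not the original, so whatever sat in those near-black pixels is gone for good.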

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:Do astronomers compress? by dargaud · 2011-07-20 02:10 · Score: 1

The key to lossy compression is having a way to determine what type of data isn't as important and approximating that data.
The problem with research is that until you've looked, you don't know what you are looking for...

--
Non-Linux Penguins ?
Re:Do astronomers compress? by Anonymous Coward · 2011-07-20 05:32 · Score: 0

Is a "nearly-black" pixel nearly black because of a stray photon that shouldn't be there, or because of a photon from an interesting object farther away than you expect? It's hard to be certain where you should draw the line as in the future these nearly-black pixels might be important.
Re:Do astronomers compress? by mwvdlee · 2011-07-20 07:57 · Score: 1

Which is why I said "for instance". I don't know what the researchers are looking for, but I'm pretty sure the researchers themselves have a decent understanding of what data they want. In contrast to what dargaud mentioned above, most researchers set out to find specific data to prove or disprove a theorem; they only need a specific subset of all data collected. Very few researchers try to discover things in a random set of raw data.
If all you want to know is the amount of stars in a specific picture, you can keep the count and throw away the picture.
Perhaps they only need the coordinates of stars, or to keep clear objects of at least 10x10 pixels in size.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?

Re:Generates? Wrong tense. by mwvdlee · 2011-07-20 01:20 · Score: 1

Deploying more telescopes is always possible as well.
This isn't a race about who can fill up storage space the quickest.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?

Data is a lot like garbage. by Anonymous Coward · 2011-07-20 01:30 · Score: 0

“Data is a lot like garbage. You need to know what you are going to do with it before you start collecting it.” - Mark Twain

Re:Generates? Wrong tense. by ginbot462 · 2011-07-20 04:39 · Score: 1

At least you aren't at Dome A. You might have to use some tropospheric (to not pay outrageous SAT usage rates).

--
Atlas Shrugged : Thematic Story :: Battlefield Earth : Organized Religion

Slashdot Mirror

60 comments