Creating The UniServer
bmongar writes "DrDobbs has an article about a project for a mirrored universal astronomy database. Jim Gray basically wants a network of observatories around the world to publish their data and mirror other observatories' data, creating a quadruple-redundant system of data, all available online. He wants to create a new type of astronomer, the astronomer that is a data miner." As the article also says, the guy behind this is the guy behind the TerraServer as well.
Now, all we have to do is add a piece of Fairy Cake, a marker that says "You are Here", and ship the whole thing to Frogstar, and we're set....
Where are we going, and why am I in this handbasket?
this they wouldn't need so much storage...
People replying to my sig annoy me. That's why I change it all the time.
Well, as the guy is part of the Microsoft Research Team, I'd guess Whistler Version 7 running Access 2050 :)
----------------------------------- My Other Sig Is Hilarious -----------------------------------
Well there's an insightful AC.
It would help me if you pointed out what you thought was naive and why.
There is a whole gaggle of scientific work that at first seemed totally worthless commercially but eventually had commercial uses within 100 years or even much, much faster.
Seriously though -- if there really are lot of amateur astronomers out there snapping digital pictures of comets, would there be any benefit to creating an automatically indexed peer to peer server scheme?
His book with Andreas Reuter, Transaction Processing : Concepts and Techniques is terrific.
Sorry for all the inconvenience.
People replying to my sig annoy me. That's why I change it all the time.
It makes me wonder. Researchers usually have their own datasets, and they spend gobs of time working on them with their specialized programs. It seems to me that the really valuable stuff out there is in these closed datasets, not in the everyday stuff that's available on the internet.
As an example, you can get a gazillion CD-ROM's with the Magellan data from Venus. But what good is that raw data? Not much. You'd probably want to get a look at the data on the Venus-geologist's computer instead, because it's been analyzed and selected and generally picked over to produce something meaningful.
If tits were wings it'd be flying around.
And finally, we are discovering universes
that are farther away and therefore younger than any we have previously discovered
I REALLY hope you meant to say galaxies instead of universes. I am a physics major with a healthy lean towards computational cosmology, and I would really really hope that if we had discovered entire new universes, I would know about it. SDSS, however, has discovered quite a few new galaxies, so I will assume that is what you were referring to
If you start out with mounds of raw data you aren't a scientist.
A scientist starts out with a hypothesis.
This is not true. The father of modern science, Francis Bacon, believed that science should be done by collecting as much data as possible and seeing what conclusions the data support.
Hypothesis-driven research is actually in a sense cheating, because in such research the data gathered is biased -- the researcher is not considering all the data which could bear upon the situation but only those data which the researcher believes could support or refute a preconceived hypothesis. Nevertheless, hypothesis-driven research is the norm in science because until recently, that was the only efficient way to do science.
But with new techniques in data mining, we can begin to recapture the promise of Baconian science.
I couldn't agree more! There's nothing wrong with dreaming about Utopia. It's just that you'd have to eliminate approx. 99 percent of the human race to actually achieve it. Somebody deny the fact that for every Utopian dreamer there are 33 pieces of shit and 66 ignorant morons running around...
People replying to my sig annoy me. That's why I change it all the time.
...root@omniverse.god?
...yhwh@creation.org?
...voice@burningbush.com
???
Constitutionally Correct
Sorry, I'm the kind of person who reads a lot of "Discover" articles, so I know the basics but not many details. I did mean galaxies, because I had just read an article on our discovery of galaxies that are older than previously thought, therefore challenging our current ideas about how the universe was created.
I used to love astronomy as a kid. In fact, I got a scholarship and went off to Drake thinking I would major in it. However, I loved it more than it loved me, I think.
Anyway, while I was a kid, too poor to buy a telescope, I used to read astronomy books voraciously, and take notes on any stellar data I came across. Originally on ruled paper, I eventually transferred all of it to an AppleWorks db (on the Apple ][) in high school, and then into Excel (3? 4?) when I got to college. I used this to plot some very nice color H-R diagrams. This is the kind of project I really could have gotten excited about.
Constitutionally Correct
Check out the National Virtual Observatory (really should be International VO). This is not a M$ project; it's a new effort among astronomical data centers to do a lot of what you're asking about.
-- tdk
Exit, pursued by a bear.
Isn't that a server in a small cabin in Montana that sends logic bombs to more technologically advanced servers?
No wait, that's the UnaServer...
Garg
Garg
Alumnus, Xavier's School for Gifted Youngsters
Astronomy, nitwit, not Astrology.
People replying to my sig annoy me. That's why I change it all the time.
Your average astronomer is already a major data miner. From the Hubble Deep Field to the images taken in the back yard with a home-built CCD camera, much of modern observational astronomy is entirely built around being able to mine those images for correspondence, object attributes, and clustering in position, colour, or some other feature. Even a basic catalogue built from a single-wavelength plate will assign position, size, brightness, orientation, semi-major and semi-minor size, positional error, orientation error, brightness error, isophotal brightness, local background level and half-a-dozen other attributes to each object in the catalogue. There may be several thousand objects in a single frame. Making sense of this data set requires time, some ideas about what you are searching for and some luck.
All that said, you'd be missing a lot as an astronomer if all you looked at was optical images. Going to other images for the same area of sky, be it infra-red, radio, x-ray and so on, will give you a deeper insight into the likely environment of your object and also into any likely confusions due to multiple structures along the line of sight.
So having a vast data repository is important, and astronomers have had the tools to go and query multiple surveys at multiple wavelengths for several years. So there is nothing new here either from a data access point of view. The only really new thing in this proposal is to collate all the data together onto four super-mirrors and ensure that these supermirrors remain in sync, so if one system dies, it can be restored from the other mirrors without having to go back to tape backups.
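The "restore from the other mirrors" idea can be illustrated with a small sketch. Everything here is an assumption made for illustration (the function names, the use of SHA-256, and the majority-vote rule); the article does not say how the mirrors would actually be kept in sync.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(blob: bytes) -> str:
    """Return a SHA-256 hex digest used to compare copies cheaply."""
    return hashlib.sha256(blob).hexdigest()


def repair_mirrors(mirrors: Dict[str, bytes]) -> List[str]:
    """Overwrite any copy that disagrees with the majority of mirrors.

    Hypothetical helper: returns the names of the mirrors that were repaired.
    """
    digests = {name: digest(blob) for name, blob in mirrors.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(mirrors) // 2:
        # No clear winner: this is the case where tape would still be needed.
        raise ValueError("no majority among mirrors")
    good_copy = next(
        blob for name, blob in mirrors.items() if digests[name] == majority_digest
    )
    repaired = []
    for name in list(mirrors):
        if digests[name] != majority_digest:
            mirrors[name] = good_copy
            repaired.append(name)
    return repaired
```

With four mirrors, a single corrupted copy is outvoted three to one and is replaced without touching a tape.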
Cheers,
Toby Haynes
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
It seems this guy wants to act like a real scientist and exchange discoveries...
A similar project has already been built for geneticists:
when you want to find out whether a sequence you discovered in one genome (for example, a frog's) is present in other living creatures (like the well-studied bacterium E. coli),
you post your sequence to a database (over the web) and it calculates the degree of similarity with sequences already discovered!
And what's more, it can tell you what this gene/sequence is "made for" in that organism...
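As a toy illustration of "post a sequence, get back similarity scores", here is a minimal sketch. It uses Python's standard `difflib.SequenceMatcher` as a stand-in; real sequence databases use dedicated alignment algorithms, and the sequences and names below are made up for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical reference database: name -> nucleotide sequence.
KNOWN_SEQUENCES = {
    "e_coli_geneA": "ATGGCGTACGTTAGC",
    "yeast_geneB": "ATGCCCTTTGGGAAA",
}


def rank_matches(query: str, database: Dict[str, str]) -> List[Tuple[float, str]]:
    """Score the query against every stored sequence, best match first."""
    scored = [
        (SequenceMatcher(None, query, seq, autojunk=False).ratio(), name)
        for name, seq in database.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, name in rank_matches("ATGGCGTACGTAAGC", KNOWN_SEQUENCES):
        print(f"{name}: {score:.2f}")
```

The score is only a rough string similarity between 0 and 1; it says nothing about biological function, which a real service would look up from annotations.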
a rare case of free community exchange in the great world of research...
It's quite funny to realize that people who are thought to be the best brains in the world act like selfish little rats who want to guard what they've won... this is why I quit "BIG" biology research for the more exciting, funnier, freer research in computing...
I hope this guy will have the power and energy to see his project through...
ptitom
.. as you can throw enough storage space at the problem. Just having a giant stockpile of data isn't going to be of much use (except for archival purposes) unless we also have efficient access to the data onsite (we don't want to send terabytes over the network) and have the correct tools to allow different datasets to be compared and correlated. The possibility of doing large-scale data comparisons, or of comparing datasets across a wide range of wavelengths, is surely what is most interesting here and the main argument for having an online store (as opposed to a data archive). I wonder what research tools are proposed?
----------------------------------- My Other Sig Is Hilarious -----------------------------------
Give me a break! Does this guy know anything about the field of astronomy from a professional point of view?
Most astronomers/astrophysicists don't spend the time looking through the telescopes themselves - the majority use data that someone else has already gathered. I agree that this would greatly increase their ability to find pertinent data; however, it would hardly bring about a new 'type' of astronomer, as the majority are already data miners.
UBU
Well, considering that astronomy is mostly a Unix world, I don't foresee M$ controlling the data. Anyway, these outputs in raw form are some of the most mind-numbing series of numbers. There is little entertainment value in this data. This keeps it safe from the suits. Even SETI data is not really fun. It's the irrelevant graphics that make it entertaining.
(This should be listed under my name instead of Anonymous Coward)
Apart from having more observatories publish their data (most already do), having a central point to index it (not really here today, but if you want it you can generally find it - if it's not in the sky survey, it's not in the sky), and having M$ run things (please, no!), what does he hope to accomplish?
I love vegetarians - some of my favorite foods are vegetarians.
You'll find talk about data pipelines, "the grid", and more. Of special note is that the technologies behind the actual efforts under way right now to create the NVO et al. are overwhelmingly based on Open Source technology and Unix. The fact that someone in Microsoft tries to jump on the bandwagon with what will presumably turn out to be a closed, proprietary solution isn't really news.
-- This
Not really. Most data centers don't do much analysis on the data, they just provide it to astronomers who do. The wider the data can be cast, the more science can be squeezed out of it.
-- tdk
Exit, pursued by a bear.
So when Edwin Hubble plotted redshift vs distance of a bunch of galaxies and discovered that the universe is expanding, he wasn't doing science?
-- tdk
Exit, pursued by a bear.
Putting aside everything that could go wrong with a project like this (patenting, infrastructure, etc.), what about using two or more images of the same region of space taken at about the same time (or within 12 hours of each other, or whatever) to extract finer detail than we could from the separate images? I understand this is how it's done with the arrays of telescopes at some sites, and maybe it could also be used here.
Already done, see for example the Multimission Archive at STScI (http://archive.stsci.edu/mast.html).
Sorry Bill, try again.
OverLord
if this Universe Database thingy starts spittin' out 42's left and right, I'll bloody well marry it. You don't get a more consistent partner than that...
People replying to my sig annoy me. That's why I change it all the time.
----
It might be just me, and maybe I'm paranoid due to too much Slashdotting, but this seems strange. I mean, all the Astronomers sharing their data with each other, working together on a mutual project. Next thing we know they'll be standing hand in hand singing "we all stand together!". Somehow the Utopian thought behind it makes my logical circuits sputter...
People replying to my sig annoy me. That's why I change it all the time.
It's called 'SETI at Home' isn't it?
A pizza of radius z and thickness a has a volume of pi z z a
Well, what I am interested in is that this would make it so that astronomers don't have to use a telescope. Remember, we only have one sky. For each image, there would be attributes for the date and time the image was taken, the celestial coordinates it was taken at, the magnification, the geographical coordinates it was taken from, and perhaps even the weather conditions.
I don't see a reason why images that aren't in the visible spectrum can't be put into this database. Then you would need perhaps a spectrum range attribute.
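The attributes listed above map naturally onto a record type. The sketch below is only one possible layout; the field names, units, and the crude `covers` search are assumptions for illustration, not part of any proposed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class ImageRecord:
    """One archived exposure (hypothetical schema)."""
    taken_at: datetime                  # date and time of the exposure (UTC)
    ra_deg: float                       # right ascension of image centre
    dec_deg: float                      # declination of image centre
    magnification: float
    site_lat_deg: float                 # where on Earth it was taken
    site_lon_deg: float
    band_nm: Tuple[float, float]        # spectrum range: (min, max) wavelength
    weather: Optional[str] = None       # free-text conditions, if recorded


def covers(record: ImageRecord, ra_deg: float, dec_deg: float, radius_deg: float) -> bool:
    """Crude box test around the image centre.

    Ignores RA wraparound at 0/360 degrees and the cos(dec) scaling,
    both of which a real sky search would have to handle.
    """
    return (
        abs(record.ra_deg - ra_deg) <= radius_deg
        and abs(record.dec_deg - dec_deg) <= radius_deg
    )


orion_shot = ImageRecord(
    taken_at=datetime(2001, 1, 15, 3, 30, tzinfo=timezone.utc),
    ra_deg=83.82, dec_deg=-5.39, magnification=40.0,
    site_lat_deg=41.6, site_lon_deg=-93.6,
    band_nm=(400.0, 700.0), weather="clear",
)
```

Keeping the wavelength range as its own field is what would let non-visible images sit in the same table as optical ones.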
The exciting thing, in my opinion, is what can be done with all this data. Imagine creating a star map of the entire sky based on real observation; it may be zoomable at some points. Every time a telescope takes a picture of the sky, it gets put into this database. That could yield a huge amount of data in a relatively short time. I can very much see astronomers using this data instead of their own observations. Imagine a "video" of the same part of the sky over twenty years.
This can be done in software if all the data is there. I know I would love this kind of thing to be publicly available. If I see something in the sky, I can then look on the internet to see if there were any other images of it.
Sorry if my post is less than coherent, but this seems exciting to me.
Um, I don't think this was a white paper. I think it was an idea...perhaps a proposal to astronomers.
I don't think it matters what OS they use or what database system they use, etc. etc. until they start implementing it.
I think the astronomers would very much appreciate this use of technology. It is one of the purest uses of technology I have known.
But I am interested in details as well though. So for those of you who specialize in this sort of stuff, how would you go about implementing this sort of system? Would GNU/Linux be able to handle it?
404 - Universe Not Found
Please contact the Universe Master at...
Hammer of Truth
I don't read articles in Dr Dobbs anymore. It's a waste of time.
All you need in this life is ignorance and confidence -- and then success is sure. Mark Twain
The article definitely gets the ol' geek hairs on the back of your neck standing up. Petabyte backups, tape recovery that takes 5 days..
Lots of stuff that makes geek men howl.
However, it leaves out a *TON*. Like, what technology are they going to use to DO data mining? What database will run this monster? Which OS will it run on?
Further, what license/restrictions are there on the data once it gets published? Is it totally public knowledge, free of copyright?
Fundamental questions of large scope and size, not easily ignored.
However, the question *I* have is: why not store the data with online companies KNOWN for hosting data, instead of at observatories, which have little experience at that?
GPL'd web-based tradewars themed space game
Although I'm not aware of any database of pure raw data, NASA at least have the Distributed Astronomy Library, described here, which is a repository of astronomical *information*. An example is here
Free Anne Tomlinson!!
*FUD start*Such thing reminds me of some M$ ideas on concentrating everything all around the world in one bucket. Somehow this resulted in the .NET idea. So now we are up to the Universe...*FUD end*
Well, anyway, the idea is not so bad at all. But I don't see how to realise it without making some radical changes in the system. First we have to deal with communication channels. For volumes like those of astronomical databases they are highly unreliable. We are not going to run petabytes over them, but surely there will be gigabytes going back and forth. Note that a raw Mars image from PDS sometimes weighs up to 20 megabytes. Processing such images sometimes leads to data volumes 10-30 times bigger. In some cases it is possible to apply JPEG to compress these images, but sometimes that is highly undesirable. So we get something weighing 100-200 megs. On a 100Mb network, that will take a few minutes to pass from station to station. Now imagine a widespread, worldwide network working this way.
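The transfer-time estimate is easy to check with a few lines of arithmetic. This is a back-of-the-envelope sketch: the `efficiency` parameter is an assumption standing in for protocol overhead, shared links, and long-haul latency, none of which the comment quantifies.

```python
def transfer_seconds(size_megabytes: float, link_megabits_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Time to move a file, given its size in megabytes and a link speed
    in megabits per second. efficiency is the fraction of the nominal
    rate actually achieved (1.0 = ideal line rate)."""
    return size_megabytes * 8 / (link_megabits_per_s * efficiency)


if __name__ == "__main__":
    for size in (100, 200):
        ideal = transfer_seconds(size, 100)
        congested = transfer_seconds(size, 100, efficiency=0.1)
        print(f"{size} MB: {ideal:.0f} s ideal, {congested:.0f} s at 10% of line rate")
```

At the full nominal rate a 100-200 megabyte file takes roughly 8 to 16 seconds; "a few minutes" corresponds to getting only around a tenth of the nominal rate, which is plausible across a shared wide-area link.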
On one side we have archives spread all over the world. On the other side this gives rise to a community of astronomers also working all over. It will be a big challenge to achieve such a thing. And a big financial adventure. Maybe dumb burrocritters will think that data will be cheaper if it keeps rotting on magnetic tape.
In effect, isn't this saying "we haven't found anything useful in all these terabytes, want a copy?"
If the same approach was used with /., would it mean copying all the flamewars and troll posts? How much of a waste is that?
I'll keep reading at -1, looking for meaning, and let you know what I find.
It is happening in other sciences. For example, my field "bioinformatics" deals with analyzing molecular biological data, much of which is in public databases such as GenBank. Once experimental molecular biologists could be expected to analyze all their data themselves because there just wasn't very much of it. That just isn't true anymore.
This should have been implemented a long time ago, because the amount of information we are pulling in right now is tremendous and it will only increase with the launch of more and more satellites. We need this database for three very important reasons:
First, we are all concerned, due to recent movies, that we might get hit by an asteroid, which is a valid concern, so we need to carefully track the asteroids that we find, because we are currently searching only 10 percent of the sky. Secondly, with newer and more powerful telescopes we are mapping more and more planets outside our solar system every day; soon they will roll in by the dozens a day. And finally, we are discovering universes that are farther away and therefore younger than any we have previously discovered
jbischof
The biggest problem is, of course, data entry. A lot of the texts pose a challenge for OCR for a number of reasons, including the large number of special characters often used.
Another problem is people who insist on copyrighting and refusing to freely share their collections of online documents in the older languages, which is a real shame, because it prevents me from creating all kinds of interesting derived works (e.g. web pages of Old English texts where you can click any word to get information about it). It basically means that all this work has to be repeated by anyone who wants to make those texts freely available-- never mind that we're talking about works over 1000 years old!
If you would like every single scientist to make his own equipment, and perform every experiment from the ground up (going through all the previous experiments to validate the groundwork theory), then you are missing the point of the scientific method altogether.
From the scientific community's standpoint, wouldn't it make more sense for everyone to agree on a certain apparatus to collect the observational data, and then let everyone analyze the data on their own terms? We only have a handful of particle accelerators, yet we have made serious progress in our scientific understanding by sharing their collective output data.
How do you come up with a hypothesis about 'something' if you don't even have a clue what defines the 'something' in the first place?
After the data is 'statistically' sifted through, we can then form hypotheses as to why it appears the way it does, and then consolidate the theories. You can't run an experiment in astronomy! The fundamental basis of astronomy has always been gathering tons of data and sifting through it.
Don't waste the energy required to type if you don't got a clue about what the hell it is you're trying to discount.
-An Anonymous Coward Against the Unfounded Bashing of Astronomical Methods
Space escalator damnit!
Best Slashdot Co
DODS is for oceanographic data, but could easily be adapted to astronomy data.
In a nutshell, you put a CGI script on your server that maps your database to a standard format (the Adapter pattern), and a web or desktop client can perform queries against everyone who is running DODS on their server.
http://www.unidata.ucar.edu/packages/dods/
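The Adapter idea described above can be sketched in a few lines. This is not the DODS API; the class names, the record format, and the temperature data are all hypothetical, chosen only to show two differently shaped local stores answering one common query.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


class FahrenheitTableAdapter:
    """Wraps a site that keeps rows of (station, temperature in Fahrenheit)."""

    def __init__(self, rows: Iterable[Tuple[str, float]]):
        self._rows = list(rows)

    def records(self) -> Iterator[Dict]:
        # Translate the local layout into the shared record format.
        for station, temp_f in self._rows:
            yield {"station": station, "temp_c": (temp_f - 32) * 5 / 9}


class CelsiusDictAdapter:
    """Wraps a site that keeps a mapping of station -> temperature in Celsius."""

    def __init__(self, mapping: Dict[str, float]):
        self._mapping = dict(mapping)

    def records(self) -> Iterator[Dict]:
        for station, temp_c in self._mapping.items():
            yield {"station": station, "temp_c": float(temp_c)}


def query_all(adapters: Iterable, min_temp_c: float) -> List[str]:
    """Client side: one query, run against every participating site."""
    return sorted(
        rec["station"]
        for adapter in adapters
        for rec in adapter.records()
        if rec["temp_c"] >= min_temp_c
    )
```

The client never sees how each site stores its data; it only relies on every adapter producing records with the same keys.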
--Doug
I don't know if it was angular momentum that Kepler figured out from his data, but he did study Tycho's data.
:-) by using stellar precession, etc.
Tycho Brahe may not have been much of an astrophysicist, scientist, or whatnot, but he was a hell of an observer, ESPECIALLY when you consider the crappy tools he had -- an eyeball, a sextant, and an optical telescope.
Scientists today still study his data, because there is so much of it, for such a long time, with such a high degree of accuracy. It's useful for all kinds of things; dating stars (or human events, like pyramid building
--
Do daemons dream of electric sleep()?
The NBC sports guy? That Jim Gray? The guy who never smiles? Figures--he was done after the Pete Rose thing, I guess.
--
-- Geof F. Morris
I'm no scientist, but I don't think they should use a lossy file format for this kind of thing.
Scientist: Hmm...what's this shady pixel on mars here? Could it be...could it be life!
Geek: Nahh...that's just a result of the JPEG algorithm making up pixels it lost during compression.
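The difference between lossless and lossy storage can be shown with the standard library alone. In this sketch `zlib` provides the lossless round trip, and a simple quantisation step stands in for lossy compression; it is not the JPEG algorithm, just an assumption-laden illustration of how values change when information is discarded.

```python
import zlib

# Eight made-up 8-bit pixel values: faint background plus one bright pixel.
PIXELS = bytes([12, 13, 12, 200, 13, 12, 12, 13])


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with zlib; every byte comes back unchanged."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(data: bytes, step: int = 16) -> bytes:
    """Stand-in for lossy compression: round each value down to a multiple
    of step, discarding the fine detail for good."""
    return bytes((value // step) * step for value in data)


if __name__ == "__main__":
    print(list(lossless_round_trip(PIXELS)))
    print(list(lossy_round_trip(PIXELS)))
```

After the lossy pass the small differences in the background are gone, which is exactly the kind of detail a faint-object search depends on.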
All these archives are searchable from the web site, and (if you've registered with them) available for download. Images from HST and CADC are restricted to only the primary researcher(s) for a period of time (I think it's a year).
Putting multimedia data into the file system is the implementation strategy many commercial databases (including some versions of DB2) take behind the scenes for storing multimedia objects, even if they hide it behind a database API. They can still provide all the database facilities (transactions, indexing, access control, etc.) on top of such an implementation.
With that kind of architecture, you don't need a very powerful machine or high performance database to be able to serve image data at disk bandwidth or network bandwidth.
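A minimal sketch of that architecture, using SQLite for the metadata and plain files for the image bytes. The table layout and function names are assumptions for illustration and do not reflect how DB2 or any other commercial database does it internally; in particular, this sketch does not make the file write and the row insert atomic, which a real system would need to.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog that indexes images stored on disk."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images ("
        "name TEXT PRIMARY KEY, path TEXT NOT NULL, size INTEGER NOT NULL)"
    )
    return conn


def store_image(conn: sqlite3.Connection, root: Path, name: str, payload: bytes) -> None:
    """Write the bytes to the file system and record only metadata in the DB."""
    path = root / name
    path.write_bytes(payload)
    conn.execute(
        "INSERT INTO images (name, path, size) VALUES (?, ?, ?)",
        (name, str(path), len(payload)),
    )
    conn.commit()


def load_image(conn: sqlite3.Connection, name: str) -> bytes:
    """Look up the path in the DB, then read the file straight from disk."""
    row = conn.execute("SELECT path FROM images WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return Path(row[0]).read_bytes()


if __name__ == "__main__":
    catalog = open_catalog()
    storage = Path(tempfile.mkdtemp())
    store_image(catalog, storage, "m31.raw", b"\x00\x01\x02")
    print(len(load_image(catalog, "m31.raw")), "bytes read back")
```

The database only ever handles a short row per image, so serving the image itself is limited by disk and network speed rather than by the database engine.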
Wasn't it Kepler who looked over Brahe's work to work out his law of conservation of angular momentum?
--- It is not the things we do which we regret the most, but the things which we don't do.
There's an article out in Slashdot that pans the Space Station, but then gets into some actually interesting matter, like the increasing ability to actually do data mining. Data mining has long been a staple of hard science fiction, but the benefits of being able to /really/ do it are immense - less pollution, really clean data. There's just that nasty get-the-material to the factory issue. But that's why we need a space elevator, right?
Got Rhinos?
After all, astronomy is really just the gathering of lots and lots and lots of images and the analysis of those images. I'm surprised this didn't happen earlier actually. This allows for distributed analysis of all the images gathered at all the observatories, theoretically--imagine the power behind that. If other scientific fields could follow suit our progress would be accelerated greatly.
As usual, Microsoft is late to the party and comes with their own agenda. Microsoft products are oriented towards small business and desktop applications. That's what their evolution is driven by and that's what they are designed for. Whether this kind of data should be in a relational database is questionable to begin with. And it certainly doesn't need to be on an expensive, proprietary operating system and in a proprietary format.
Scientists already have excellent open-source tools to build long-term, stable, large-scale data collections. They would be foolish to tie research projects that can span decades to the fortunes of a company in the middle of a battle for the US business computing market, merely to gain some trinkets and give that company a publicity boost.
so that in case some of your data is corrupt
you can get it from a mirror rather than spend
five and a half years restoring from tape.
<groucho>thats the worst spell of universe i've seen in a long time </groucho>
Arm yourself with knowledge.
I attended the Virtual Observatories of the Future conference this past summer and would like to note that:
The take-home lesson from the Virtual Observatories conference was that the amount of data required to do science with a "virtual observatory" leads to interesting problems in computer science, problems which are only tractable when analyzed by collaborations between statisticians, computer science people, and the astronomers themselves.
Finally, note that this year's historic increase in the National Science Foundation budget is largely due to the new Information Technology Research Initiative. The need for new methods of data management in the sciences is real.
This is nothing new. Astronomers already do this. In fact I just finished an assignment in my astro class which was 100% web data mining, and we have grad students here doing the same for their thesis.
I also find it rather stupid to make a "network of observatories". Perhaps microsoft.COM forgets, the world wide web was invented by physicists and astronomers for that purpose. The WWW _is_ this database he wants to "create". Someone's been learning from Al Gore I think.
And what's with MS trying to pretend they have a PARC by calling it BARC? *shaking head in shame*
Salsadot is the News for Mexicans site.
Hi! This is the Sig, blatantly attached to the end of this comment.