Slashdot Mirror


Creating The UniServer

bmongar writes "DrDobbs has an article about a project for a mirrored universal astronomy database. Jim Gray basically wants a network of observatories around the world to publish their data and mirror other observatories' data, basically creating a quadruple-redundant system of data, all available online. He wants to create a new type of astronomer, the astronomer that is a data miner." As the article also says, the guy behind this is the guy behind the TerraServer as well.

34 of 81 comments

  1. Re:Welcome to Utopia by Jeff DeMaagd · · Score: 2

    Well there's an insightful AC.

    It would help me if you pointed out what you thought was naive and why.

    There is a whole gaggle of scientific work that at first seemed totally worthless commercially but eventually had commercial uses within 100 years or even much, much faster.

  2. Is the creator a researcher? by PD · · Score: 2

    It makes me wonder. Researchers usually have their own datasets, and they spend gobs of time working on them with their specialized programs. It seems to me that the really valuable stuff out there is in these closed datasets, not in the everyday stuff that's available on the internet.

    As an example, you can get a gazillion CD-ROMs with the Magellan data from Venus. But what good is that raw data? Not much. You'd probably want to get a look at the data on the Venus-geologist's computer instead, because it's been analyzed and selected and generally picked over to produce something meaningful.

    1. Re:Is the creator a researcher? by bmongar · · Score: 2
      You'd probably want to get a look at the data on the Venus-geologist's computer instead, because it's been analyzed and selected and generally picked over to produce something meaningful.

      That would be true only if you were interested in doing the same type of calculations. If you wanted to do something different, you might want to run your own calculations on something he had thrown out, or had aggregated in a way that ruined your calculations.

      --
      As x approaches total apathy I couldn't care less.
    2. Re:Is the creator a researcher? by PD · · Score: 2

      Turing Award? How big was his dataset? I guess I should have been more clear.

      I meant to ask if he is a researcher that works with datasets larger than he can pull out of his ass?

      If he was, he'd figure out that a huge amount of data means something to just a couple people.

      He'd also figure out that researchers build their little kingdoms, and they are NOT going to want to contribute their data to the project.

      In short, the creator of this Astronomy database sounds like he doesn't understand the politics of the situation.

    3. Re:Is the creator a researcher? by PD · · Score: 2

      Right. Public access. Sure. I used to work for an agrochemical database compiler, and they used to have to sue researchers regularly to get the data. The researchers would promise to get the data to us, but it would get lost, or it would be late, or whatever.

      In business, and in science, the person who has the first crack at the data has the best crack at making a buck/making a discovery.

      Those cranky old bastards at universities weren't opposed to getting us data. They were just opposed to us getting the data in time to make money from putting it into a database.

      So, yet another situation where theory is different than practice.

      I still think that the person putting together this database is going to hit political resistance just as soon as his economy threatens the ability of astronomers to bring in grant dollars for their universities.

  3. Re:Why doesnt this exist already by MustardMan · · Score: 2

    And finally, we are discovering universes
    that are farther away and therefore younger than any we have previously discovered

    I REALLY hope you meant to say galaxies instead of universes. I am a physics major with a healthy lean towards computational cosmology, and I would really really hope that if we had discovered entire new universes, I would know about it. SDSS, however, has discovered quite a few new galaxies, so I will assume that is what you were referring to

  4. Re:This seems like the natural evolution of astron by Jonathan · · Score: 2

    If you start out with mounds of raw data you aren't a scientist.

    A scientist starts out with a hypothesis.


    This is not true. The father of modern science, Francis Bacon, believed that science should be done by collecting as much data as possible and seeing what conclusions the data support.

    Hypothesis-driven research is actually in a sense cheating, because in such research the data gathered is biased -- the researcher is not considering all the data which could bear upon the situation but only those data which the researcher believes could support or refute a preconceived hypothesis. Nevertheless, hypothesis-driven research is the norm in science because until recently, that was the only efficient way to do science.

    But with new techniques in data mining, we can begin to recapture the promise of Baconian science.

  5. Observational Astronomers are already data miners by tjwhaynes · · Score: 4

    Your average astronomer is already a major data miner. From the Hubble Deep Field to the images taken in the back yard with a home-built CCD camera, much of modern observational astronomy is entirely built around being able to mine those images for correspondence, object attributes, and clustering in position, colour, or some other feature. Even a basic catalogue built off a single-wavelength plate will assign position, size, brightness, orientation, semi-major and semi-minor size, positional error, orientation error, brightness error, isophotal brightness, local background level and half-a-dozen other attributes to each object in the catalogue. There may be several thousand objects in a single frame. Making sense of this data set requires time, some ideas about what you are searching for and some luck.

    All that said, you'd be missing a lot as an astronomer if all you looked at was optical images. Going to other images for the same area of sky, be it infra-red, radio, x-ray and so on, will give you a deeper insight into the likely environment of your object and also into any likely confusions due to multiple structures along the line of sight.

    So having a vast data repository is important, and astronomers have had the tools to go and query multiple surveys at multiple wavelengths for several years. So there is nothing new here either from a data access point of view. The only really new thing in this proposal is to collate all the data together onto four super-mirrors and ensure that these supermirrors remain in sync, so if one system dies, it can be restored from the other mirrors without having to go back to tape backups.
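    As a rough illustration of what keeping the supermirrors "in sync" could involve, here is a minimal sketch that compares checksum manifests between two mirrors and lists the files that would need to be re-copied. The file names, the in-memory dictionaries standing in for archives, and the helper names `manifest` and `out_of_sync` are all assumptions made for this example; the proposal itself does not say how synchronisation would be done.

```python
import hashlib

def manifest(files):
    """Map each file name to the SHA-256 hex digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

def out_of_sync(reference, mirror):
    """Return sorted names that the mirror lacks or holds with different contents."""
    return sorted(
        name for name, digest in reference.items() if mirror.get(name) != digest
    )

# Hypothetical in-memory stand-ins for the archives held at two mirror sites.
site_a = {"plate_001.fits": b"raw pixels A", "plate_002.fits": b"raw pixels B"}
site_b = {"plate_001.fits": b"raw pixels A", "plate_002.fits": b"corrupted"}

# Files that site_b would have to fetch again from site_a.
stale = out_of_sync(manifest(site_a), manifest(site_b))
print(stale)
```

    Only the manifests need to cross the network for the comparison; the bulk data moves only for files that actually differ.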

    Cheers,

    Toby Haynes

    --
    Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
  6. Storing the data isn't a problem... by clickety6 · · Score: 2

    .. as you can throw enough storage space at the problem. Just having a giant stockpile of data isn't going to be of much use (except for archival purposes) unless we also have efficient access to the data onsite (we don't want to send terabytes over the network) and have the correct tools to allow different datasets to be compared and correlated. The possibility of doing large-scale data comparisons, or comparing datasets across a wide range of wavelengths, is surely what is most interesting here and the major point of having an online store (as opposed to a data archive). I wonder what research tools are proposed?

    --
    ----------------------------------- My Other Sig Is Hilarious -----------------------------------
  7. Data Miner? by Mad Hughagi · · Score: 2
    He wants to create a new type of astronomer, the astronomer that is a data miner

    Give me a break! Does this guy know anything about the field of astronomy from a professional point of view?

    Most astronomers/astrophysicists don't spend the time looking through the telescopes themselves - the majority use data that someone else has already gathered. I agree that this would greatly increase their ability to find pertinent data; however, it would hardly bring about a new 'type' of astronomer, since the majority are already data miners.

    --
    UBU
  8. Already being done... by Coz · · Score: 2
    The thing that ticks me off about this, is that it's already being done. The Digitized Sky Survey is a survey of all parts of the sky from a couple of authoritative sources. The Medium Deep Survey is Hubble data, gathered in a sort of parasitic mode (roughly analogous to how Seti@Home gets their data - but IANAAstronomer - that's an orders-of-magnitude oversimplification). BOTH are available for access over the Web.

    Apart from having more observatories publish their data (most already do), having a central point to index it (not really here today, but if you want it you can generally find it - if it's not in the sky survey, it's not in the sky), and having M$ run things (please, no!), what does he hope to accomplish?

    --
    I love vegetarians - some of my favorite foods are vegetarians.
  9. Yeah that's fine but... by AntiPasto · · Score: 4
    if this Universe Database thingy starts spittin' out 42's left and right, I'm headin' for the hills!

    ----

  10. Astronomer as data miner by Andy_R · · Score: 2

    It's called 'SETI at Home' isn't it?

    --
    A pizza of radius z and thickness a has a volume of pi z z a
  11. Re:Welcome to Utopia by karzan · · Score: 3
    The scientific community has always been one large, co-operative effort. This is only a technological enhancement of that. Granted, capitalism has contributed its fair share to science. But if science were based mainly in capitalism, we'd be in trouble--for one thing, how do you make money off astronomy?

    Scientists are already and have been for a long time working together, standing hand in hand. Maybe it seems Utopian from a selfish viewpoint but it's very natural to scientists.

  12. Re:Observational Astronomers are already data mine by extrasolar · · Score: 2

    Well, what I am interested in is that this would make it so that astronomers don't have to use a telescope. Remember, we only have one sky. For each image, there would be attributes for the time-date the image was taken, the celestial coordinates it was taken at, the magnification, the geographical coordinates it was taken at, and perhaps even the weather conditions.

    I don't see a reason why images that aren't in the visible spectrum can't be put into this database. Then you would need perhaps a spectrum range attribute.
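    The per-image attributes listed above could be sketched as a small record type, with a lookup that pulls every frame of one patch of sky in time order. The field names, units, and the `frames_near` helper are assumptions for illustration only, and the region test is a crude box that ignores right-ascension wrap-around and the narrowing of RA near the poles.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

@dataclass(frozen=True)
class Observation:
    taken_at: datetime                  # time-date the image was taken
    ra_deg: float                       # celestial coordinates (right ascension)
    dec_deg: float                      # celestial coordinates (declination)
    magnification: float
    site_lat_deg: float                 # geographical coordinates of the telescope
    site_lon_deg: float
    wavelength_nm: Tuple[float, float]  # spectrum range covered (min, max)
    weather: Optional[str] = None       # free-text conditions, if recorded

def frames_near(observations, ra_deg, dec_deg, radius_deg):
    """Return observations inside a simple box around (ra, dec), oldest first."""
    hits = [
        o for o in observations
        if abs(o.ra_deg - ra_deg) <= radius_deg
        and abs(o.dec_deg - dec_deg) <= radius_deg
    ]
    return sorted(hits, key=lambda o: o.taken_at)

# Made-up sample records: two visible-light frames of one field, one elsewhere.
archive = [
    Observation(datetime(2000, 6, 1), 83.8, -5.4, 50.0, 32.7, -109.9, (400.0, 700.0)),
    Observation(datetime(1990, 6, 1), 83.9, -5.3, 50.0, 19.8, -155.5, (400.0, 700.0), "clear"),
    Observation(datetime(1995, 1, 1), 10.7, 41.3, 80.0, 28.8, -17.9, (1000.0, 2500.0)),
]

sequence = frames_near(archive, ra_deg=83.8, dec_deg=-5.4, radius_deg=0.5)
print([o.taken_at.year for o in sequence])
```

    Sorting the matching frames by `taken_at` is the starting point for the time-lapse idea: the same query over twenty years of submissions yields the frames in playback order.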

    The exciting thing in my opinion is what can be done with all this data. Imagine creating a starmap of the entire sky based on real observation; it may be zoomable at some points. Every time a telescope takes a picture of the sky, it gets put into this database. That could yield a huge amount of data in relatively short time. I can very much see astronomers using this data instead of their own observations. Imagine a "video" of the same part of the sky in twenty years.

    This can be done from software if all the data is there. I know I would love this kind of thing to be publicly archivable. If I see something in the sky, I can then look onto the internet to see if there were any other images of it.

    Sorry if my post is less than coherent, but this seems exciting to me.

  13. Re:No real depth.. by extrasolar · · Score: 2

    Um, I don't think this was a white paper. I think it was an idea...perhaps a proposal to astronomers.

    I don't think it matters what OS they use or what database system they use, etc. etc. until they start implementing it.

    I think the astronomers would very much appreciate this use of technology. It is one of the purest uses of technology I have known.

    But I am interested in details as well though. So for those of you who specialize in this sort of stuff, how would you go about implementing this sort of system? Would GNU/Linux be able to handle it?

  14. Error... by smack_attack · · Score: 3

    404 - Universe Not Found

    Please contact the Universe Master at...

  15. No real depth.. by iamsure · · Score: 4

    The article definitely gets the ol' geek hairs on the back of your neck standing up. Petabyte backups, tape recovery that takes 5 days..

    Lots of stuff that makes geek men howl.

    However, it leaves out a *TON*. Like, what technology are they going to use to DO data mining? What database will run this monster? Which OS will it run on?

    Further, what license/restrictions are there on the data once it gets published? Is it totally public knowledge, free of copyright?

    Fundamental questions of large scope and size, not easily ignored.

    However, the question *I* have is, why not do the data storage with online companies KNOWN for hosting data, instead of at observatories, which have little experience at that?

  16. Similar project by buttfucker2000 · · Score: 2

    Although I'm not aware of any database of pure raw data, NASA at least have the Distributed Astronomy Library, described here, which is a repository of astronomical *information*. An example is here

    --
    Free Anne Tomlinson!!
  17. Good but difficult by Ektanoor · · Score: 3

    *FUD start* Such a thing reminds me of some M$ ideas on concentrating everything all around the world in one bucket. Somehow this resulted in the .NET idea. So now we are up to the Universe... *FUD end*

    Well, anyway the idea is not so bad at all. But I don't see how to realise it without making some radical changes in the system. First we have to deal with communication channels. For volumes like those of astronomical databases they are highly unreliable. We are not going to run petabytes over them, but surely there will be gigabytes going back and forth. Let's note: a Mars raw image from PDS sometimes weighs up to 20 megabytes. Processing such images sometimes leads to data volumes 10-30 times bigger. In some cases it is possible to apply JPEG to compress these images, but sometimes it is highly undesirable to do so. So we get something weighing 100-200 megs. On a 100Mb network, that will take a few minutes to pass from station to station. Now imagine a widespread, worldwide network working that way.
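    The transfer-time estimate can be checked with a little arithmetic. The sketch below reads "100Mb" as 100 megabits per second and treats the fraction of line rate actually sustained (`efficiency`) as a free parameter; both are assumptions, as is the helper name `transfer_seconds`.

```python
def transfer_seconds(size_megabytes, link_megabits_per_s, efficiency=1.0):
    """Time to move a file, given the fraction of line rate actually sustained."""
    size_megabits = size_megabytes * 8  # 8 bits per byte
    return size_megabits / (link_megabits_per_s * efficiency)

# A 200 MB processed image over a 100 Mbit/s link.
ideal = transfer_seconds(200, 100)                   # full line rate
congested = transfer_seconds(200, 100, efficiency=0.1)  # a tenth of line rate

print(f"ideal: {ideal:.0f} s, congested: {congested / 60:.1f} min")
```

    At full line rate the 200 MB file moves in well under a minute; the "few minutes" figure corresponds to a link that sustains only a small fraction of its nominal speed, which is plausible for a shared long-haul connection.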

    On one side we have archives spread all over the world. On the other side this gives rise to a community of astronomers also working all over. It will be a big challenge to achieve such a thing. And a big financial adventure. Maybe dumb burrocritters will think that data will be cheaper if it keeps rotting on magnetic tape.

    1. Re:Good but difficult by harmonica · · Score: 2

      If the data has similarities in all those channels, maybe specialized lossless compression for astronomical images can be developed. Compression results always get better once you have a modeler designed specifically for your class of data.
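      A toy example of the "modeler" idea: if neighbouring pixels tend to have similar values, storing each pixel as its difference from the previous one (a simple delta model, chosen here purely for illustration) gives a general-purpose compressor much more to work with, and the original values can still be recovered exactly. The synthetic pixel row below is made-up data, not a real astronomical image.

```python
import zlib
from array import array

def delta_encode(pixels):
    """Replace each 16-bit pixel with its difference from the previous one (mod 2**16)."""
    out = array("H")
    prev = 0
    for p in pixels:
        out.append((p - prev) & 0xFFFF)
        prev = p
    return out

def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences (mod 2**16)."""
    out = array("H")
    prev = 0
    for d in deltas:
        prev = (prev + d) & 0xFFFF
        out.append(prev)
    return out

# Synthetic stand-in for a smoothly brightening background row of 16-bit pixels.
pixels = array("H", (1000 + 7 * i + (i % 3) for i in range(5000)))

plain = zlib.compress(pixels.tobytes(), 9)                  # generic compressor only
modeled = zlib.compress(delta_encode(pixels).tobytes(), 9)  # delta model, then compressor
restored = delta_decode(delta_encode(pixels))               # lossless round trip

print(len(plain), len(modeled), restored == pixels)
```

      Real sky images are noisier than this ramp, so the gain would be smaller, but the principle is the same: the better the model predicts the next pixel, the less there is left to store.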

  18. Re:This seems like the natural evolution of astron by Jonathan · · Score: 2

    It is happening in other sciences. For example, my field "bioinformatics" deals with analyzing molecular biological data, much of which is in public databases such as GenBank. Once experimental molecular biologists could be expected to analyze all their data themselves because there just wasn't very much of it. That just isn't true anymore.

  19. Why doesnt this exist already by jbischof · · Score: 2
    There is already too much information for one or two astronomers to keep by themselves.

    This should have been implemented a long time ago, because the amount of information we are pulling in right now is tremendous and it will only increase with the launch of more and more satellites. We need this database for three very important reasons:

    • Possible Collision w/Asteroid
    • Mapping New Planets
    • New Universe Discovery

    We are all concerned, due to recent movies, that we might get hit by an asteroid, which is a valid concern, so we need to carefully track the asteroids that we find, because we are currently searching only 10 percent of the sky. Secondly, with newer and more powerful telescopes we are mapping more and more planets outside our solar system every day; soon they will roll in by the dozens a day. And finally, we are discovering universes that are farther away and therefore younger than any we have previously discovered

    jbischof

  20. Re:Welcome to Utopia by Jeff DeMaagd · · Score: 2

    Somehow the Utopian thought behind it makes my logical circuits sputter...

    I'm sure you are jesting, but anyhoo...

    There are people that are only motivated by money that can't seem to understand that not everyone is motivated by same. If everyone were motivated solely for financial windfall, would Linux exist at all?

    Outside of the "hacker" community, I believe that the academic and scientific type communities have contributed the most effort to Linux software in the first ten years (is it 10 years old yet? Maybe eight years), so it's not that much of a stretch. Scientific papers are about trying to share information in the hope of furthering knowledge.

    People wanting to get master's and doctorates were able to contribute some effort on their thesis papers.

  21. Free collections of historical linguistic data by kurisuto · · Score: 2
    I'm glad to see the collection of huge, free data sets in astronomy. I very much want to do this kind of thing in my own field with Indo-European linguistic data. The little bits I've got so far are at: http://www.ling.upenn.edu/~kurisuto/germanic/language_resources.html

    The biggest problem is, of course, data entry. A lot of the texts pose a challenge for OCR for a number of reasons, including the large number of special characters often used.

    Another problem is people who insist on copyrighting and refusing to freely share their collections of online documents in the older languages, which is a real shame, because it prevents me from creating all kinds of interesting derived works (e.g. web pages of Old English texts where you can click any word to get information about it). It basically means that all this work has to be repeated by anyone who wants to make those texts freely available-- never mind that we're talking about works over 1000 years old!

  22. We still do this today by multipartmixed · · Score: 2

    I don't know if it was angular momentum that Kepler figured out from his data, but he did study Tycho's data.

    Tycho Brahe may not have been much of an astrophysicist, scientist, or whatnot, but he was a hell of an observer, ESPECIALLY when you consider the crappy tools he had -- an eyeball and a sextant, with no telescope.

    Scientists today still study his data, because there is so much of it, for such a long time, with such a high degree of accuracy. It's useful for all kinds of things; dating stars (or human events, like pyramid building :-) by using stellar precession, etc.

    --

    --

    Do daemons dream of electric sleep()?
  23. Don't use JPEG by extrasolar · · Score: 2

    I'm no scientist, but I don't think they should use a lossy file format for this kind of thing.

    Scientist: Hmm...what's this shady pixel on Mars here? Could it be...could it be life!

    Geek: Nahh...that's just the JPEG algorithm making up pixels it lost in compression.

  24. lots of multimedia data in open source databases by q000921 · · Score: 3
    The rational design for that kind of database is to put the image data into the file system and use the relational database for indexing and lookups. Most of the open source databases are perfectly up to the task of providing indexing for that kind of data. In fact, the amount of metadata in such applications is small compared to the kinds of data encountered in many commercial applications, so this is actually not even a particularly interesting benchmark for high-end database systems.

    Putting multimedia data into the file system is the implementation strategy many commercial databases (including some versions of DB2) take behind the scenes for storing multimedia objects, even if they hide it behind a database API. They can still provide all the database facilities (transactions, indexing, access control, etc.) on top of such an implementation.

    With that kind of architecture, you don't need a very powerful machine or high performance database to be able to serve image data at disk bandwidth or network bandwidth.
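    A minimal sketch of that architecture, assuming SQLite as the relational index and a temporary directory as the file store: the image bytes go to disk, and the database holds only the path and the searchable metadata. The table layout, column names, and the helpers `store_image` and `find_images` are hypothetical choices for this example, not a description of any existing system.

```python
import sqlite3
import tempfile
from pathlib import Path

def store_image(db, root, name, pixels, ra_deg, dec_deg, band):
    """Write the image bytes to the file system and index its metadata in the database."""
    path = root / name
    path.write_bytes(pixels)
    db.execute(
        "INSERT INTO images (path, ra_deg, dec_deg, band) VALUES (?, ?, ?, ?)",
        (str(path), ra_deg, dec_deg, band),
    )

def find_images(db, band, ra_min, ra_max):
    """Look up file paths by metadata; the image data itself never passes through the database."""
    rows = db.execute(
        "SELECT path FROM images WHERE band = ? AND ra_deg BETWEEN ? AND ? ORDER BY ra_deg",
        (band, ra_min, ra_max),
    )
    return [Path(row[0]) for row in rows]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE images ("
    "id INTEGER PRIMARY KEY, path TEXT NOT NULL, ra_deg REAL, dec_deg REAL, band TEXT)"
)
db.execute("CREATE INDEX images_band_ra ON images (band, ra_deg)")

root = Path(tempfile.mkdtemp())  # stand-in for the image file store
store_image(db, root, "m42_optical.img", b"optical pixels", 83.8, -5.4, "optical")
store_image(db, root, "m42_radio.img", b"radio pixels", 83.8, -5.4, "radio")
store_image(db, root, "m31_optical.img", b"more pixels", 10.7, 41.3, "optical")

hits = find_images(db, "optical", 80.0, 90.0)
print([p.name for p in hits], hits[0].read_bytes())
```

    The database rows stay tiny regardless of image size, which is why the indexing load is modest compared with the raw data volume.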

  25. Astronomers as data miners a tradition by Sxooter · · Score: 2

    Wasn't it Kepler who looked over Brahe's work to work out his law of conservation of angular momentum?

    --

    --- It is not the things we do which we regret the most, but the things which we don't do.
  26. Re:Welcome to Utopia by Zoyd · · Score: 2

    the academic and scientific type communities have contributed the most effort to Linux software in the first ten years (is it 10 years old yet? Maybe eight years)

    If you count from when emacs started being worked on in the mid 70s, the Linux software canon is about 25 years old.

    But the 1.0 kernel was released in mid 1994, so six years counting from then.

  27. Data Mining vs. Asteroid Mining by zpengo · · Score: 3

    There's an article out in Slashdot that pans the Space Station, but then gets into some actually interesting matter, like the increasing ability to actually do data mining. Data mining has long been a staple of hard science fiction, but the benefits of being able to /really/ do it are immense - less pollution, really clean data. There's just that nasty get-the-material to the factory issue. But that's why we need a space elevator, right?

    --


    Got Rhinos?
  28. Microsoft publicity by q000921 · · Score: 3
    I think this is mostly done for Gray and Microsoft to get publicity for their database. This is the continuation of TerraServer and other projects like that. Microsoft is trying to demonstrate "scalability" of their database and servers and to get it into the hardcore scientific server area.

    As usual, Microsoft is late to the party and comes with their own agenda. Microsoft products are oriented towards small business and desktop applications. That's what their evolution is driven by and that's what they are designed for. Whether this kind of data should be in a relational database is questionable to begin with. And it certainly doesn't need to be on an expensive, proprietary operating system and in a proprietary format.

    Scientists already have excellent open-source tools to build long-term, stable, large-scale data collections. They would be foolish to tie research projects that can span decades to the fortunes of a company in the middle of a battle for the US business computing market, merely to gain some trinkets and give that company a publicity boost.

  29. Re:Welcome to Utopia by NearlyHeadless · · Score: 2
    I'm not the AC poster, but the part that seemed naive to me was "Scientists are already and have been for a long time working together, standing hand in hand. Maybe it seems Utopian from a selfish viewpoint but it's very natural to scientists."

    Scientists are sometimes co-operative, sometimes bitterly competitive. Sometimes they share their data, sometimes they guard it jealously. Sometimes they go to great lengths to sneak a look at each other's data.

    For an example, see The Double Helix by James Watson, where Watson and Crick win a Nobel prize, partly by gaining access to Rosalind Franklin's X-ray pictures of DNA.

  30. Re:Welcome to Utopia by Jeff DeMaagd · · Score: 2

    You're painfully naive if you think the opportunists who've taken over academia in the last generation and a half are anything but self-serving parasites.

    Finally something I can work with, at least more informative than just accusing someone of being naive without doing the littlest thing to fix the problem. I had forgotten about this.