National Virtual Observatory
scubacuda writes "According to this Technology Review article, U.S. astronomers (courtesy of a $10M grant from the National Science Foundation) are building a National Virtual Observatory to make terabytes of astronomical data accessible to a web browser. One interesting challenge is how the scientists are going to query so many *different* distributed databases (which they're leaving in their respective places to avoid clogging network bandwidth)."
I virtually built a tetrabyte virtual observatory, but something good was on tv.
"Dr. Quinn Medicine woman...Is there anything she can't do?"-Homer
No, no. It should be renamed the National Space Wallpaper Archive.
If Mr. Edison had thought smarter he wouldn't sweat as much. --Nikola Tesla
good at answering the question.
m. felzien
Nuff Said.
"History has shown us that the greatest leaps forward have occurred not when you observe the universe through just one window, but when you compare the views of the universe obtained through different windows," says Ray
Thanks, Ray, for that endorsement of that (in)famous MS product.
This is offtopic, please mod it down.
I tried to look at the universe through my web browser but all I saw was this prompt that told me I should update my web browser to the latest version in order to see the universe.
So will the universe be viewable in the next point release, or is it several years away?
Is it possible to look at the universe with, say, lynx?
Or if that is not possible with javascript turned off?
IMHO this is an incredible phenomenon. For the first time in history, we have been able to access a huge subset of academic literature and data for the (fairly minimal) cost of an internet connection... Many university lecture course notes are completely available on the WWW. The internet could prove to be the single factor which contributes most towards equality of educational opportunity for all around the world. Will education lead to (economic) salvation?
From what the article says, it seems to be a very ambitious and interesting project. Very rarely do you see people trying to get together to spread information out to the web in such a fashion. The major problem in my mind (and, I can imagine, in theirs) is one of format. How can you accommodate the mythical layman and his or her inherent lack of skill, and still have the data be available for advanced researchers to make use of?
/numbers being available for public research. Maybe someone will throw together an inverse Terraserver or something with Whiz-bang true-layman appeal. Until then, the geeks bow at the effort, because man, space is BIG.
It seems that there is simply going to be a huge amount of data, cross-referenced and collated. From the second page of the article, it seems to include pictorial data. I also hear talk of XML being thrown around, which is a good start, but there's a lot that goes into that transition. Are they looking to set the layman bar at "your novice astronomer", "the third grade science report", or "grad student"? Where is this information really being targeted at the sub-obscure level?
While I don't want to trivialize their massive IT effort, it seems that a lot of this is going to come down to the end user of the data. Their sample study using this information isn't trivial stuff, and does seem to set the aforementioned bar at somewhere in the undergrad-graduate level. Perhaps that is the nature of the data (I'm not that familiar with it). There's an XML schema, some request examples, and other framework stuff already in place to view by potential client writers.
I'm glad to see XML being done the right way (by collaboration with its end users), and those pictures
Anyone closer to the project know of any simplification efforts?
--jaybonci
That leaves me wondering: other than satisfying curiosity, will people actually do anything useful with this data? Will this just include "images" or will there actually be a lot of spectrographic data and other measurements? What would they be looking for? What might they find?
Overall, I guess I just don't see yet that this is a useful use of scarce research funds.
Can I search for p0rn in the universe?
Yes, but it may affect your karma, depending on who you listen to.
the SkyView Virtual Observatory run by NASA, though I suspect this National one will be far more sophisticated. Cheers.
All in all, though, it seems like a good use for those tax dollars. The "Google" of astronomy research is an attractive idea, and I know we'll get some great new acronyms in the deal.
Everything I've ever learned the hard way was based on a statistically invalid sample.
Efforts like this are very good. It is good to see government agencies marry the cheap delivery of the internet to their huge datasets.
And, appropriately enough, the text on their page is quite ... spacey.
The Army Understands
Jim Gray, at Microsoft Research, has coauthored papers on this topic with at least one of the researchers mentioned in the article. There is some really good reading at:
http://research.microsoft.com/~Gray/JimGrayHomePageSummary.htm
Alan.
I don't know how much data they are actually talking about, but I can offer up a solution.
Some of you might disagree. I've run into a scalable piece of software which will interrogate all their information sources regardless of their storage format, index them, and still leave them all in their respective locations.
Autonomy Inc. has a product called DRE AXE which is also XML compliant. They have a pretty simple API to work with, and I have even seen it work with Java, PHP, and Perl. The query engine is extremely fast and supports layman's terms. The engine supports both Boolean and natural language queries. Check them out; I've been administering their products for about 2 to 3 years now.
Ok, Ok, I'm giving them a plug, but hey their product works well.
reassign null to be the tape device - it's so much more economical on my time as I don't have to change tapes_BOFH
Also, I have to mention Celestia, a great Space Simulator, similar to OpenUniverse.
In closing, let me say that I think people should take more of an interest in astronomy, as the understanding and exploration of space is one of the most important goals humans should have if they wish to survive longer than 500 million years or so.
"To confine our attention to terrestrial matters would be to limit the human spirit." -Stephen Hawking
What about a peer-to-peer network of amateur astronomers running highly computerized telescopes and a special P2P program? If the program is really good, it will be able to automatically discover interesting things, like potential objects that might collide with the Earth! A project like this (even one slightly subsidized by public funds) can certainly be VERY cost-effective, and unlike much bigger projects it can be started rather quickly. And if you consider that ever since the Apollo program the budget for space has been getting smaller and smaller, this might actually be the only effective way to avoid the same fate as the dinosaurs!
I would rather say people should look at environmental pollution and global warming if they wish to survive 500 years or so.
This is just me, but, wouldn't leaving the databases where they are clog network bandwidth, as opposed to say, having them on one local LAN?
Z39.50 is also a lightweight protocol, and studies show that searching many databases in parallel is not a problem; it is usually the database servers that are the bottleneck.
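To make the "search many databases in parallel" point concrete, here is a minimal sketch in Python. It is not Z39.50 and not anything the VO project has published: the archive names, the object names, and the records are all made up, and each "remote" archive is just an in-memory dictionary. The only point is that the same query goes out to every source at once and only the small result records come back, while the data stays where it lives.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote archives (invented names and values).
# In a real system each of these would be a network call to a separate site.
ARCHIVES = {
    "optical": {"obj-1": {"mag": 12.1}, "obj-2": {"mag": 9.7}},
    "xray": {"obj-2": {"flux": 3.0e-12}},
    "radio": {"obj-1": {"flux_jy": 0.4}, "obj-2": {"flux_jy": 2.5}},
}


def query_archive(name, target):
    """Pretend to query one remote archive; return (archive name, record or None)."""
    return name, ARCHIVES[name].get(target)


def federated_query(target):
    """Send the same query to every archive in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(ARCHIVES)) as pool:
        results = pool.map(lambda name: query_archive(name, target), ARCHIVES)
        # Keep only the archives that actually know about the target.
        return {name: record for name, record in results if record is not None}
```

Calling federated_query("obj-2") returns one small record per archive that has the object; the total wait is roughly that of the slowest archive rather than the sum of all of them.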
I am involved somewhat in the development of the Virtual Observatory. There are some details that often get overlooked in articles about the VO. First off, it's more than putting data on the web. That we do already (the Hubble Space Telescope archive is a 7+ terabyte archive that is on the web). The real challenge is to make an infrastructure to allow these archives and terabyte databases to interact with grid computing services. We have been working on this for several months now and are working on some demos of the technology for the January American Astronomical Society meeting in Seattle.
An example of such a VO project is the Galaxy Morphology demo. We take catalogs of a cluster of galaxies from one source, identify those sources with emission from a separate catalog, fetch images of all of those galaxies, and send the images and brightness information to a grid computing service that calculates the morphology of the galaxies, sending this result to the user to visualize in a VO-compliant piece of software. The user did nothing but pick the cluster and then look at the results. Much more than simply putting data on the web. And once this service is developed, it can simply be put into a web page for others to use and learn from.
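The "identify those sources in a separate catalog" step is essentially a positional cross-match. Here is a rough Python sketch of that one step, under stated assumptions: the catalogs are plain lists of (name, RA, Dec) tuples in degrees, the tolerance is an arbitrary choice, and the brute-force loop is only for illustration. This is not the demo's actual code.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


def cross_match(cat_a, cat_b, tol_deg):
    """Pair each source in cat_a with its nearest cat_b source within tol_deg."""
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        best = None  # (name in cat_b, separation) of the closest match so far
        for name_b, ra_b, dec_b in cat_b:
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= tol_deg and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0]))
    return matches
```

With two invented catalogs, cross_match(galaxies, xray_sources, 0.001) returns the (galaxy, X-ray source) name pairs that lie within 0.001 degrees of each other. Real catalogs are far too large for a double loop, so a production service would presumably use some kind of spatial index instead.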
Most of this involves creating simple-to-use yet potentially powerful interfaces to services. While we are not using true RPCs like SOAP yet, the idea is to create standard interfaces to things like image servers, catalog servers, and the like. With those services, we will extend further, to data and service discovery. Standard data and metadata formats are also being developed, as are common data models, all with the intent that these will make data and service exchange simpler. This all leads to service registries, where many applications will go to discover data and services that could be used for a particular project.
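For readers who have not met the idea, a service registry can be pictured as little more than a searchable list of service descriptions. The Python sketch below is purely illustrative: the Service fields, the "image"/"catalog" kinds, the keywords, and the example.org URLs are assumptions for the example, not the VO's actual metadata format.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """A hypothetical description of one data service."""
    name: str
    kind: str   # e.g. "image" or "catalog" (made-up categories)
    url: str    # where the service would be reached
    keywords: set = field(default_factory=set)


class Registry:
    """A toy registry: services register themselves, applications look them up."""

    def __init__(self):
        self._services = []

    def register(self, service):
        self._services.append(service)

    def discover(self, kind, keyword=None):
        """Return services of the given kind, optionally filtered by keyword."""
        return [
            s for s in self._services
            if s.kind == kind and (keyword is None or keyword in s.keywords)
        ]
```

An application that needs, say, optical images would call discover("image", "optical") and then talk to whatever comes back through the standard image-server interface, without knowing in advance which archives exist.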
Jim Gray is involved with the project. He led the Terraserver project at Microsoft Research. He found that, as he put it, images of the earth are worth money; those of the stars are not. Because of this, the research he was doing on distributed data with the Terraserver project kept running into snags where making money hindered access to the data. This turns out not to be true for astronomical data. Hence he is now looking up rather than down. A version of Terraserver for different parts of the VO is in development.
There will be usage points for people all the way from my mother, who loves astronomical wallpaper, to the hard-core researcher and all points in between. Public outreach is being built in at the ground level, so this is not just for astronomers. Many of these will be web-based interfaces to the VO, but others may be simple toolkits to make your own services. Some could be simple enough for basic science projects in school, some may be for science fair level projects, and some for people to develop educational web-based lesson plans.
Yes, 10 million dollars seems small. But it's a start. And we are not the only ones working on VO technologies. The Europeans have their own VO, as do Canada, Russia, India... The divisions are mostly political (each funding agency has its own VO title). The IVO has been established to act as a steering body to help us share efforts and make things interoperable from the start.
Today is a gift. Save the receipt.
The brain behind the Terraserver is involved with a similar-sounding project called the Sloan Digital Sky Survey.
While the main benefits of the virtual observatory will be to researchers, the $10 million is only the start, and more money will be needed, and the way to get more money is to make it popular with voters.
There are two examples of indexing large databases for the masses that come to mind. One is Google, and the other is Amazon.
Google ranks items by how popular they are, based in large part on how many links there are to the web page. Amazon gives you a list of books other customers bought when they bought the book you found in your search.
For astronomical data and images, something like those approaches could be quite entertaining. I could go to a popularity list to see which images and data everyone else was looking at (a million flies can't be wrong...). But then, like the Internet Movie Database, it would be fun to see other images and data that was most often found in the same papers or web pages as this item. Somewhat like the Science Citation Index (or the Kevin Bacon game).
Users could also rate the images and data. Then we could have lists such as "people who liked this nebula also liked these HST photos". Images could be grouped by popular use -- "Images most often used as wallpaper", "Images most often used by science magazines", "Data most often used by newspapers", etc.
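The "people who liked this nebula also liked..." list is easy to prototype by counting how often two items show up together. A minimal Python sketch follows, assuming each paper, web page, or user favorites list is simply a list of image names; the names used in any example are placeholders, and this is a plain co-occurrence count, not Amazon's or Google's actual method.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(collections):
    """Count how often each pair of items appears in the same collection."""
    pairs = Counter()
    for items in collections:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order.
        for a, b in combinations(sorted(set(items)), 2):
            pairs[(a, b)] += 1
    return pairs


def also_liked(item, collections, top=3):
    """Return the items that most often appear alongside the given item."""
    scores = Counter()
    for (a, b), count in co_occurrence(collections).items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [name for name, _ in scores.most_common(top)]
```

The same counts could drive a "most often used as wallpaper" list by grouping the collections by how the images were used before counting.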
Free book: Science Toys You Can Make
The P2P idea is interesting in that it could apply to individually collected small data sets. Here's how observational astronomy has traditionally worked:
Astronomer writes a proposal to do some research using a specific telescope(s)
Proposal gets accepted after peer review
Astronomer travels to observatory to spend many of his own nights collecting data
Astronomer takes the time to reduce and analyze his own data
Astronomer writes a paper(s) saying, "Hey - look what I did!"
(Sometimes) astronomer writes a proposal for further funding based on the merits of this work
This procedure is inefficient in that you sometimes get multiple people who are not working together, doing the same project on different telescopes. If I collect a bunch of data in one part of the sky, try to use it but don't actually get around to finishing and publishing a paper, and then archive it locally, nobody in the world knows that the data exists. So now if someone else wants to do the same project, they go to the telescope and recollect the same data. In other words, there's no central log of who's done what when it comes to individual observing.
P2P could be useful to remedy this. The problem is that astronomers tend to be very proprietary about their data. Sometimes research and publishing can be very competitive, and you don't want to give the competition an edge when it could mean that they publish a paper on a particular topic before you and reap the rewards, or get funding when you don't. So I think that most astronomers would share their data openly in a P2P network only after they were completely finished using it, and some would never do so.
The difference with the data sets being accessed by the proposed Virtual Observatory is that the people who create those sets typically get their funding with a stipulation that the data be publicly accessible some time after the work is finished. They're not allowed to keep it proprietary even if they'd prefer to do so for competitive reasons.
Isn't this what Internet II is supposed to be for?
This looks really interesting and I'm looking forward to playing around with it. I was wondering how it compares with other similar-sounding astronomical survey projects that combine existing data such as the Sloan Digital Sky Survey. Is it expected to replace the existing ones?
Did anyone see that episode of Dilbert (it was a cartoon for two seasons on UPN) where their satellite crashed, so they just started giving NASA pictures that Dilbert made on his computer? NASA guy: This starfield looks like it was made with a paint program on a PC. Dogbert: You have to admit that in the infinite universe it must look exactly like that somewhere from some angle. NASA guy: *looks puzzled for a moment* Our budget problems are solved! Can you give us evidence of life too? Dogbert: This picture's teeming with life. Look right there, that star's blurrier than that one. *NASA guys rejoice*
"Sic Semper Tyrannosaurus Rex."
I think that a good web interface that anybody could use would be a set of virtual reality glasses (voice controlled, with spoken responses): you simply look in any direction (up, down, sideways) and see the universe as it is, then ask the glasses to zoom in or zoom out, and get visual feedback (in the glasses' display) about what you are looking at. You wouldn't need a telescope to look through; simply put on the glasses. The system would need a good interface, perhaps a version of the CYC intelligent database program (the askjeeves search engine uses it; you can ask it questions in English). A simple astronomical interface would introduce a lot of people to the astronomical universe around them.
Why is this database being built to be accessible through a web browser? Surely custom client software would be a vastly more efficient method of manipulating remote databases?
Just because the web exists doesn't mean that it should be used for everything, even if it can, especially since this project isn't going to be accessible to the general public. A small custom cross-platform client application would make much more sense depending on the data being accessed; not being a completely dumb client, it would probably also allow for more efficient automation of searching and repetitive tasks.
I hope they considered what tasks the end-users will actually be doing with the data and are going to allow them the flexibility to be creative in their manipulation and searches.
Good to know. As someone who is asked to moderate on Slashdot nearly every week, I certainly appreciate some examples to measure against.