Google To Offer Free Database Storage for Scientists

Posted by ryuzaki0 on Saturday January 19, 2008 @10:30AM from the are-they-supporting-science-or-science! dept.

An anonymous reader writes "Google has revealed a new project aimed at the scientific community. Called Palimpsest, the site research.google.com will play host to 'terabytes of open-source scientific datasets'. It was originally previewed for scientists last August. 'Building on the company's acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED-presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features.'"

24 of 107 comments

mining for ads by spud603 · 2008-01-19 10:36 · Score: 5, Funny

So will they be mining the data for contextual ads?
I'd be curious what their algorithms think my data says I want to buy...
1. Re:mining for ads by Seto89 · 2008-01-19 11:03 · Score: 3, Interesting
  
  It managed to pick ads accurately even when I viewed GPG-encrypted emails through the web interface - it gave links to proprietary PGP, some Fedora-related sites and a page about encryption - all that from a standard header and encrypted text...
  
  --
  There are two kinds of people - those who are radioactive and those who have already decayed..
2. Re:mining for ads by mikael · 2008-01-19 12:58 · Score: 4, Informative
  
  These are data sets that have already been placed in the public domain by the scientists. These could be astronomy images, multi-spectral image photography, remote satellite imagery, seismology recordings, MRI/NMR/CAT scans and many other types of volume, image and signal data.
  
  --
  Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
OMG WTF THIS SUX by User+956 · 2008-01-19 10:40 · Score: 5, Funny

The new site will have YouTube-style annotating and commenting features.

And hopefully the commentary will be just as insightful and poignant!

--
The theory of relativity doesn't work right in Arkansas.
oblig by qw0ntum · 2008-01-19 10:44 · Score: 4, Funny

So we're going to have YouTube-like commenting?

Is this the future of scientific discourse?

--
'Every story, if continued long enough, ends in death.' --Ernest Hemingway
Are they insane? by Hognoxious · 2008-01-19 10:49 · Score: 5, Funny

Why would you want to store a scientist in a database?

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
1. Re:Are they insane? by jd · 2008-01-19 12:13 · Score: 3, Funny
  
  Because you can then replicate the really good ones. I would have thought that obvious.
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
2. Re:Are they insane? by Malevolent+Tester · 2008-01-19 12:49 · Score: 3, Funny
  
  Might be a way to get them to join a union.
  
  --
  If you haven't made a developer cry, you've wasted a day.
3. Re:Are they insane? by jma05 · 2008-01-19 13:04 · Score: 5, Funny
  
  > Why would you want to store a scientist in a database?
  
  So that these geeks can have normal relationships.
Fantastic for Students and New Researchers by cheesethegreat · 2008-01-19 10:56 · Score: 5, Informative

If this actually happens, and researchers are willing to make their data-sets open source, it would be a huge boon for budding researchers. It would allow students to do more than just work with a sample dataset out of a textbook. Graduate students learning how to do advanced modeling would be able to work with real datasets, vastly improving their skillset and employability. Just consider these two lines on a CV, and ask yourself which one jumps out at you.

"Designed a model for the dataset on the CD-ROM included with the Modeling Organic Systems textbook"

"Designed a model for the WISK-III heart output dataset published in 2006."

New entrants to a field would have instant, easy access to enormous amounts of data. Although the big kudos comes when you can do totally original work (new data, new analysis), a researcher who could come up with a new critique of older papers and studies would definitely get themselves noticed.

Overall, this is a really positive step for everyone on the lower rungs of the scientific ladder, and especially positive for those with limited resources.
1. Re:Fantastic for Students and New Researchers by ushering05401 · 2008-01-19 11:04 · Score: 4, Insightful
  
  I feel your optimism, and support this idea, but the cynical side of me must speak out.
  
  Isn't this information more likely to be capitalized upon by those who already dominate the commercialization of research?
  
  Yes, noobs would have enormous amounts of raw material at their disposal, but by the time they got a foot in the door of innovation, wouldn't they find that applications derived from this data were already covered by patents, distilled from the datasets by labs full of trained corporate monkeys?
  
  I would love to awaken one day and find that I am just being a jaded fool, but I believe developments like this will help the commercialized overlords more than anyone else, as they are the ones with sufficient resources to throw at privatizing the results of scientific research.
2. Re:Fantastic for Students and New Researchers by xenocide2 · 2008-01-19 11:21 · Score: 3, Insightful
  
  Isn't this information more likely to be capitalized upon by those who already dominate the commercialization of research?
  
  Can't it be both? It's not like by subscribing you're depriving others. And the data uploaded will be made freely available.
  
  You cannot patent mere data, or interpretations of data. Patents are for machines, processes, and the like. Of course, the publication of data doesn't preclude people from patenting a chemical process that results in a specific gene, but this is already happening elsewhere.
  
  In fact, I suspect the entire point of this is for Google to take over maintenance of the genomic databases and create new databases like them. Many times the academic databases are... poorly maintained, and certainly not compatible with each other, despite their very similar contents. There are already efforts to make them more compatible, but Google appears to be able to offer some very neat stuff on top of it all. The silliness about shipping RAID arrays mostly seems to be for universities not already hooked up to Internet2 (I2).
  
  --
  I Browse at +4 Flamebait
  Open Source Sysadmin
3. Re:Fantastic for Students and New Researchers by cortex · 2008-01-19 11:22 · Score: 5, Insightful
  
  As a neural engineering researcher who routinely generates terabyte-size datasets, I have to say that I both like this idea and think it is unlikely to succeed. I would love to have a place to store large datasets and access them from wherever I am. However, since these datasets will be open sourced, I will be extremely unlikely to put any dataset on Google until I am certain I have extracted all of the publishable findings from it. I think that most researchers, after putting years of effort and a lot of money into acquiring a dataset, will also think twice about open sourcing their data. If the TOS were to include some means of controlling publications that result from analysis of the data, then it might be more likely to succeed.
4. Re:Fantastic for Students and New Researchers by Gromius · 2008-01-19 11:59 · Score: 5, Insightful
  
  As a researcher myself (particle physics), I echo others' comments in this thread that a) it's a nice idea but b) it isn't going to happen. There are three main problems; the first two are solvable, the third isn't.
  
  1) Trivially, 3TB is nowhere near enough to store my data.
  
  It's a bit of a non-issue for the overall concept, but if Google wants my data, they are really going to have to up the storage by a few orders of magnitude.
  
  2) As others have stated, we work really, really hard to acquire our data; research is about 10% inspiration, 90% perspiration. We are not giving up our data until we have milked it for all it's worth.
  
  This, again, is solvable: we release our data after we have all the publishable results we can think of and then let others have a crack at it. Somebody might find something useful, and if not, well, it's great for younger scientists as you say. At the very least, people can reconfirm results more easily at a later date. That's the main reason I like it.
  
  3) The deal killer, for my field and I suspect others: it is really, really difficult to understand our data and it's really easy to misinterpret it.
  
  New particles have been "discovered" so many times by grad students (and some professors who should know better) in particle physics data that I'm terrified of what somebody outside the system with no training might conclude from the data. At CDF (a Fermilab experiment) it took us (800 physicists) about 2-3 years to understand the data from the experiment well enough to get proper physics results out of it. Even now, it takes a newcomer about a year to get up to speed, and that's with help from all the experts. But it's very easy to think you understand things after a few weeks when in fact you're missing some incredibly subtle point, so I'm sure that if we released the data we would be flooded by bogus results from misinterpretations of it.
  
  Anyway, this all comes from a particle physics viewpoint, but I suspect quite a few other fields are similar.
5. Re:Fantastic for Students and New Researchers by JanneM · 2008-01-19 12:19 · Score: 3, Insightful
  
  If the TOS were to include some means for controlling publications which resulted from analysis of the data, then it might be more likely to succeed.
  
  But in that case, would you want to go anywhere near someone else's data, at the risk of "contaminating" your research and perhaps ending up in a protracted brawl over discovery rights?
  
  I mostly agree with everybody else: it's a neat idea but for a lot of people it's not going to fly.
  
  The one area I think it could be good is for datasets that are already open and that are meant to be shared. In vision research, for instance, or in various fields in machine learning there's quite a lot of sort-of-standard test data sets created by various groups that can make it easier to compare models directly. Having all of those collected in one place would certainly make it easier to find and actually use them rather than reinventing the wheel once again.
  
  --
  Trust the Computer. The Computer is your friend.
6. Re:Fantastic for Students and New Researchers by CastrTroy · 2008-01-19 13:40 · Score: 3, Insightful
  
  That's really weird that this appeared on Slashdot tonight, just as I was downloading the historical weather data for Canada. It's still downloading. I thought it would be an interesting dataset to work with. It's not a huge dataset by any means, only 200 MB zipped, but it's still bigger and more real than anything I got to use in university, and a lot larger than any real dataset I could generate on my own. Does anybody else have any links to interesting open datasets?
  
  --
  
  Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
7. Re:Fantastic for Students and New Researchers by cortex · 2008-01-19 18:48 · Score: 3, Interesting
  
  20th century or not, the fact is that if I don't publish papers with my name as first or last author, I don't get tenure. I'd be happy to have people publish papers using my data as long as I have already gotten a few first-author papers out of it. Of course, that would only apply to my data that is several years old. Also, what is to stop someone from publishing using my data and not having me as an author at all? The TOS for accessing the data are going to be very important.
8. Re:Fantastic for Students and New Researchers by Gromius · 2008-01-19 22:46 · Score: 4, Informative
  
  Yes, I can see how it can appear elitist. And yes, it is elitist in a sense, because it's really hard. A PhD student typically has to do about 3 years of hard work to produce an analysis of sufficient quality, and that's with help from experts. Before that they have 4 years of advanced physics. I'm not saying the common man can't do it, just that it'll take them years of hard work to understand the data well enough to analyze physics results that 800 physicists have already pored over to extract most things of value. However, as I said, it's really easy to think you've understood the data in a week or so and produce bogus results, which I suspect most people would do.
  
  As for the few geniuses who can handle the data better than any of us, yes, it's a noble idea and it sounds nice in theory. However, these geniuses are still going to have to slog through the data, and it's still going to be hard, even for them, to do it by themselves. It's not something some whiz kid will pick up and by the afternoon have a Nobel Prize. However, if they are really interested, they can stop by their local particle physics lab and talk to the people there. It's not as if we don't ever give out our data; lots of students (undergraduates and 6th formers, i.e. high schoolers for Yanks) have been given a copy over the years and helped to understand it. If you want it badly enough, you'll probably get some sort of access to old data. Sure, some may fall through the cracks, but that's unavoidable.
  
  Also, incidentally, the bogus results I'm most afraid of are not from the general public but from our theoretical colleagues, who are actually the people we are most concerned about hiding the data from :) A lot of them (but not all) think that data analysis is easy and have a vested interest in proving a certain model, so subconsciously they might misinterpret the data or not rigorously check it when it looks like it's proving what they want it to prove. Then all of a sudden you have headlines like "Prof. X from Ivy League University Y has found new physics Z in Tevatron/LHC data", which, if true, would be the most significant discovery in physics in the last 30 years, and so it is splashed all over the media. The public and media just know this guy is an Ivy League professor but don't know that he is little more qualified to analyze the results than they are, so they believe him. Arguments would then ensue over the significance of the finding, and eventually a retraction is printed. But this would all happen in public and in the media, and I think this is damaging to science, as the general public starts thinking "these stupid scientists, always changing their minds, should we believe anything they say?". Plus you would get an increase in the usual crazy science results, but this time with data whose analysis most people can't tell is rubbish. Slashdot would be happy, as they tend to like crappy science :) but it's not something scientists would be happy with.
It'll All End In Tears by turgid · 2008-01-19 11:10 · Score: 4, Insightful

This is a Bad Idea. Too much of the world now depends on Google. And people are running to Google, willing to give their data and identity.
/me shakes walking stick and creeps back into cave.

--
Stick Men
1. Re:It'll All End In Tears by Anonymous Coward · 2008-01-19 11:47 · Score: 4, Funny
  
  Do you have any datasets to back up this claim?
Horrible Idea - What are the TOS? by teknopurge · 2008-01-19 11:11 · Score: 4, Insightful

Does Google get ownership of anything that is uploaded? I wonder how many scientists will be foolish enough to unknowingly forfeit their copyrights, IP, etc.

--
Website Hosting
1. Re:Horrible Idea - What are the TOS? by hostguy2004 · 2008-01-19 11:56 · Score: 5, Informative
  
  Google are offering this service to store PUBLIC DOMAIN data. If people don't want to release their data into the public domain, then this isn't the service for them. See http://en.wikipedia.org/wiki/Public_Domain
  
  --
  In Soviet Russia ^H^H^H America, The bank finances YOU!
Re:And in Redmond.... by SnowZero · 2008-01-19 11:11 · Score: 3, Funny

IN SOVIET RUSSIA, quote messages YOU! You fail at quoting.
Re:And in Redmond.... by tomhudson · 2008-01-19 13:43 · Score: 3, Informative

3 terabytes isn't that much any more. You can get 750 GB hard drives for $160 - 5 drives ($800) gives you your 3TB.
Or 4 x 1TB hard drives ($180 each) cost $720, so throw in $10 for a USB key to boot the OS from.
Cheap linux box? Well, you don't need to supply a monitor, keyboard, mouse, speakers, or even much ram - you do the math.
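The drive arithmetic above can be checked with a few lines of Python. This is only a sketch using the prices quoted in the comment; treating 1 TB as 1000 GB (the decimal convention drive makers use) and the RAID 5 "usable" figure, where one drive's worth of capacity goes to parity, are assumptions added for illustration, not something the comment specified.

```python
# Prices and sizes are the figures quoted in the comment above.
# Assumptions (not from the comment): 1 TB = 1000 GB, and RAID 5
# spends one drive's worth of capacity on parity.

def build(count, size_gb, price_each):
    """Return (total cost in $, raw capacity in GB, RAID 5 usable GB)."""
    total = count * price_each
    raw = count * size_gb
    raid5 = (count - 1) * size_gb
    return total, raw, raid5

configs = {
    "5 x 750GB @ $160": build(5, 750, 160),
    "4 x 1TB @ $180": build(4, 1000, 180),
}

for name, (total, raw, raid5) in configs.items():
    # Cost per raw terabyte, rounded to whole dollars.
    print(f"{name}: ${total}, {raw} GB raw, {raid5} GB usable in RAID 5, "
          f"${total / (raw / 1000):.0f}/TB raw")
```

Either build clears 3TB of raw space; with parity, both land at exactly 3000 GB usable, which is one plausible reading of "gives you your 3TB".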