Google To Offer Free Database Storage for Scientists
An anonymous reader writes "Google has revealed a new project aimed at the scientific community. Called Palimpsest, the site research.google.com will play host to 'terabytes of open-source scientific datasets'. It was originally previewed for scientists last August. 'Building on the company's acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED-presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features.'"
So will they be mining the data for contextual ads?
I'd be curious what their algorithms think my data says I want to buy...
Google doing this. And they use Linux "suitcases" for transport.
Hide the chairs.
The new site will have YouTube-style annotating and commenting features.
And hopefully the commentary will be just as insightful and poignant!
The theory of relativity doesn't work right in Arkansas.
So we're going to have YouTube-like commenting?
Is this the future of scientific discourse?
'Every story, if continued long enough, ends in death.' --Ernest Hemingway
This should come in handy for my research on normal variants of the female mammary glands.
Why would you want to store a scientist in a database?
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
... so that explains why the RDBMS dudes were bitching about mapreduce t'other day:
http://it.slashdot.org/article.pl?sid=08/01/18/1813248
If this actually happens, and researchers are willing to make their data-sets open source, it would be a huge boon for budding researchers. It would allow students to do more than just work with a sample dataset out of a textbook. Graduate students learning how to do advanced modeling would be able to work with real datasets, vastly improving their skillset and employability. Just consider these two lines on a CV, and ask yourself which one jumps out at you.
"Designed a model for the dataset on the CD-ROM included with the Modeling Organic Systems textbook"
"Designed a model for the WISK-III heart output dataset published in 2006."
New entrants to a field would have instant, easy access to enormous amounts of data. Although the big kudos comes when you can do totally original work (new data, new analysis), a researcher who could come up with a new critique of older papers and studies would definitely get themselves noticed.
Overall, this is a really positive step for everyone on the lower rungs of the scientific ladder, and especially positive for those with limited resources.
If a computer is ever able to invent something, or make a scientific discovery, it'll certainly be (IMHO) a computer directly related to Google.
so now anything that is publicly available fits under the heading open source? you guys are really trying to give yourselves too much credit.
source is code. if what you're offering isn't code then it can't be open source. public domain should be referenced as such to avoid further degradation of terminology.
and watch in ecstasy as one of Google's suitcase drives SLURPS up the FBI's *real* datasets on 9/11, Elvis... oh, and that schematic for a site-to-site transporter beam that I knocked up a while back, which they somehow stole off my google docs.
"He Who Dares Wins"
Researchers I know would fill up a yottabyte if they were allowed to. I hope Google has plans for keeping growth of the datasets under control.
This is a Bad Idea. Too much of the world now depends on Google. And people are running to Google, willing to give their data and identity.
/me shakes walking stick and creeps back into cave.
Stick Men
Does google get ownership of anything that is uploaded? I wonder how foolish scientists will be as to unknowingly forfeit their copyrights, IP, etc.
Website Hosting
Get lost, Valerie. Men are talking.
Ohh.. terabytes of storage.... wow. I have terabytes of warez in my home, which makes this PR blitz look really petty. What, they're not gonna announce 10MiB of homepage and 20MiB email storage too?
I'm looking forward to "OMG, ur resrch is teh sux" comments and "CHEEP FUNDING M0RTG4GE" spam from elite universities around the world.
:/- spoon(_).
> I wonder how foolish scientists will be as to unknowingly forfeit their copyrights, IP, etc.
Scientists generally forfeit IP and copyright to their host University anyway, although the mileage of that varies between institutions.
The other day my wife said she wants there to be Google Bank. They'd certainly get the online banking thing done right...
"Destroy science and religion. Science would re-emerge exactly the same; but not religion." - Penn Jillette, paraphrased
Then there's the Storage@Home project that was mentioned a while back, albeit possibly only in the comments. I'm not at all sure whether the Folding@Home data is meant to be public domain, but if it were, this would be a partly preferable solution to a p2p-style storage alternative.
Of course the three terabyte limit might cause problems there.
www.eBay.com: Buy new and used Plutonium for your research, now at eBay!
Their insanity is proven by the following statement: The new site will have YouTube-style annotating and commenting features. Anyone who would use *youtube comments* as an inspiration for their site is obviously in need of mental help.
Palimpsest? Are they planning to routinely overwrite your data?
If you disagree with me on social issues, then it's pretty clear that you are a narrow-minded bigot.
:-) .....
I'd say the most useful part would be finding correlated information from disparate fields. The nice thing about a single repository with a single interface is that you can find ALL the data you may need to investigate an interesting hypothesis. Take my current senior thesis on economic activity and its correlation with water usage: it attempts to bring two spatial datasets into a single framework. All the information is out there, but it's rare to find any published papers about it, let alone any standardized set of data to work from. So right now I'm sitting with a bare minimum of economic indicators, because all the other data out there doesn't seem to be easy to find, access, or get to the bottom of. I'm sick of finding PDFs loaded with information but no real way to get at it without a lot of heavy lifting. This, I imagine, is what Google's trying to fix: taking already available data, placing it somewhere easy to use, and formatting it the way you need it for your GIS/EXCEL/DATABASE/SCIENCEGRAPHER. Though one should always note the correlation between knowledge and power, and between absolute power and corruption.
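For what it's worth, once the two datasets are actually reachable, the first step of a thesis like that can be tiny. Here's a minimal sketch, assuming each dataset has already been boiled down to one number per region under a shared region ID (the function names and the {region_id: value} layout are my own hypothetical choices, not anything Google has announced): inner-join the two tables on region, then compute a plain Pearson correlation.

```python
import math


def join_by_region(economic, water):
    """Inner-join two {region_id: value} mappings.

    Only regions present in BOTH datasets are kept, sorted by region ID,
    and returned as (economic_value, water_value) pairs.
    """
    shared = sorted(economic.keys() & water.keys())
    return [(economic[r], water[r]) for r in shared]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Covariance and variances share the same 1/n factor, so it cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

The inner join is the deliberate choice here: regions missing from either dataset are silently dropped, which is exactly the "bare minimum of information" problem when the economic data is patchy. Of course, correlation across regions says nothing about causation.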
Why, I made three terabytes in just 15 hours of solar observing last summer.
The Solar Dynamics Observatory, due to launch into geosynchronous orbit next summer, is a three petabyte mission.
This sounds like Google is creating a ManyEyes site for the scientist set. http://services.alphaworks.ibm.com/manyeyes/app it's a lot of fun, but I don't see the Google version making neat things like word trees of the Grimm Fairy Tales like I did here: http://services.alphaworks.ibm.com/manyeyes/view/SmAgULsOtha65G-s4kxXL2-
No idea where you got that idea. As they wrote, the database is free for scientists.
I have been in a couple of large scientific projects, and the main problem with making the data public has been to ensure that the researchers who collect the data are getting "author credit" in scientific publications.
The scientists who collect the data are often not the same people who analyze the data and fit it to the models. As long as everybody is working on the same project, it is possible to ensure that the people who collect the data are listed as authors on the papers, even though the papers are usually written by the people who analyze the data.
Once the data has been published, all bets are off. People will analyze the data, and write articles about it, with themselves as authors and a proper acknowledgment to the project that collected the data.
As science works today, being listed as an author is paramount. It is the only criterion the bean counters use to judge whether a scientist is doing any active research. Zero publications means zero grants, and soon after, zero paycheck.
The way we have done it has been to delay publication of the raw data until the first batch of scientific papers has been accepted. After that, everyone has access.
I couldn't find anything about it from an authoritative source, like Google. Does anyone have a better link?
Do you think the social sciences could benefit from this as well (that is, if they can get over their fears of opening their data to others)? And if so, how?
This is great news. Tranche is a grassroots effort to do the same sort of thing: http://tranche.proteomecommons.org./ Tranche is used mostly in proteomics, and it works more like BitTorrent than like shipping data around. Hopefully Google will work with it.
Maybe for the first time we'll have gigabytes of rainbow tables for free...