Google's Academic TB Swap Project
eldavojohn writes "Google is transferring data the old fashioned way — by mailing hard drive arrays around to collect information and then sending copies to other institutions. All in the name of science & education. From the article, 'The program is currently informal and not open to the general public. Google either approaches bodies that it knows has large data sets or is contacted by scientists themselves. One of the largest data sets copied and distributed was data from the Hubble telescope — 120 terabytes of data. One terabyte is equivalent to 1,000 gigabytes. Mr. DiBona said he hoped that Google could one day make the data available to the public.'"
One terabyte is equivalent to 1,000 gigabytes.
Uhh, no it isn't. 1,000 gigabytes is really 0.9765625 terabytes.
This is absolutely the most cost-effective way of transferring large amounts of data like this. If you do the calculations on terabyte-size files, sneakernet (or FedEx net) is actually faster and less expensive than pushing the data over a network. We also went to one of Jim Gray's seminars when he was here giving an Organick Memorial Lecture, and he made an incredibly compelling demonstration using a variety of data types. We ended up talking with him for some time afterward about new projects we are engaging in that will also be generating terabytes of data, and his suggestion was to pass applications rather than data, which was interesting.
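For anyone who wants to run the numbers themselves, here is a rough back-of-the-envelope sketch in Python. The 120 TB figure is the Hubble data set from the summary; the 100 Mbit/s sustained link and the two-day shipping time are purely my assumptions, and the time needed to copy data onto and off the drives is ignored.

```python
# Rough sneakernet-vs-network comparison.
# Assumptions (not from the article): sustained link speed, shipping time,
# and zero time spent copying data to and from the drives.

SECONDS_PER_DAY = 86400

def transfer_days(size_tb, link_mbps):
    """Days needed to send size_tb decimal terabytes over a link_mbps megabit/s link."""
    bits = size_tb * 1e12 * 8
    return bits / (link_mbps * 1e6) / SECONDS_PER_DAY

def effective_mbps(size_tb, shipping_days):
    """Equivalent bandwidth (megabit/s) of shipping size_tb terabytes in shipping_days."""
    bits = size_tb * 1e12 * 8
    return bits / (shipping_days * SECONDS_PER_DAY) / 1e6

if __name__ == "__main__":
    print(round(transfer_days(120, 100), 1))  # 111.1 days on a 100 Mbit/s link
    print(round(effective_mbps(120, 2)))      # 5556 Mbit/s for a two-day shipment
```

Plug in your own link speed and courier time; the gap only closes once the sustained network rate gets into the multi-gigabit range.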
This is becoming more and more the norm in scientific research and Google's work is quite welcome.
Here's what happened when I FedExed my RMA to Newegg, packed very carefully. Note the bent motherboard - I didn't even know you could do that. The good news is that FedEx paid part of my claim ... they paid $100 plus the $8.33 that the FedEx store charged me to fax in the claim forms. The bad news is that they did not refund my original shipping or pay more than $100 on the over $280 of damage that they did. It also took about 4 hours of phone calls to even convince FedEx that I was not the seller, and then they lost my claim in their e-mail system (and did not reply to my e-mails) and closed it out for inactivity after a month or so, until I called them and asked what happened.
On a side note, don't bother with UPS insurance. I insured something when I sent it to myself once, and they broke it and the insurance remedy was to return it to the origination address and ask to see an original purchase receipt to award the insurance claim. If you happened to make something yourself or even received something as a gift, don't insure it when you ship it. And hire a private courier (unless someone has found a common carrier that doesn't suck).
Don't get me wrong -- many of the scientists want people to use their data (e.g., see The Astronomer's Data Manifesto), but they also want to know who's using it, because that's how they justify the value of their projects and the costs incurred from distributing the data (especially for non-active projects).
The science community is also working on the Science Commons (an equivalent of the Creative Commons for marking scientific data) and various federated search engines (e.g., nighttime (astronomy) virtual observatories, as well as other space and earth science discipline-specific VOs).
Build it, and they will come^Hplain.
How you measure a terabyte depends on whether you are buying disk, or monitoring disk usage on your server.
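The disk manufacturers define it as 1000 gigabytes, each of which is 1000 megabytes, each of which is 1000 kilobytes, each of which is 1000 bytes.
The OS measures it as 1024 gigabytes, each of which is 1024 megabytes, each of which is 1024 kilobytes, each of which is 1024 bytes.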
Why? Because when you're buying a drive, 750 Gigs sounds bigger than 698.5 gigs.
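The arithmetic behind those figures is easy to check. Here is a minimal sketch in Python; the function names are mine, made up for illustration, and it assumes "decimal" means powers of 1000 and "binary" means powers of 1024.

```python
# Decimal (powers of 1000) versus binary (powers of 1024) storage units.
# Function names are hypothetical, chosen only for this illustration.

def decimal_gb_to_binary_gb(gb):
    """Convert drive-label gigabytes (10**9 bytes) to binary gigabytes (2**30 bytes)."""
    return gb * 10**9 / 2**30

def decimal_tb_to_binary_tb(tb):
    """Convert drive-label terabytes (10**12 bytes) to binary terabytes (2**40 bytes)."""
    return tb * 10**12 / 2**40

def binary_gb_to_binary_tb(gb):
    """Express binary gigabytes (2**30 bytes) in binary terabytes (2**40 bytes)."""
    return gb * 2**30 / 2**40

if __name__ == "__main__":
    print(round(decimal_gb_to_binary_gb(750), 1))  # 698.5, the "750 Gig" drive
    print(binary_gb_to_binary_tb(1000))            # 0.9765625
    print(round(decimal_tb_to_binary_tb(1), 4))    # 0.9095
```

Note that 0.9765625 only comes out if the 1,000 gigabytes are already binary gigabytes; a drive sold as one decimal terabyte shows up as roughly 0.91 binary terabytes.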
Well, the IEC and IEEE as well as the CIPM and NIST all agree that there are 1000 bytes to a kilobyte and 1024 bytes to the kibibyte. So there :P