Google's Academic TB Swap Project
eldavojohn writes "Google is transferring data the old fashioned way — by mailing hard drive arrays around to collect information and then sending copies to other institutions. All in the name of science & education. From the article, 'The program is currently informal and not open to the general public. Google either approaches bodies that it knows has large data sets or is contacted by scientists themselves. One of the largest data sets copied and distributed was data from the Hubble telescope — 120 terabytes of data. One terabyte is equivalent to 1,000 gigabytes. Mr. DiBona said he hoped that Google could one day make the data available to the public.'"
One terabyte is equivalent to 1,000 gigabytes.
Uhh, no it isn't. It's really 0.9765625 terabytes.
This is absolutely the most cost effective way of transferring large amounts of data like this. If you do the calculations on terabyte size files, sneakernet (or FedEx net) is actually faster and less expensive. We also went to one of Jim Gray's seminars when he was here giving an Organick Memorial Lecture and he made an incredibly compelling demonstration using a variety of data types. We ended up talking with him for some time afterward about new projects we are engaging in that will also be generating terabytes of data, and his suggestion was to pass applications rather than data, which was interesting.
This is becoming more and more the norm in scientific research and Google's work is quite welcome.
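For anyone who wants to redo the sneakernet arithmetic, here is a rough sketch in Python. Only the 120 TB figure comes from the article; the transit times are assumed values, and the time spent copying data onto and off the drives is ignored.

```python
# Rough sketch: average throughput of a box of disks in transit.
# Only the 120 TB figure is from the article; transit times are assumptions.

TB = 10**12  # decimal terabyte, in bytes


def shipment_throughput_bps(payload_bytes: float, transit_hours: float) -> float:
    """Average bits per second delivered by a package that takes transit_hours to arrive."""
    return payload_bytes * 8 / (transit_hours * 3600)


payload = 120 * TB  # Hubble data set size quoted in the article

for transit_hours in (24, 72):  # assumed overnight and three-day delivery
    gbps = shipment_throughput_bps(payload, transit_hours) / 1e9
    print(f"{transit_hours} h transit: about {gbps:.1f} Gbit/s sustained")
```

A network link would have to sustain that rate for the whole transit window just to tie with the courier.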
But are they using station wagons?
Never underestimate the bandwidth of a station wagon...
Still very much applies today.
Ryan Fenton
This sounds almost like stories of scholars trading/copying books from long long ago. It's actually a somewhat interesting plan.
"In case of emergency, break glass. Scream. Bleed to death."
How long do you think it will be until some maroon somewhere plunks a hard drive into an unpadded envelope and drops it in the big blue mailbox on the corner?
This guy's the limit!
Who's going to own the data? I hope Google isn't going to say they do, like they want to with the old books they're scanning. Every time you download a Hubble picture, will it have a Google watermark?
It was said some time ago that the fastest way to transfer data was in a station wagon full of backup tapes traveling down the Interstate. I guess we now update that to a mini-van full of hard drives...
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
The bandwidth of a moving van full of disks.
Looks like Google is hoarding data. Seems they at least are equating information with power and money. And them that has the power and money makes the rules.
Here will be an old abusing of God's patience and the king's English.
I mail my external hard drives to different friends a few times a year. I have several, but one specifically for mailing to friends and co-workers. I thought this was somewhat of a common practice. I have never had a fellow geek gawk at the idea; rather, it seemed like the only logical way to get what we wanted to do done.
:)
Google is doing something cool by getting and hopefully displaying the data, but the method is not really anything newsworthy, is it? I mean, this is the same as using a flash drive to transfer files real quick; this is just on a much larger scale.
FedEx delivered what appeared to be a ton of broken office chairs to Google headquarters this morning. When asked for the sender's ID, the severely beaten FedEx courier would only reply that the sender wished to remain anonymous.
Well, there's spam egg sausage and spam, that's not got much spam in it.
Moe: Say, Barn, uh, remember when I said I'd have to send away to NASA to calculate your bar tab?
Barney: Oh ho, oh yeah, you had a good laugh, Moe.
Moe: The results came back today. (reading a printout) You owe me seventy billion dollars.
Barney: Huh?
Moe: No, wait, wait, wait, that's for the Voyager spacecraft. Your tab is fourteen billion dollars.
Technically that would be a tebibyte. Tera does indeed mean 10^12.
I'm glad they explained what terabyte means though, I doubt the slashdot crowd would be familiar with such a term!
Why?
Why is a Kilobyte 1024 bytes, if "Kilo" means 1000, both according to the SI and the Greeks (Kilo is derived from khilioi)? If 1 kg = 1000g, 1 kV = 1000V, 1 km = 1000m, why should hard disks break the pattern?
When we're talking about addressable computer memory, approximating the kilobyte to 1024 is a convenience, but since Terabyte gives such a huge error, and makes absolutely no sense for data transfer or disk sizes, it's really time we stopped this illogical naming convention just because some engineers found a term convenient 40 years ago.
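To put a number on that error, here is a minimal sketch that compares 1024^n with 1000^n for each prefix. The function name is made up for illustration.

```python
# Sketch: how far the binary reading of each prefix drifts from its SI meaning.
PREFIXES = ["kilo", "mega", "giga", "tera"]


def binary_vs_si_error(level: int) -> float:
    """Relative excess of 1024**level over 1000**level, as a fraction."""
    return 1024**level / 1000**level - 1


for level, name in enumerate(PREFIXES, start=1):
    print(f"{name}: {binary_vs_si_error(level) * 100:.1f}%")
```

The gap starts at roughly 2.4% for kilo and grows to about 10% by tera, which is why the mismatch is easy to ignore for memory sizes and harder to ignore for big disks.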
120 TB of data from the Hubble telescope? I wish I was paid to go through that. And this picture is of a...star and this one is a star And a star another star OMG its a FRICKIN STAR
"Luck is a tag given by the mediocre to account for the accomplishments of genius." -Heinlein
Wrong. One tebibyte is equal to 1024 gibibytes. One tarabyte equals 1000 gigabytes. If you're going to correct someone, do it right.
SUVs to transport those hard drives. That would be evil.
The more you regulate a company, the worse its products become.
coding is life
Don't say I didn't warn you guys about this "don't be evil thing." First they start swapping TB for "academic" purposes, then maybe some avian influenza in some apartments around Mountain View, and next thing you know, they'll be a smallpox outbreak and we will coincidentally receive advertisements on gmail that we can buy the cure for a few thousand dollars from one of their Adsense "partners."
The only thing you're getting by saying that is a flamewar between 10 kinds of people: those who count only in MB (and disagree with you) and those who count in both MB and MiB (and agree with you)!
For my take on the issue, see this previous post of mine.
I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
I really don't want to share, whether it's academic or not!
Wrong. One tebibyte is equal to 1024 gibibytes. One tarabyte equals 1000 gigabytes. If you're going to correct someone, do it right.
You meant 'terabyte', not 'tarabyte'. If you're going to correct someone, do it right.
"It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien
"The moral of the story is: Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."
-Andrew Tanenbaum
I call B.S. "Lack of engineering time" is why we haven't seen the source to the core search engines or gmail?
If you're going to reply to parent, at least reply to the right one.
The Political Programmer
Use the kibibyte if you have a big problem with it.
But I have long since buried my problem with using the SI prefix with byte to mean a power of 2; actually, I'm not sure I ever had one, I just accepted it. I am happy with 1024B=1KB, 1024KB=1MB, 1024MB=1GB and 1024GB=1TB. The usable space is lower in the case of non-volatile storage anyway: 1TB never means 1024GB, and might be closer to 1000GB (I don't know).
I've been thinking that the only home use app lots of HD storage space would be A/V. Now, I guess when 10 PB of HD are $100-1120, then we'll be able to get copies of these 120 TB of hubble data or TBs of other datasets to fill up those future home PB HDs. One day we'll need home exabyte HD to store and play around with public PB datasets.
I can only hope that bandwidth can keep up. How long would it take to transfer a 120 TB BitTorrent file over either cable or DSL?
Well, maybe we'll have small TB USB flash drives that we can just mail around instead of upgrading our bandwidth.
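A back-of-the-envelope answer to the cable/DSL question, as a small Python sketch. The link speeds are assumed example values rather than measurements, and protocol overhead is ignored.

```python
# Sketch: time to download 120 TB over home broadband.
# The link speeds below are assumed example values, not measurements.
SECONDS_PER_YEAR = 365 * 24 * 3600

ASSUMED_LINKS_BPS = {
    "DSL at 1.5 Mbit/s (assumed)": 1.5e6,
    "cable at 6 Mbit/s (assumed)": 6e6,
}


def download_years(size_bytes: float, bits_per_second: float) -> float:
    """Years needed to move size_bytes over a link sustaining bits_per_second."""
    return size_bytes * 8 / bits_per_second / SECONDS_PER_YEAR


for name, bps in ASSUMED_LINKS_BPS.items():
    print(f"{name}: {download_years(120e12, bps):.1f} years")
```

With those assumed speeds the answer comes out in years, not days, so mailing drives looks like the better deal for a while yet.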
...that a researcher sends them all the printouts of his/her data... on greenbar...
I'm so tired of this stuff. Byte me!
Faster! Faster! Faster would be better!
No. I'm quite happy to accept that a Terabyte is anywhere between 1,000,000,000,000 and 1,100,000,000,000 bytes for general use, simply because it doesn't matter. It gives an idea of the amount of storage, which is all we need. If I was specifying I'd use neither and just say 3.7*10^12 bytes or whatever.
I just get a bit fed up when people insist that the illogical and deprecated usage of terminology is correct, and that a usage that has been accepted for quite some time (and long before marketing got involved) is incorrect.
I doubt the slashdot crowd would be familiar with such a term!
Most of them have TBs of pr0n running on a RAID in their mom's basement.
Here's what happened when I FedExed my RMA to Newegg, packed very carefully. Note the bent motherboard - I didn't even know you could do that. The good news is that FedEx paid part of my claim ... they paid $100 plus the $8.33 that the FedEx store charged me to fax in the claim forms. The bad news is that they did not refund my original shipping or pay more than $100 on the over $280 of damage that they did. It also took about 4 hours of phone calls to even convince FedEx that I was not the seller, and then they lost my claim in their e-mail system (and did not reply to my e-mails) and closed it out for inactivity after a month or so, until I called them and asked what happened.
On a side note, don't bother with UPS insurance. I insured something when I sent it to myself once, and they broke it and the insurance remedy was to return it to the origination address and ask to see an original purchase receipt to award the insurance claim. If you happened to make something yourself or even received something as a gift, don't insure it when you ship it. And hire a private courier (unless someone has found a common carrier that doesn't suck).
Because only real nerds have a problem with 1KB being 1024 bytes rather than 1000 bytes, and kibibytes or whatever you want to call them is a really stupid name. Who wants to have to deal with buying 1.073741 gigabyte DIMMs for their PC when we can just agree instead that a gigabyte is a power of two, not a power of ten?
As for why it's different for disks to RAM, disk manufacturers discovered a long time ago that they could make more money by using SI rather than binary measures for disk size, because it artificially inflated their size. Hence people now complain that they buy a 'one terabyte' drive and it actually only holds 900 gigabytes and change.
Simple: we don't, we just work in a different base:
2^10 = 1024 bytes
See:
http://en.wikipedia.org/wiki/Kilobyte
It's not illogical; it makes perfect sense to anyone who programs, well, anyone who does lower level programming. If computers were to work in base 10... Sorry, I cannot even go there.
True false
vs. the classic
True Maybe False
vs. the new base 10 computing
True
Could be factual
Might be accurate
Maybe right
Slightly correct
Slightly fake
Maybe phony
Might be counterfeit
Could be wrong
False
(Ok maybe not this bad)
That's the only instance of anyone claiming it's a jocular misspelling of 'moron.' Other sites point out why it shouldn't be used as a derogatory name. I suggest gEvil beta refrain from using that word in a negative light considering what that word (when used as a noun) has meant for a long time for many people.
That excuse is about as weak as George Allen's.
I really don't like the idea of a "private" (yes, I know it's publicly traded) company having control of this public information. The data was paid for by taxpayers. Google will inevitably make money from this, otherwise they wouldn't be doing it.
This is not right.
...what does this new P2P technology mean for me? I guess the RIAA is really in for it now.
It's not illogical; it makes perfect sense to anyone who programs, well, anyone who does lower level programming. If computers were to work in base 10... Sorry, I cannot even go there.
If we want to worry about that then use KiB and MiB. But that doesn't make a huge amount of sense. 1KiB = 400h bytes. 1MiB = 100000h bytes. Powers of 256 would make a lot more sense.
um...he did.
---FourChannel---
The reason that hasn't been released would be "trade secrets."
Relax. Think before you call B.S.
I'm not criticizing or anything; just curious is all.
Quo usque tandem abutere, Nimbus, patientia nostra?
We have been sending two DVDs, with about 6-8 GB data, around every month for updates. Now we are trying rsync, which in our view has been more convenient.
Talk about physical transfer of terabytes of DNA info!!
How do you get the sperm separated from the semenal fluid before it hits the keyboard?
[mod -10 troll ;-)>]
I'm just happy they're not swapping tuberculosis.
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
Real geeks have no problem with overloading.
A tarabyte is a half a small cucumber sandwich with the crusts removed, served at tea-time on the Plantation.
that's what I read it as!
She's an astronomer, said the Sloan Digital Sky Survey produces about a terabyte of data a year. Not as much as the Hubble, but still pretty cool.
When you sympathize with stupidity, you start thinking like an idiot.
1.3Tb each or so. About $150,000. the drive is about $5500. $155,000 in total. A 750Gb hard disk costs about $1000. so it'd cost about $160k to do the same with hard disks.
Why not? Today Google Earth, tomorrow Google Universe!
How you measure a terabyte depends on whether you are buying disk, or monitoring disk usage on your server.
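The disk manufacturers define it as 1000 gigabytes, each of which is 1000 megabytes, each of which is 1000 kilobytes, each of which is 1000 bytes.
The OS measures it as 1024 gigabytes, each of which is 1024 megabytes, each of which is 1024 kilobytes, each of which is 1024 bytes.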
Why? Because when you're buying a drive, 750 Gigs sounds bigger than 698.5 gigs.
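The 750 versus 698.5 figure can be checked with a tiny sketch. The function name is made up for illustration, and it assumes the label uses 10^9-byte gigabytes while the OS reports 2^30-byte units.

```python
# Sketch: what an OS counting in 2**30-byte units reports for a drive
# labelled in 10**9-byte gigabytes. The function name is illustrative.


def advertised_gb_to_gib(gigabytes: float) -> float:
    """Convert decimal gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes)."""
    return gigabytes * 10**9 / 2**30


print(f"{advertised_gb_to_gib(750):.1f}")  # 698.5, matching the figure above
```

The same function puts a drive sold as 1000 GB at roughly 931 binary gigabytes.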
Well, the IEC and IEEE as well as the CIPM and NIST all agree that there are 1000 bytes to a Kilobyte and 1024 bytes to the kibibyte. So there :P
We ended up buying a bunch of these to ship the arrays around in. Cardboard == bad :-)
Co-Editor, Open Sources
Open Source Program Manager, Google, Inc.
Tarabyte is part of the evening meal with father and mother at the palace in Helium.
I'm with you, although I have seen FedEx and UPS both damage a lot of packages. I think that their automated systems are a lot rougher on packages than AirBorne Express / DHL or the USPS's Parcel Post. But if you don't insure it, you're accepting that risk when you give them the goods.
A while back I bought a radio-controlled airplane, pre-assembled. It came in a big box, most of which contained the wing. So it was fairly fragile, but well packed, in tri-wall. Got it sent UPS, with insurance for the full value.
They ran it over with a forklift.
To their credit, they called me right away and basically said "uh, so we may have damaged your package a little bit, you might want to look it over." So I went and took a look at it, and it was mangled pretty much beyond recognition. I was a little surprised they had actually bothered to deliver it. But I called them up, told them the stuff inside was ruined, and they sent me a check. (I think that if they hadn't been aware that it was broken already, they might have come and picked it back up, but as it was, they didn't.)
The only problem I have with the way they do insurance, is that they always want the SHIPPER of the goods to file the insurance claim, rather than the receiver. So if you ship something to me, and it arrives to me basically destroyed, and I call UPS, they're going to say "hey, we can't do anything except ship it back, and that guy has to file the claim." It takes a lot of arguing and escalation in order to explain to them, that sometimes things just don't work that way.
I think this is because they're used to working with big businesses and retailers that want to get damaged goods back, and then send out new ones, but for eBay and private shipments, where the RECEIVER is absorbing the transit risk, and the shipper is just basically saying "hey, I'm selling this to you FOB, whatever arrives at your door is your problem" (which is the eBay standard), it creates a big problem. The last thing the shipper wants is for the damaged goods to come back at him, because from his perspective, he washed his hands of the whole business when he dropped it off at UPS.
So overall, I'm not hugely dissatisfied with them, they just need to get through their heads that it's not always the shipper who's going to initiate a claim, and that in many cases, it's going to be the receiver of a shipment who is purchasing the insurance and who is the one at risk if something gets damaged, and it's going to be them who's filing a claim for loss.
Now, when I have fragile stuff that I want to send, I pretty much always use DHL, because I haven't had them mangle anything yet, but you can't beat FedEx Ground for being dirt cheap. You just have to be prepared for a lot of bureaucratic hassle when they drive over it.
The other thing I learned, is to always take a photo of the shipping label, or note the tracking number, on everything. Both UPS and FedEx are absolutely worthless unless you have a tracking or waybill number, and oftentimes, shippers won't bother to keep records of that on their outbound stuff. (Which means if it gets lost, everybody's hosed.)
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
If the average Slashdotter applied the same flawed logic to Microsoft, you'd have to say they're big open source sponsors too. After all, Microsoft has released GB of free source code for utilities, etc. for decades. Sure, the code mostly only works with their proprietary "family jewels" (the OS and development tools), but why quibble?
Just use these for binary:
http://en.wikipedia.org/wiki/Binary_prefix
And use the SI prefixes for powers-of-ten, and all will be okay. The more who do this, the more accepted it will be, the fewer who won't understand what these mean, and less confusion will result.
When we're talking about addressable computer memory, approximating the kilobyte to 1024 is a convenience, but since Terabyte gives such a huge error, and makes absolutely no sense for data transfer or disk sizes, it's really time we stopped this illogical naming convention just because some engineers found a term convenient 40 years ago.
Yes, it's so funny when all these guys just keep arguing why 1024 bytes should really be 1000 bytes because they don't want to care that it's history, it's practical, it works, and anyway why the hell should it be 1000, let's make it 999. Now you go calculate. So, it's funny when they just keep arguing about that, but just wait and see how they react when you bring up the mile/feet/pound issue (which really is an SI issue btw, unlike the byte) from "40 years ago", or well, a bit more, so what gives.
Fact is, we who care about 1MB being 1024KB, we don't really care how mister joe wants to call a megabyte and how much he wants it to be. We know what they mean; it's their freaking problem that they have created this non-issue for themselves so they won't know what we mean.
I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
Actually, even tarabyte sounds better than tebibyte :P
I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
Actually, at least the earlier versions of MS-DOS *WAS* open source - iirc, Microsoft actually distributed the source code (or at least made it available) of some of the early 1980s MS-DOS.
If you want to be strict, the SI defines the "tera" prefix as 10^12, so 1 terabyte = 1000 gigabytes.
If you want to use the binary values, you might as well use the correct "tebi" prefix. NIST says you should, and it looks like the IEC, IEEE and BIPM agree.
GPG 0x1B479C78
TB is killing people all over Africa, and Google wants to see it swapped around our schools, too?!? I knew those liberal, heathen, California commies would be the downfall of this great nation!
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
I've got Celestia
I see your informative link, and raise you a pithy comment.
Why is a Kilobyte 1024 bytes, if "Kilo" means 1000, both according to the SI and the Greeks (Kilo is derived from khilioi)? If 1 kg = 1000g, 1 kV = 1000V, 1 km = 1000m, why should hard disks break the pattern?
:)
Read all about it here http://en.wikipedia.org/wiki/Byte
The thing that gets _really_ confusing is that a byte does not have to be 8 bits. I don't know of any modern computers that don't use 8 bit bytes, but there were 7 and 9 bit byte machines back in the day.
Hard drive manufacturers have their version of bytes, networking people speak in bits, and all of it is a mess.
A friend of mine is older than me, and when he was in school light came in angstroms, but today light comes in nanometers. (Angstroms are deprecated because they are not in the SI power of a thousand rule.)
The moral of the story is that standards are not that standard, but when the standard becomes standard the standard is subject to change at any time.
Why don't they make a 'google earth' that uses hubble data. Instead of looking down at the earth you could look up and away, allowing zooming just like google earth but with pics of the universe. I'd put up with adsense to be able to browse that kinda interface.
It's "dragged"
Dude, think about what you post.
Try multiplying 2 x 2 x 2, 1000 times.