Slashdot Mirror


Google's Academic TB Swap Project

eldavojohn writes "Google is transferring data the old fashioned way — by mailing hard drive arrays around to collect information and then sending copies to other institutions. All in the name of science & education. From the article, 'The program is currently informal and not open to the general public. Google either approaches bodies that it knows has large data sets or is contacted by scientists themselves. One of the largest data sets copied and distributed was data from the Hubble telescope — 120 terabytes of data. One terabyte is equivalent to 1,000 gigabytes. Mr. DiBona said he hoped that Google could one day make the data available to the public.'"

9 of 190 comments

  1. so.. by mastershake_phd · · Score: 2, Interesting

    Who's going to own the data? I hope Google isn't going to say they do, like they want to with the old books they're scanning. Every time you download a Hubble picture, will it have a Google watermark?

    1. Re:so.. by cfulmer · · Score: 2, Interesting

      The ownership of data is presumably a case-by-case thing that depends on what the data is and how it was acquired.

      For example, Google does not own the copyright on out-of-copyright books that it scans in (nobody does, by definition.) At best, it might own the copyright on the scan that it did, but that's really unlikely--copyright protects creative expression and a straight scan doesn't add any.

      However, they probably have some rights under unfair competition law because they have gone through a lot of work acquiring all this data and it would be unfair for somebody else to piggyback on that work to compete with them.

      Recognize also that many of the "Hubble Pictures" you see are colorized versions of raw data that incorporates non-visible parts of the EM spectrum, assigning colors to things you can't see with your eyes. That assignment of colors to create something pleasing to the eye is certainly creative expression. So, if Google takes the raw data and does that color assignment itself, well, the result is theirs.

  2. Never underestimate ... by boyfaceddog · · Score: 2, Interesting

    The bandwidth of a moving van full of disks.
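
    To put a rough number on that quip, here is a minimal sketch of the arithmetic. The 120 TB payload is the Hubble figure from the summary; the 48-hour door-to-door transit time and the function name are assumptions made up for illustration, not anything Google has published.

```python
# Effective bandwidth of a shipment: payload size divided by transit time.
# 120 TB comes from the summary; the 48-hour transit time is an assumed value.

def shipping_bandwidth_mbit_per_s(payload_bytes: float, transit_hours: float) -> float:
    """Return the average throughput of a shipment in megabits per second."""
    seconds = transit_hours * 3600
    return payload_bytes * 8 / seconds / 1e6

hubble_bytes = 120 * 10**12  # 120 TB, taking 1 TB = 10^12 bytes
rate = shipping_bandwidth_mbit_per_s(hubble_bytes, transit_hours=48)
print(f"{rate:,.0f} Mbit/s")
```

    Under those assumptions the box in the van averages several gigabits per second, and the figure scales linearly with however many drives fit in the shipment.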

    Looks like Google is hoarding data. Seems they at least are equating information with power and money. And them that has the power and money makes the rules.

    --
    Here will be an old abusing of God's patience and the king's English.
  3. Re:1TB = 1024 GB by NinjaTariq · · Score: 2, Interesting

    Use the kibibyte if you have a big problem with it.

    But I have long since buried my problem with using the SI prefix with byte to mean a power of 2; actually, I'm not sure I ever had one, I just accepted it. I am happy with 1024 B = 1 KB, 1024 KB = 1 MB, 1024 MB = 1 GB and 1024 GB = 1 TB. The usable space is lower in the case of non-volatile storage anyway, so 1 TB never really means 1024 GB; it might be closer to 1000 GB (I don't know).

  4. Re:Like days of old by meringuoid · · Score: 3, Interesting
    This sounds almost like stories of scholars trading/copying books from long long ago.

    According to what I'm told every time I watch a DVD, these scholars were in fact stealing books.

    --
    Real Daleks don't climb stairs - they level the building.
  5. ...why not tapes? by Penguinisto · · Score: 3, Interesting
    I understand the whole "HDD w/ a common filesystem = more compatibility" thing, but wouldn't it be easier to simply send along some tapes of a type appropriate to the format/type that the scientific institution uses? LTO-3 can do 800GB compressed, SDLT can do up to 600... and neither is susceptible to data loss when it gets bounced too hard by FedEx/UPS/DHL/Whatever. (plus it would make for a lighter package, wouldn't require some poor IT schmuck to disassemble a server or wait forever for USB to transfer all of it, etc...)
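
    For a sense of scale, a quick sketch of how many cartridges the 120 TB Hubble set from the summary would need. The compressed capacities are the ones quoted above and assume the data actually compresses about 2:1, which may not hold for image data; the 400 GB figure is LTO-3's native capacity. The helper name is made up for illustration.

```python
# Cartridges needed for a payload, rounding up to whole tapes.
# Capacities use decimal units (1 GB = 10^9 bytes).

def tapes_needed(payload_bytes: int, tape_capacity_bytes: int) -> int:
    """Ceiling division: a partly filled tape still counts as one tape."""
    return -(-payload_bytes // tape_capacity_bytes)

GB = 10**9
payload = 120 * 10**12  # 120 TB Hubble data set from the summary

lto3_compressed = tapes_needed(payload, 800 * GB)  # assumes ~2:1 compression
sdlt_compressed = tapes_needed(payload, 600 * GB)  # assumes ~2:1 compression
lto3_native = tapes_needed(payload, 400 * GB)      # no compression at all

print(lto3_compressed, sdlt_compressed, lto3_native)
```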

    I'm not criticizing or anything; just curious is all.

    /P

    --
    Quo usque tandem abutere, Nimbus, patientia nostra?
    1. Re:...why not tapes? by kulover · · Score: 2, Interesting

      The reason for not using tapes is exactly the compression. Compressing the data and then writing it to tape takes a lot of time, and the same process would have to be repeated in reverse on the other end.

      Besides, using HDD for transfer means immediate access to the same data on the other end with speeds that are unmatched by tape backup systems. It is also worth noting that data sets that large usually are stored on large RAID systems like this one from LSI Logic, http://www.lsilogic.com/storage_home/products_home/external_raid/6998_storage_system/index.html, and are not installed into a computer like you may be thinking. It provides unmatched speed and reliability. A single rack system can sustain 1,600 MB of transfer to attached hosts, which is how Google will probably use it anyway. I highly doubt a single computer will be looking at that much information.
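
      As a back-of-the-envelope check, the sketch below estimates how long it would take to stream the 120 TB Hubble set from the summary at that rate. It assumes the 1,600 MB figure means megabytes per second and that the rate is sustained for the whole transfer; both are assumptions, not vendor numbers I have verified.

```python
# Time to stream a data set at a fixed sustained rate.
# Assumption: "1,600 MB of transfer" is read as 1,600 MB per second.

MB = 10**6
payload_bytes = 120 * 10**12      # 120 TB, taking 1 TB = 10^12 bytes
rate_bytes_per_s = 1600 * MB      # assumed sustained throughput

seconds = payload_bytes / rate_bytes_per_s
hours = seconds / 3600
print(f"{seconds:,.0f} s, about {hours:.1f} hours")
```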

    2. Re:...why not tapes? by K8Fan · · Score: 2, Interesting

      The "TeraScale SneakerNet" paper posted earlier anticipates and answers that. They ship a fully assembled computer with processor, RAM, OS and network interface. Plug it in to the wall, plug it in to the network and assuming you had previously agreed on a networking protocol, you're rolling as soon as it boots! No restoration, no decompressing, immediate access to the data.

      Does anyone have a Linux distro for this specific purpose? Preferably tiny enough to fit onto a USB key and optimized for bandwidth, preferably with a web server interface for configuring the discs and network?

      --
      "How perfectly Goddamn delightful it all is, to be sure" Charles Crumb
  6. Not according to NIST by Ernesto+Alvarez · · Score: 3, Interesting

    If you want to be strict, the SI defines the "tera" prefix as 10^12, so 1 terabyte = 1000 gigabytes.

    If you want to use the binary values, you might as well use the correct "tebi" prefix. NIST says you should, and it looks like the IEC, IEEE and BIPM agree.
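
    To make the gap between the two conventions concrete, here is a small sketch comparing the SI terabyte with the binary tebibyte and re-expressing the 120 TB Hubble figure from the summary in TiB. The variable names are just for illustration.

```python
# SI prefix: tera = 10^12. IEC binary prefix: tebi = 2^40.
TB = 10**12
TiB = 2**40

difference_bytes = TiB - TB           # how much larger a tebibyte is
ratio = TiB / TB                      # roughly a 10% gap at this scale
hubble_in_tib = 120 * TB / TiB        # the 120 TB data set measured in TiB

print(f"1 TiB - 1 TB = {difference_bytes:,} bytes")
print(f"1 TiB / 1 TB = {ratio:.4f}")
print(f"120 TB = {hubble_in_tib:.2f} TiB")
```

    The gap grows with each prefix step: about 2.4% at kilo/kibi, but nearly 10% at tera/tebi, which is why the distinction matters more for data sets of this size.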