Slashdot Mirror


Mailing Disks is Faster than Uploading Data

CowboyRobot writes "Who would ever, in this time of the greatest interconnectivity in human history, go back to shipping bytes around via snail mail as a preferred means of data transfer? Jim Gray would do it, that's who. And we're not just talking about Zip disks, no sir. We're talking about shipping entire hard drives, or even complete computer systems, packed full of disks. David Patterson (one of the developers of both RISC and RAID) interviews ACM Turing Award winner Jim Gray." Back in school we always had a saying, "Never underestimate the bandwidth of a station wagon filled with backup tapes." Seems like that still holds true.

20 of 581 comments

  1. Tapes too... by inertia187 · · Score: 5, Informative

    This reminds me of how data is collected for SETI@Home:

    After the data is recorded onto tapes at Arecibo, they are shipped back to the SETI@home lab in Berkeley, California. The data are then broken up into workunits, which are sent out to the client screensaver program for candidate signal detection. So far, SETI@home has generated 189,598,882 workunits from the data received from Arecibo. SETI@home has split 1,139 tapes, meaning that the average tape yields 166,709 workunits. This is somewhat lower than the optimal yield of roughly 200,000 workunits per tape because of radio frequency interference, gaps in recording, problems with the recording equipment, etc.

    I think a work unit is 65,536 bytes. Even if it takes a week to ship one tape, you can't beat that throughput! But the latency is the worst.

    --
    A programmer is a machine for converting coffee into code.
    1. Re:Tapes too... by Rura+Penthe · · Score: 3, Informative

      Well one tape = 166709 units * 64 (k) / 1024 / 1024 = ~10.175GB. 10.175GB a week is not particularly impressive. :)
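      The per-tape arithmetic above can also be turned into an average line rate. A sketch assuming, as the posts do, 64 KiB work units, 166,709 units per tape, and one tape shipped per week:

```python
# Sketch: per-tape data volume and average rate for SETI@home tapes.
# Assumptions (from the posts above): 64 KiB work units, 166,709 units
# per tape, one tape shipped per week.
WORKUNIT_BYTES = 64 * 1024
UNITS_PER_TAPE = 166_709
SECONDS_PER_WEEK = 7 * 24 * 3600


def tape_gib(units=UNITS_PER_TAPE, unit_bytes=WORKUNIT_BYTES):
    """Data on one tape, in GiB (2**30 bytes)."""
    return units * unit_bytes / 2**30


def weekly_rate_kbit_s(gib_per_week):
    """Average rate, in kbit/s, of shipping gib_per_week GiB every week."""
    return gib_per_week * 2**30 * 8 / SECONDS_PER_WEEK / 1000


per_tape = tape_gib()
print(f"{per_tape:.3f} GiB per tape")                 # 10.175 GiB per tape
print(f"{weekly_rate_kbit_s(per_tape):.0f} kbit/s")   # roughly 145 kbit/s
```

      About 145 kbit/s is under three times a 56k modem, which is the reply's point; the week of latency per tape is worse still.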

    2. Re:Tapes too... by pixelite · · Score: 5, Informative
      Well one tape = 166709 units * 64 (k) / 1024 / 1024 = ~10.175GB.


      That figure is per tape; the actual shipment has 1,139 tapes, I think. 10.175GB * 1,139 = ~11.6TB. That *is* impressive bandwidth.
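      The ~11.6 figure divides a GiB total by 1,000, mixing binary and decimal prefixes. A sketch that keeps the units straight, assuming the same 64 KiB work units and 166,709 units per tape:

```python
# Sketch: total data on 1,139 tapes, in binary (TiB) and decimal (TB) units.
# Assumes 64 KiB work units and 166,709 units per tape, as in the posts.
UNIT_BYTES = 64 * 1024
UNITS_PER_TAPE = 166_709
TAPES = 1_139

total_bytes = UNIT_BYTES * UNITS_PER_TAPE * TAPES
tib = total_bytes / 2**40    # binary terabytes (TiB)
tb = total_bytes / 10**12    # decimal terabytes (TB)
print(f"{tib:.1f} TiB = {tb:.1f} TB")  # 11.3 TiB = 12.4 TB
```

      Either way it is well over 11 TB, so the point about bulk bandwidth stands.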
      --
      >>Sig under construction
    3. Re:Tapes too... by Piist · · Score: 2, Informative

      According to http://setiathome.ssl.berkeley.edu/cacm/cacm.html a work unit is 350K. 189,598,882 work units at 350K each is roughly 61.8 TB (base 2). Also according to that paper, they released the first Windows and Mac clients in May of 1999. Assuming they started shipping tapes at the same time they released the client, they've been shipping tapes for roughly 4 years and 1 month. 61.8 TB over that time is a little over 298 GB/week, which would be the equivalent of just over 4 Mbits/second, constantly, over those 4+ years.
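      The same arithmetic, written out so the unit choices are explicit. A sketch assuming "350K" means 350 KiB and that "4 years and 1 month" is counted in 52-week years (which is what reproduces the 298 GB/week figure):

```python
# Sketch of the SETI@home totals. Assumptions: 350 KiB per work unit,
# 4 years and 1 month measured in 52-week years.
WORKUNITS = 189_598_882
WORKUNIT_BYTES = 350 * 1024
WEEKS = (4 + 1 / 12) * 52
SECONDS = WEEKS * 7 * 24 * 3600

total_bytes = WORKUNITS * WORKUNIT_BYTES
tib = total_bytes / 2**40                    # binary terabytes
gib_per_week = total_bytes / 2**30 / WEEKS   # binary gigabytes per week
mbit_s = total_bytes * 8 / SECONDS / 1e6     # decimal megabits per second

print(f"{tib:.1f} TiB, {gib_per_week:.0f} GiB/week, {mbit_s:.1f} Mbit/s")
```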

  2. Stationwagon Quote by Anonymous Coward · · Score: 1, Informative

    I believe that the station wagon quote really belongs to Andrew Tanenbaum.

    1. Re:Stationwagon Quote by joe_bruin · · Score: 4, Informative

      I believe your attribution is correct.

      Never underestimate the bandwidth of a station wagon filled with backup tapes

      However, while the immediate bandwidth of a station wagon filled with tapes may be enormous, the overall bandwidth is quite poor. This is because of the slow write/read rates of the tape drive and the slow overall speed of the station wagon. I can transfer 3 gigs from my work computer to my home machine faster than it would take me to write the 3 gigs to tape, drive it there, and read it back from tape (and my drive is only 15 minutes). If I lived 5000 miles away, my tape bandwidth would be considerably worse, while my internet bandwidth would be virtually unchanged.
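      The comparison depends entirely on tape drive and link speeds, which the post doesn't give. A sketch with hypothetical figures (a 10 Mbit/s link and a slow 1 MB/s tape drive) shows the shape of the argument, including how a 5000-mile trip (assumed 100 hours at 50 mph) swamps everything else when only a few gigabytes are involved:

```python
# Sketch: network transfer vs. "write to tape, drive, read back".
# Hypothetical figures: a 10 Mbit/s link and a slow 1 MB/s tape drive.
# Sizes use decimal units (1 GB = 10**9 bytes, 1 MB = 10**6 bytes).


def network_hours(size_gb, link_mbit_s):
    """Hours to send size_gb over a link running at link_mbit_s."""
    return size_gb * 8e9 / (link_mbit_s * 1e6) / 3600


def sneakernet_hours(size_gb, tape_mb_s, drive_hours):
    """Hours to write the data to tape, drive it over, and read it back."""
    one_pass = size_gb * 1000 / tape_mb_s / 3600
    return one_pass + drive_hours + one_pass


print(f"network:            {network_hours(3, 10):.1f} h")
print(f"tape, 15-min drive: {sneakernet_hours(3, 1, 0.25):.1f} h")
print(f"tape, 5000 miles:   {sneakernet_hours(3, 1, 100):.1f} h")
```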

      Since this statement was made, we have reached the point where internet bandwidth has exceeded the "vehicle full of tapes". Now, this one might be good for a few more years:

      Never underestimate the bandwidth of an SR-71 full of NetApps

    2. Re:Stationwagon Quote by putaro · · Score: 2, Informative

      You've got a crap tape drive. Does your internet connection do 30MBytes/s? You're also allowed to have more than one tape drive :-). You're also supposed to put more than 1 tape in the car.


      Let's assume LTO (Ultrium 2) tape at 30MB/s, 200GB/tape (uncompressed - let's compare apples to apples). We'll use a Chevy Suburban with 3919 liters of interior space, assume 3 tapes to the liter, so about 10,000 tapes, with room for fudge and packing material.


      That's 2000 TB or 2 Petabytes in one vehicle.


      Since we've bought 10,000 tapes (those things ain't cheap) we may as well have 100 tape drives to read them and write them (200 total, 100 on each end).


      At 30MB/s it takes about 2 hours to read or write each tape, so 4 hours per tape (write plus read), or 40,000 drive-hours total; split across 100 drives on each end, that's 400 hours to write and read all the tapes. Assuming we stop to go to the bathroom and eat occasionally but make no stops for sleeping (2 drivers), we'll average 50 miles/h, or 100 hours for a 5000-mile trip. (You may drive faster than this, but it makes my math easy.)


      Total time = 500 hours for 2000TB, or about 1.1 GB/s. If we assume only 1 tape drive on each end, it's still about 13MB/s. Yah, still not to be underestimated :-). As you can see, the speed of the vehicle and the distance have very little impact when you're moving such a large amount of data.
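      The Suburban estimate can be written as a small function of how many drives sit at each end. A sketch using the post's own assumptions (10,000 tapes, 200 GB each, about 2 hours per tape pass, a 100-hour trip) and treating writing, driving, and reading as strictly one after another, as the post does:

```python
# Sketch of the Suburban estimate. Figures are the post's assumptions:
# 10,000 LTO-2 tapes at 200 GB native, 30 MB/s drives (about 2 h per tape),
# and a 100-hour drive. Write, drive, and read happen strictly in sequence.
TAPES = 10_000
TAPE_MB = 200_000            # 200 GB per tape, decimal units
HOURS_PER_TAPE_PASS = 2      # 200,000 MB / 30 MB/s is about 1.85 h, rounded up
TRIP_HOURS = 100


def effective_mb_s(drives_per_end):
    """Average MB/s from the first tape write to the last tape read."""
    tapes_per_drive = TAPES / drives_per_end
    write_hours = tapes_per_drive * HOURS_PER_TAPE_PASS
    read_hours = tapes_per_drive * HOURS_PER_TAPE_PASS
    total_hours = write_hours + TRIP_HOURS + read_hours
    return TAPES * TAPE_MB / (total_hours * 3600)


for drives in (100, 1):
    print(f"{drives:>3} drive(s) per end: {effective_mb_s(drives):,.1f} MB/s")
```

      With 100 drives per end, the trip is only a fifth of the 500 hours, which is the post's point that vehicle speed and distance matter little at this scale.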


      I think density-wise, tapes and bare disks are about the same today, as a 250 GB IDE drive is about the same size as an LTO tape. Now, if you have your imaginary Beowulf cluster ready to hook all of your IDE drives up, imagine the bandwidth of that!


  3. Re:Well, depends on what way you look at it. by dbrower · · Score: 2, Informative
    I was going to mod the parent "overrated", but it wasn't worth the points, so I'll argue it here.

    The poster just didn't read the article.

    First, he naively says that the file is there "instantly" if you transmit it. That's not true for big files, which will take size/bandwidth to arrive. It does you no good to get the first file if you need all of them anyway.

    Second, the bandwidth is NOT cheaper than the postage. That's one of the main points. A gigabit OC line costs significant money, and even it is going to take a day to ship a terabyte. For the $200 shipping, Gray can send several terabytes overnight. The shipping is cheaper than the bandwidth. Geez, he actually talks about the numbers and works them through, and people still don't read/believe it.
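    The "a day per terabyte" figure depends on how much of the link a transfer actually gets. A sketch, assuming a decimal terabyte (10**12 bytes) and ignoring protocol overhead: at the full gigabit line rate a terabyte takes a little over 2 hours, and a full day corresponds to a sustained rate of roughly 93 Mbit/s, so the figure presumably reflects achieved throughput rather than line rate.

```python
# Sketch: transfer time for a terabyte, and the rate "a day per TB" implies.
# Assumes a decimal terabyte and no protocol overhead (both simplifications).
TERABYTE = 10**12


def hours_to_send(n_bytes, mbit_s):
    """Hours to move n_bytes at a sustained rate of mbit_s (10**6 bits/s)."""
    return n_bytes * 8 / (mbit_s * 1e6) / 3600


def implied_mbit_s(n_bytes, hours):
    """Sustained rate (Mbit/s) needed to move n_bytes in the given hours."""
    return n_bytes * 8 / (hours * 3600) / 1e6


print(f"{hours_to_send(TERABYTE, 1000):.1f} h at full gigabit line rate")
print(f"{implied_mbit_s(TERABYTE, 24):.0f} Mbit/s sustained for one TB per day")
```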

    Another poster talked about tapes, which you have to laboriously load at the receiving site. When Gray ships the whole computer, it arrives as an instantly available NAS file server with the data. This is way more usable.

    -dB

    --
    "It if was easy to do, we'd find someone cheaper than you to do it."
  4. Re:The bandwith is there, you just can't have it. by RaboKrabekian · · Score: 2, Informative

    Cable TV companies are pumping dozens of digital movies across their system at once, live.

    Um, no. They're broadcasting one movie at a time. You're not receiving all channels at once. That makes a huge difference in your argument.

    --
    "Moderate drinking can help prevent amputated limbs" -- Abigail Zuger, NYTimes, 12/31/02
  5. Re:The telecom industry is to blame. by semanticgap · · Score: 3, Informative

    I have this theory that the reason telco prices have not changed is long-term agreements. Back in my ISP days, we used to sign 7-year terms on T1s because they were cheapest and we knew we'd need them. This was in 96-97, so these agreements are in force until 2003-04... When the time comes to renew them, no one in their right mind will pay those rates, and we will see a drop in high-speed prices (and probably Verizon and MCI whining to Congress).

  6. Hey guess what by autopr0n · · Score: 2, Informative

    You are receiving all the channels at once; it's just that you're only decoding one at a time. Lots of people decode more than one at once, such as with a TiVo or a VCR, or with picture-in-picture.

    If you wanted to, you could record all of them at once, quite easily.

    --
    autopr0n is like, down and stuff.
  7. MOD PARENT DOWN by Anonymous Coward · · Score: 1, Informative

    "Yet they crimp your upload speed to DSL rates or lower, 30KB/s, because they are afraid of people "stealing" movies. This is not a technological problem, it a social one."

    Actually, it IS a technical limitation. You obviously aren't aware that each channel requires a specified amount of the RF spectrum, and cable TV was built to be primarily downstream, since there were no such things as cable modems for decades. The upstream range used on most cable systems is limited by the condition of the decades-old, primarily downstream cable plant, and that's why upstream speeds are throttled. As cable plants get better, the upstreams will increase.

  8. hm... by autopr0n · · Score: 3, Informative

    But they could also take 50 different routes to get there. There are all different kinds of ways to rejigger the figures, but we're talking about what's practically possible. Employing enough people to man those 40,678 loading docks full time (what you would need to offload one truck every 0.28 seconds) would cost at least 5.55*40,678*24, or about $5.4 million per day, which is nearly $2 billion a year. For that kind of money you could probably afford to lay down multiple parallel multifrequency optical cables.
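    Multiplying out the staffing expression is a one-liner. A sketch, reading 5.55 as a wage in dollars per hour for one worker per dock around the clock (an interpretation; the post does not give units): the expression comes to roughly $5.4 million per day, or close to $2 billion a year.

```python
# Sketch of the staffing estimate. Interpretation (not spelled out in the
# post): 5.55 is dollars per hour, one worker per dock, 24 h/day, 365 days.
HOURLY_WAGE = 5.55
DOCKS = 40_678
HOURS_PER_DAY = 24

per_day = HOURLY_WAGE * DOCKS * HOURS_PER_DAY
per_year = per_day * 365

print(f"${per_day:,.0f} per day, ${per_year / 1e9:.2f} billion per year")
```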

    --
    autopr0n is like, down and stuff.
  9. Faster shipping by rice_burners_suck · · Score: 4, Informative
    I think a system should be devised where you could queue a big upload or download, and the network would "know" how to send big chunks of it when things are relatively idle. By queueing things for what basically amounts to "background" transmittal, the network might be more fully utilized.
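    The queueing idea can be sketched as a scheduler that splits a bulk transfer into chunks and only sends chunks in time slots where foreground traffic leaves room. Everything here is hypothetical: the class name, the fixed-size time slots, and the 50% utilization threshold are illustrative assumptions, not a description of any existing network service.

```python
from collections import deque


class BackgroundQueue:
    """Hypothetical scheduler for "background" bulk transfers.

    A big transfer is split into chunks; in each time slot, chunks are sent
    only if foreground traffic leaves the link relatively idle.
    """

    def __init__(self, capacity, idle_threshold=0.5):
        self.capacity = capacity              # units the link carries per slot
        self.idle_threshold = idle_threshold  # max foreground utilization to send
        self.chunks = deque()

    def enqueue(self, size, chunk_size):
        """Queue a transfer of `size` units as chunks of at most chunk_size."""
        while size > 0:
            self.chunks.append(min(size, chunk_size))
            size -= chunk_size

    def tick(self, foreground_load):
        """Run one slot; return how many background units were sent."""
        if foreground_load / self.capacity >= self.idle_threshold:
            return 0  # link is busy: leave it to foreground traffic
        spare = self.capacity - foreground_load
        sent = 0
        while self.chunks and sent + self.chunks[0] <= spare:
            sent += self.chunks.popleft()
        return sent


queue = BackgroundQueue(capacity=100)
queue.enqueue(250, chunk_size=50)
print([queue.tick(load) for load in (90, 20, 80, 10)])  # [0, 50, 0, 50]
```

    A real network would have to measure "idle" somehow; the fixed load numbers here just stand in for that measurement.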

    The station wagon comment reminded me of an idea that I had a long time ago, when I first read about how the Internet routes packets around. You know how you can ship stuff UPS overnight? It can get pretty expensive, depending on how big and heavy the package is. And sometimes, businesses would pay an even greater price to have a package delivered even faster. Why not introduce a system for getting things delivered extremely fast, and I do mean fast, all around the world?

    Imagine this: Put together a network of railroad-like tracks enclosed in concrete tunnels. In a vacuum. Individual cars would travel on these tracks at supersonic speeds. They would essentially go from one switching station to another, kind of like the telephone network or the Internet. They might come in several sizes, these cars. When you need something delivered fast from, say, Los Angeles to New York, the package would be placed on a dedicated car which would take it at blazing speeds through, say, Albuquerque, Oklahoma City and Louisville, to New York. At each station, equipment would switch the tracks to route the car to its next switching station; the car would not even have to stop or slow down. The package might be there in four hours, counting the time it takes to bring the package to a station, have it loaded and unloaded, and then transport it to its final destination.
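    The hop-by-hop switching described above is essentially routing over a graph of stations. A minimal sketch, assuming a made-up track network (the Los Angeles to New York chain follows the post; the Denver/Chicago spur is invented) and using plain breadth-first search (fewest hops) as a stand-in for whatever routing a real system would use:

```python
from collections import deque

# Hypothetical track network (undirected): each pair is a direct tunnel
# between two switching stations.
TUNNELS = [
    ("Los Angeles", "Albuquerque"),
    ("Albuquerque", "Oklahoma City"),
    ("Oklahoma City", "Louisville"),
    ("Louisville", "New York"),
    ("Albuquerque", "Denver"),
    ("Denver", "Chicago"),
    ("Chicago", "Louisville"),
]

NEIGHBORS = {}
for a, b in TUNNELS:
    NEIGHBORS.setdefault(a, []).append(b)
    NEIGHBORS.setdefault(b, []).append(a)


def route(src, dst):
    """Fewest-hop station sequence from src to dst (breadth-first search)."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        station = queue.popleft()
        if station == dst:
            break
        for nxt in NEIGHBORS.get(station, []):
            if nxt not in previous:
                previous[nxt] = station
                queue.append(nxt)
    if dst not in previous:
        return None  # no track connects the two stations
    path = []
    while dst is not None:
        path.append(dst)
        dst = previous[dst]
    return path[::-1]


print(" -> ".join(route("Los Angeles", "New York")))
```

    Each station only needs the next hop (the second entry of the path) to set its switch, much like a router's forwarding table.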

    This might actually make shipping cheaper rather than more expensive. Automatic equipment sorts mail at the USPS. If this mail were collected, say, once every hour (during business hours), taken to the nearest major USPS distribution center, sorted, placed in boxes heading to the same destinations, and then shipped (tunneled?) through the above method, mail going to a distant location might arrive faster than mail going across town. This could be done with collections of packages that are all going from one major city to another together. Load them in a container and send them all over there. Sure, it'll still take, say, 24 hours to ship packages in such groups, to save money, since you have to wait for enough packages, sort them, group them, etc., but if you want something shipped right friggin now, the option to get a dedicated car is still available. This might reduce gasoline use and air and ground traffic. If computers can control the cars on these tracks so that cars are going Mach 2 almost bumper to bumper, that would allow for extremely high throughput.

    Back to the station wagon comment: supposing this could be done (running more tracks all over the world and installing these switching stations at each major city), you could load hundreds of terabytes of data onto a big friggin RAID system and then get that data across the world faster than shit going through a tin horn.
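    For a sense of scale, the effective bandwidth is just data size over door-to-door time. A sketch assuming a hypothetical 100 TB load (the low end of "hundreds of terabytes", decimal units) and the four-hour delivery estimate from above:

```python
# Hypothetical figures: a 100 TB load (decimal terabytes) delivered in 4 hours.
def effective_gbit_s(terabytes, hours):
    """Average rate in Gbit/s of moving `terabytes` TB in `hours` hours."""
    return terabytes * 1e12 * 8 / (hours * 3600) / 1e9


print(f"{effective_gbit_s(100, 4):.1f} Gbit/s")
```

    The latency is still four hours, so this only wins for bulk data, the same trade-off as the station wagon.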

  10. Re:Can someone explain VOD to me? by zenyu · · Score: 3, Informative

    So how do they do this? I've always been under the impression that with digital cable and cable internet, all of the data has to be sent to everyone (in the same neighborhood anyway), so how can they handle the hundreds of channels (some of which are actually lower quality than others), the multiple VOD streams (even for the same movie), and everyone's porn and mp3 activities all at the same time?

    This one is simple: they ran fiber to the curb a few years ago. They even ran new coax to our apartments to handle more bandwidth. There is effectively infinite bandwidth running into your apartment.

  11. Re:Offload them to where? by Gorobei · · Score: 2, Informative

    I read your link to Stanford. Look at the volumetric density numbers towards the end: in 1999, 500 Gb per cubic inch, which means 20 cubic inches of media holds the entire 10T of data per truck. Your 2 m by 2 m spool of 8-micron tape at 500 Gb per cubic inch lets us reduce our trucks to one every fifteen minutes or so. Easy enough to buy an array of tape readers without worrying about the speed of light.
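    The volume figure is a single division. A sketch assuming "10T" means 10 terabits, which is what makes the 20-cubic-inch result work out at 500 Gb per cubic inch; if it meant 10 terabytes, the volume is eight times larger and still tiny:

```python
def media_volume_in3(total_gbit, density_gbit_per_in3=500):
    """Cubic inches of media needed at a given volumetric density (Gb/in^3)."""
    return total_gbit / density_gbit_per_in3


print(media_volume_in3(10_000))       # 10 terabits  -> 20.0
print(media_volume_in3(10_000 * 8))   # 10 terabytes -> 160.0
```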

  12. That's Tanenbaum by code_martial · · Score: 3, Informative

    Never underestimate the bandwidth of a station wagon filled with backup tapes.

    This is a statement by Andrew S. Tanenbaum from his book Computer Networks. Though it's supposed to be a textbook (with 4.5 stars on Amazon.com), most of my friends and I also regard it as a nice collection of stories about computer networks and communication ;-)

  13. Re:The bandwith is there, you just can't have it. by Slurpee · · Score: 5, Informative


    What a great example you picked! Cable TV companies are pumping dozens of digital movies across their system at once, live. Yet they crimp your upload speed to DSL rates or lower,


    very wrong.

    But enough truth to fool people into believing what you said.

    You are correct in saying that a digital cable system pumps out lots of bandwidth. They do. A movie channel is generally about 4 Mb/s, possibly 8. A channel such as the shopping channel may be 1 Mb/s. So your cable company with 100 channels is pumping out approx 400 Mb/s.

    That's a lot of data.

    But it is broadcast. Customers are not each individually downloading 400 Mb/s. They are sharing *one* broadcast. It is not one stream per customer; one stream is shared between all customers.

    To use a cable for internet, assuming no TV is being broadcast, you can share that 400 Mb/s between all your users. With, say, 100,000 customers sharing it equally, each will have 4 kb/s (that's kilobits). Not huge.
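    The per-customer figure is the shared capacity divided by the number of customers sharing it. A sketch assuming the 100 channels at 4 Mb/s from above and equal sharing; the customer counts are illustrative, with 100,000 being the count implied by the 4 kb/s result:

```python
CHANNELS = 100
MBIT_PER_CHANNEL = 4   # typical movie channel rate from the post
TOTAL_MBIT_S = CHANNELS * MBIT_PER_CHANNEL


def per_customer_kbit_s(customers, total_mbit_s=TOTAL_MBIT_S):
    """Equal share of the downstream capacity per customer, in kbit/s."""
    return total_mbit_s * 1000 / customers


for customers in (1_000, 100_000):
    print(customers, per_customer_kbit_s(customers))  # 400.0, then 4.0
```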

    Obviously this is not the whole story. Your bandwidth is shared between all customers on a node of the cable network (think of them as hubs). If you are the only person on your node, you will get full bandwidth. A node could cover tens, if not hundreds, of thousands of users. If every person on your node is using the net to download porn, you will have a very slow connection (you'd be better off using a modem). Also, the cable company wants to do not just internet, but TV too! In fact, most of the bandwidth is used for TV and movies.

    So, they end up using part of their bandwidth for internet, and part for broadcasting TV.

    How much they set aside for each is a business decision as well as a technological one. If they sell cable internet, the costs are huge: setup, support, network, etc. Costs go up *per user*. Costs for TV are small(ish). Pay for content (movies), get money in from advertising, users, etc. No big support costs, no extra costs for bandwidth, etc. One stream can support hundreds of thousands of users.

    It is both a technological problem *and* a business problem. They aren't giving you small limits because they are afraid you will download videos. Don't be paranoid. They don't give you unlimited bandwidth because they can't, and it costs them a lot anyway.

  14. Real Scientists Ship Hard Drives by Anonymous Coward · · Score: 1, Informative

    I am on of the PI's on the DRIFT Dark Matter Search
    (http://euclid.math.temple.edu/~martoff ; funded by the National Science Foundation http://www.nsf.gov ). We collect about 6 GB/day of data on a Linux PC located 1180 meters underground in a salt and potash mine in North Yorkshire, England.

    The only practical way to get this data home to the USA for analysis is to remove and ship the 40 or 80 GB hard drives. We have collected over a dozen drives this way, around 700 GB total. Of course, we analyze this on another Linux PC, a dual Athlon, plus a farm of 5 surplus computers from around the university (500 MHz PIIs + 768 MB RAM just weren't fast enough for those office types to run Windoze with!).
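    To put 6 GB/day in network terms, a sketch assuming decimal gigabytes and a link that would have to keep up around the clock, plus how many days of data-taking fit on each drive size mentioned:

```python
GB_PER_DAY = 6  # data rate quoted in the post (decimal GB assumed)


def sustained_mbit_s(gb_per_day=GB_PER_DAY):
    """Link rate needed to keep up with gb_per_day, running 24 hours a day."""
    return gb_per_day * 1e9 * 8 / 86_400 / 1e6


def days_to_fill(drive_gb, gb_per_day=GB_PER_DAY):
    """Days of data-taking that fit on one drive."""
    return drive_gb / gb_per_day


print(f"{sustained_mbit_s():.2f} Mbit/s sustained")
print(f"40 GB: {days_to_fill(40):.1f} days, 80 GB: {days_to_fill(80):.1f} days")
```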

  15. How Disappointing /.! by MattRog · · Score: 3, Informative

    I'm pretty disappointed (although not entirely surprised) in SlashDot posters. This article was clearly about more than simply 'mailing disks', which is what more than 95% of the posts (including dupes of dupes of dupes of ...) on this article have been about.

    Sure, he mentioned the cost of shipping disks, and actually concluded that shipping an entire computer system is more economical than mailing individual disks. However, he raises far more interesting and discussion-worthy conclusions.

    What about disk capacity reaching such incredible sizes as 2TB/disk, and the fact that current random-access methods will render such drives unusable? This affects all of us, since our OSes' filesystems will need to fundamentally change to be more sequential (i.e., more like tape drives). Personally, I hope that whatever happens to the fs, the OS will insulate me from being forced to use it in a sequential manner (i.e., will I be exposed to the sequential nature of the medium, or can it be successfully abstracted?)
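    The worry about huge disks can be illustrated by comparing a full sequential scan with reading the same disk in small random pieces. A sketch; the 100 MB/s sequential rate, 100 random reads per second, and 8 KiB pages are hypothetical round numbers, not figures from the article:

```python
# Hypothetical round numbers, not from the article.
DISK_BYTES = 2 * 10**12     # the 2 TB disk from the post
SEQ_BYTES_S = 100 * 10**6   # sequential transfer rate: 100 MB/s
RANDOM_IOPS = 100           # random reads per second (seek-bound)
PAGE_BYTES = 8 * 1024       # bytes fetched per random read


def sequential_scan_hours():
    """Hours to read the whole disk front to back."""
    return DISK_BYTES / SEQ_BYTES_S / 3600


def random_scan_days():
    """Days to read the whole disk one random page at a time."""
    return DISK_BYTES / PAGE_BYTES / RANDOM_IOPS / 86_400


print(f"sequential: {sequential_scan_hours():.1f} hours")
print(f"random:     {random_scan_days():.1f} days")
```

    Hours versus weeks: when capacity grows much faster than seek rates, a big disk ends up being used more like tape.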

    He talks about, in almost glowing terms, the SlashDot favorite MySQL and how "At some point, somebody will say, 'I'm running my company on MySQL.' Indeed, I wish I could hear Scott McNealy [CEO of Sun Microsystems] tell that to Larry Ellison [CEO of Oracle]." And, although the Research Area people are pretty independent, this is from a MICROSOFT employee. Not a peep from the /. audience.

    Personally, I think that using MySQL as a 'research tool' as he suggests is a Very Bad Idea - it's not even a mediocre implementation of the relational model, and there are better open-source implementations out there (PostgreSQL being the one that comes to mind). Basing scholarly studies on MySQL would be like basing the foundation of a skyscraper on a shack (not that any other SQL DBMSs are much better, but why use one of the worst?). The best 'research vehicle' would be an open-source, truly relational database management system (there are no commercial TRDBMSs either). It doesn't have to be very advanced, but it has to be architected from the ground up to be a TRDBMS (which means SQL doesn't cut it as a query language).

    One thing he notes which I see as being a large problem in the open-source community as well is how "...The thing that slows Oracle, IBM, and Microsoft down is the testing, and making sure they don't break anything--supporting the legacy. I don't know if the MySQL community has the same focus on that." As a long-time PHP developer and advocate, I'm still hesitant about updating our production systems - it seems as if every successive release of PHP has innumerable functions removed or changed with no backwards compatibility. I guess it's a lot easier to say to users 'you get what you pay for' when they are just that - users and not clients. One of my disappointments with many open-source proponents (of which I am one) is the hostility to treating clients as clients - 'you can always edit the source', etc. - for the most part, large companies don't care/want to edit the source; that is what they want to pay you to do. Until more projects (MySQL included) start to realize this, they will pretty much always occupy niche roles in the enterprise.

    Finally, even he, an academic, seems to (at times) confuse the details of the relational model's implementations (e.g. SQL product performance) with the model itself (which makes no mention of performance, because performance has nothing to do with the model). Theoretically, a TRDBMS should be faster than the SQL implementations we have today. It just takes someone to do it, and I don't see why the open-source community can't build the BEST mousetrap there is - we just have to abandon the 'mob culture' of MySQL.

    --

    Thanks,
    --
    Matt