Mailing Disks is Faster than Uploading Data
CowboyRobot writes "Who would ever, in this time of the greatest interconnectivity in human history, go back to shipping bytes around via snail mail as a preferred means of data transfer? Jim Gray would do it, that's who. And we're not just talking about Zip disks, no sir. We're talking about shipping entire hard drives, or even complete computer systems, packed full of disks.
David Patterson (one of the developers of both RISC and RAID) interviews ACM Turing Award winner Jim Gray." Back in school we always had a saying, "Never underestimate the bandwidth of a station wagon filled with backup tapes." Seems like that still holds true.
Storage probably grows more quickly than bandwidth.
http://yetanotherpoliticalrant.blogspot.com
The BBS I worked for used to ship out tape archives of the file libraries. It used to take hours to store up to 160 megs of files.
Netflix has made a business out of shipping data via snail mail, since the bandwidth isn't really there yet to do it over the internet.
Vote for Pedro
First of all, when downloading, you have the benefit of instantly receiving the file that you need, as opposed to waiting at least a day for your shipment to arrive.
Secondly, remember that bandwidth is probably cheaper than postage. Shipping a carton with a few hard disks and proper insulation would cost at least $30 to overnight it.
Really, the title of the article jumps to its conclusion way too quickly. You must consider how much bandwidth the sender and the receiver have. If both have a several-gigabit OC line, then perhaps uploading it would be faster.
That figures, but does the cost of the bandwidth exceed the price of gas?
Eh. Guess it doesn't matter anyway. It's still cooler to be seen driving down the street w/ lots of tapes.
This is how ArsDigita University distributes its course material: http://aduni.org/drives/
It's one thing to complain about the lack of growth in bandwidth of current storage (and there is quite a bit of complaining in this article about it), but to think that there is something wrong with having all this data that is theoretically impossible to access because the bandwidth is insufficient is clearly false.
Whether data is ever used or not, it is important to have it. I have tax records from the last 7 years that I never plan on opening. They are stored in a couple shoeboxes in the back of the garage next to the reindeer prods. There may be no reason to hold onto them as I doubt I'd ever get audited, but it's important to know that they are back there.
Data itself is important to have for archive purposes, regardless of whether anyone ever looks at it again.
Yes, but what's the bandwidth of a minivan full of CDROMs? I get 235 Mb/sec. Enjoy.
CowboyNeal writes:
That `saying' is from Andrew S. Tanenbaum's notoriously well written textbook titled simply: "Computer Networks".
It was certainly in the 2nd edition, the one I used, and might have even been in the 1st edition. It is still in the latest edition. (One of the young-uns in the office has the 4th edition on his shelf.)
A famous line if ever there was one in the geek world, although perhaps not as humorous as Chairman Bill's:
"640K ought to be enough for anyone [ paraphrased ]".
Why does everyone only count bandwidth as the time to do the transport? The same comparison has been made of Netflix. Retrieving from storage and placing it back into a usable format takes time too.
Example: station wagon full of backup tapes. Presumably, you are going to store your data at both locations (onsite and offsite copies). Now count the time mounting each tape and its target, doing the copy, and returning the original to the car. Yes, even at 15MB/s (LTO drives) it's good, but it's still a long time. Then you need to drive back.
The comparison is useless unless you account for:
Of course, no one said that the data needed to arrive within a specific time as well. If the data is useless 3 hours after it was collected, then all these analogies are useless.
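One way to fold handling time into the comparison is a rough effective-throughput estimate. The sketch below is only illustrative: the 15 MB/s drive rate comes from the comment above, while the tape count, tape capacity, mount time, and drive time are made-up inputs, and `effective_throughput` is a hypothetical helper.

```python
def effective_throughput(tapes, tape_bytes, drive_rate, mount_s, travel_s, drives=1):
    """Bytes per second delivered end to end, counting mount and copy time.

    All inputs are assumptions supplied by the caller; nothing here is measured.
    """
    per_tape_s = mount_s + tape_bytes / drive_rate   # mount, then stream one tape
    copy_s = tapes * per_tape_s / drives             # approximation: work splits evenly across drives
    total_s = travel_s + copy_s                      # one-way trip plus copying at the far end
    return tapes * tape_bytes / total_s


if __name__ == "__main__":
    # Hypothetical load: 200 tapes of 100 GB, 15 MB/s drives, 60 s mounts, 1 hour on the road.
    for drives in (1, 4, 16):
        rate = effective_throughput(200, 100e9, 15e6, 60, 3600, drives=drives)
        print(f"{drives:2d} drive(s): {rate / 1e6:.1f} MB/s effective")
```

In this made-up case the copy time dwarfs the hour on the road, so the number of tape drives at the far end matters far more than how fast the car goes.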
...when the modems were scarce and phone bills high. Every more or less respectable demoscene group had a member whose function was listed as "swapper".
Swappers would get in contact with swappers from other groups, and exchange floppies full of the newest stuff, productions, news, and everything of any interest (plus some exotic stuff other than floppies - a chicken bone, The Party membership ID, misprinted train tickets, and whatever interesting thing caught the eye and filled the envelope up to (but not above) another price-weight threshold.)
One of the most specific swapper activities was "faking stamps". With 80 and more contacts, at least one letter a month exchanged with each of them, you had to cut down on stamp costs, so you smeared the stamp with water-washable glue and wrote in the letter "stamps back", so your contact ripped your stamps off the envelope and sent them back to you in his reply letter together with the floppies. Then some washing and the stamps could be reused - one set of stamps could go the same way 5-6 times before they needed to be replaced because they started looking suspect. And if it was found out - you never put a return address on the envelope, and nobody in the post office could ever read an Amiga floppy :)
Another practice was making the floppies you sent pretty. You almost never sent back the same floppies - they were in constant flow. Adding a marker signature was the default. Often some sticker or a drawing was common. But there were true masterpieces: a floppy painted gold, with the metal part (and under it) painted silver, the metal part without the spring but removable and attached with a thin chain to the write-protect hole, so you removed it before inserting and it was hanging from your floppy drive while the floppy was inside.
And finally all the "disk hunt" methods. Famous swappers were rarely replying to newbies who were asking for contact - you had to gain some fame on the scene with your group's productions - or get a recommendation from another swapper. So - the unanswered letters were a good supply of floppies. Sometimes they would even put an ad in some zine (spread by swap of course ;) ) which said a girl wants to swap, everyone welcome etc. This was bringing in a good deal of free floppies, often with some quite funny stuff on them. :)
Well, the Internet was what put an end to it. Plus average data size - sending 6-8 floppies in one letter wasn't cheap or easy anymore, and with the A1200 getting more common, high-level languages, multi-disk demos and mpeg movies, it became a necessity...
Nowadays, throwing a CD across a computer lab is still way faster than transferring the data over the net.
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
Database people, unlike gamers, don't care about latency!
Basically they shunt data around, the same way Exxon et al move oil.
If you RTA, it sounds like based on current advances, in 10 years we'll be at the point where disks are so large (20 TB each) that access will have to become sequential (making them like tape today, access speeds not increasing as fast).
That would leave room for RAM to essentially become used for random access in the way the disks are used today and perhaps current cache on the CPU to be used more like RAM is today?
A lot of wire-speed net devices are starting to look like this, with their info stored in a non-volatile storage device, but loaded into RAM on startup and all "work" done in RAM.
It's easy to imagine a whole chain reaction of purposes for devices slipping into other functions as a result of varying levels of technological advancement in them.
The party of stupid and the party of evil get together and do something both stupid and evil, then call it bipartisan.
uucp to Australia used to be done by uploading a spool dir somewhere in the US to a tape and airfreighting it to Oz, then doing the same at the other end. You'd post something to Usenet and get a reply 2 weeks later.
Every city now has a monopoly on internet access, so you won't be seeing anything new in cable modem or DSL speeds. Go slow and SOAK the public rather than invest; it's smarter that way since there is no competition.
http://www.dslreports.com/ has a lot of info on municipal broadband, which the cable and phone companies fear.
We are living in the past and the public still doesn't get it. The phone and cable companies are profiting, so why should they change?
Did anyone else see that the disks will have IPv6 or IPv9 addresses? You could give every disk an IPv6 subnet and still be able to address every byte on the disk within that subnet.
I can only imagine the address space of IPv9!!! Anyone have the specs, or was that just humor?
More than enough BS
One of the major thoughts in the whole interview is that our storage has increased to such a point that we can't access it all in a reasonable fashion. For my uses (which are far from industrial level) I find that I can only watch one movie or listen to one song at a time. On my 200 gigs of hard disk I've got 60 gigs of music (and growing daily) and at least 100 gigs of movies.
Don't judge me on the legalities of the situation, but note that this isn't uncommon...I have some very drastic media needs and the media that I like is pretty intensive, but I don't very often need to stream any of it en masse to another location. It suits its purposes fine exactly where it is, and I haven't had any problem acquiring any of it or accessing it.
I suppose my rambled-to point is that for my needs I'd rather there was more storage at this point than have higher access speeds as I can get all that I need as fast as I need it. Perhaps our usage of the medium dictates how it develops.
"Share your knowledge. It's a way to achieve immortality." -- Dalai Lama
I am a mainframe console operator at a large computer datacenter, with several state government agencies outsourcing their mainframe processing to our center. Every day we get UPS and FedEx shipments of 3480- and 3490-format tapes (they look like 8-track audio tapes) to load into the mainframe, even some old ass 3420 tapes (the big magnetic reel tapes).
Some of the files on these tapes are literally only a few kilobytes large.. (!)
It certainly is NOT faster than ftp'ing the data over, considering the agencies' main offices have dedicated T1s and T3s going into the mainframes. But due to the bureaucracy and the fear-of-changing-ANYTHING mindset these agencies have, they still mail these tapes back and forth.
Granted, some of the sites have started mailing floppy disks or burned CDs instead (laugh).
Work units are ~300 kilobytes apiece, or at least that's how much the client downloads.
Vonal Declosion
The essence of the article is that snail mail has higher bandwidth than electronic means (or something to that extent). This ignores the fact that most programs/data transmitted today are huge: gigabytes. A one-sided DVD is 4.7 gigs. Even if it took, for the sake of argument, 2 days to transfer that DVD electronically and 2 days to ship the DVD across the country priority mail, the cost of bandwidth vs. postage has to be taken into account. Postage for a disc is a little less than $3 for priority mail (and less than a dollar for regular 1st class). Is having one's bandwidth tied up (slowing down everything else on the network) worth $3? $1? No, of course not. And as data gets bigger and bigger (it always does), mail will still cost less ... at least for a long time. CDROMs and DVDs are small and light--perfect for sending cheaply in the mail. So, to say mail is faster than uploading data is a shitpoor comparison. And while it sounds fascinatingly shocking, that's only because it's ignoring some pretty big factors.
Stupid people make stupid things profitable.
I just recently moved halfway across the US from my hometown. A buddy of mine who had a ton of MP3s (mostly legal BTW) had just suffered a HDD crash, and his SO's car had been broken into, meaning that TONS of music had been lost/toasted. Before I left, I'd copied his whole collection to my drive. Shipping him a drive with the whole contents (60 GB) of my music collection took a hell of a lot less time than letting him download it (at 20 Kb per second (Ghod I hate SBC!)) or, worse yet, taking the time to pick through it at human speeds, and was far cheaper unless you figure that the cost incurred by me sending it overnight was in addition to my regular bills.
Dok
"You can't screw the system, but you can give it a good fondling." -- Too lazy to look it up
Well one tape = 166709 units * 64 (k) / 1024 / 1024 = ~10.175GB
About one second on an OC-192 fiber.
That figure is per tape, the actual shipment has 1,139 tapes, I think. 10.175GB * 1,139 = ~11.6TB. That *is* impressive bandwidth.
Call it 20 minutes. Or 1:20 on a measly OC-48.
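The arithmetic in this exchange can be checked with a short script. This is a sketch, not the posters' method: the line rates are the standard SONET figures (OC-48 at 2488.32 Mbit/s, OC-192 at 9953.28 Mbit/s), protocol overhead is ignored, and a "GB" is taken as 2^30 bytes because the quoted sizes divide by 1024. The helper names are made up for illustration.

```python
OC_RATES_BPS = {"OC-48": 2_488_320_000, "OC-192": 9_953_280_000}  # SONET line rates, bits/s

UNITS_PER_TAPE = 166_709   # from the quoted post
UNIT_KIB = 64              # from the quoted post
TAPES = 1_139              # from the quoted post


def tape_bytes():
    """Size of one tape in bytes: 166,709 units of 64 KiB each."""
    return UNITS_PER_TAPE * UNIT_KIB * 1024


def transfer_seconds(num_bytes, line):
    """Ideal time to push num_bytes over the named line, ignoring overhead."""
    return num_bytes * 8 / OC_RATES_BPS[line]


if __name__ == "__main__":
    one = tape_bytes()
    shipment = one * TAPES
    print(f"one tape: {one / 2**30:.3f} GiB")
    for line in OC_RATES_BPS:
        print(f"{line}: tape {transfer_seconds(one, line):.1f} s, "
              f"shipment {transfer_seconds(shipment, line) / 3600:.1f} h")
```

Counting eight bits per byte, one tape takes roughly nine seconds on OC-192 and the whole shipment a few hours. The shorter times quoted above are what you get if the sizes are read as bits rather than bytes.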
Sorry, but now that I'm working with fibers I'm just no longer impressed by the bandwidth of a busload, planeload, or even a cruise-missile load of backup tapes. Even an ICBM-load is barely in the running.
That's progress for you. Time to switch to CDs or DVDs if you want to keep the move-the-medium approach ahead of the communication infrastructure. Even that may not last.
Now what WILL impress me is being able to afford to have a SONET ring bundle running through (and terminating in a router at) my house.
(Although my previous house WAS adjacent to just about the only street in the bay area where BOTH of Pac Bell's rings ran down the same set of manholes. So I came within maybe 50 feet. B-) )
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Is this news? I work for a gov't lab that does computational astrophysics, and our physics-heads generate huge multi-terabyte data files all the time. We are actually under contract from NASA at the moment to develop a large, distributed disk array for the storage of these files.
;)
But what did the proposal we wrote to NASA say? You guessed it: even in our official documents we recognized the fact that it's much cheaper to ship even bulky, heavy hard drives than try and transfer the data over the wire. In fact, if I can dig up the concrete numbers we came up with I'll respond with them, it's quite interesting.
Moral of the story? Duh, we already knew this
Q: A man with a delivery bike can pedal at 20 mph between the organisation's two offices, which are 5 miles apart. The basket on the bike can carry five half-inch tape reels. What is the effective throughput of this datalink? For extra credit: a modem can transfer data at 300 bps. At what distance does this outperform the person with the bike?
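The question can be worked as a small calculation. It gives the speed (20 mph), the distance (5 miles), the reel count (5), and the modem rate (300 bps), but not the capacity of a reel, so `REEL_BYTES` below is a placeholder assumption and the function names are hypothetical. Time spent writing and reading the tapes is ignored.

```python
REEL_BYTES = 140e6   # ASSUMED capacity of one half-inch reel; not given in the question
MODEM_BPS = 300.0    # from the question


def bike_throughput_bps(reels, reel_bytes, miles, mph):
    """Bits per second delivered by one bike trip over the given distance."""
    trip_s = miles / mph * 3600
    return reels * reel_bytes * 8 / trip_s


def break_even_miles(reels, reel_bytes, mph, modem_bps):
    """Distance at which the modem moves the same payload in the bike's travel time."""
    modem_s = reels * reel_bytes * 8 / modem_bps
    return mph * modem_s / 3600


if __name__ == "__main__":
    print(f"bike link: {bike_throughput_bps(5, REEL_BYTES, 5, 20) / 1e6:.2f} Mbit/s")
    print(f"modem wins beyond {break_even_miles(5, REEL_BYTES, 20, MODEM_BPS):,.0f} miles")
```

Whatever capacity is plugged in, the bike's throughput falls as the distance grows while the modem's stays fixed, which is why a break-even distance exists at all.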
Engineering is the art of compromise.
Here in NYC, Time Warner now allows us to pick from several dozen movies to be played at any time, including the ability to pause, FF and REW (with preview), etc. (video on demand). All of this at close to DVD quality too.
So how do they do this? I've always been under the impression that with digital cable and cable internet, all of the data has to be sent to everyone (in the same neighborhood anyway), so how can they handle the hundreds of channels (some of which are actually lower quality than others), the multiple VOD streams (even for the same movie), and everyone's porn and mp3 activities all at the same time?
But that's one specific application; transferring large files. In the general case, you can't replace an internet connection with a high latency connection no matter how great the bandwidth. The point of the exam question was to emphasize the difference between latency and throughput.
Jason
ProfQuotes
I work in an AmLaw top 100 law firm in DC. We do a lot of complex litigation work. We use software such as Concordance, Ringtail, and Litigator's Notebook (runs on Lotus Notes) to manage collections of documents. The documents are scanned to Group IV TIFF; the metadata and OCR text that is extracted from the documents at scan time is loaded into another database that overlays the images.
These tiff file collections run into the millions.
Of course the point of doing this is to facilitate collaboration on document review between us, our clients and our co-counsel. These people are often 1000s of miles apart, and nearly as often have crap for IT resources (equipment and personnel). There are ways of accessing this stuff over the internet securely, but it's never quite the same as having the real version of the software. This form of access often proves to be impractical for the lawyers who travel a lot, depending on the type of access they can get wherever they end up.
So what often happens is, we end up dumping the entire collection on a laptop with a big hard drive or a bigger firewire or USB drive, so they can work without access to the internet and then replicate changes when they can get the laptop back on ethernet or a POTS line.
Collections of images and databases (not to mention the various Power Point presentations and word processing files) can very easily run over 50GB. Moving this across the LAN, over my PC BUS to another hard drive and then FEDEXing it is certainly faster than doing the same transfer using FTP or SCP. Not to mention, that way I can install the software (properly) and test the whole setup before I send it off. The extra wear and tear I save on my psyche from NOT having to explain how to install all of the software, point it to the image collections, and deal with equipment I have no control over while being screamed at by extreme Type A attorneys going to trial makes that laptop look like a pretty good investment.
These are good if you have someone on the other end of your FEDEX run who knows how to open the case on a PC and install a HD themselves. I can set up one machine with everything, image the hard drive, make copies on other drives and drop them into FEDEX pouches as fast as I can make 'em. I can't think of a faster way to move a few 100 GBs of stuff to a half dozen places inside of a day. If someone has ideas, I'm all ears.
If you never make mistakes, it's probably because you're not doing anything.
Well, there is a pattern that comes from comparing desktop performance and the information needed against download bandwidth.
First, during the mainframe era, a serial connection was fast enough to operate a dumb terminal hooked to a large, expensive mainframe, because the information that needed to be sent was small and processors were expensive.
Bandwidth Wins.
Then the PC era came about, where software started to become more graphically intensive and needed more direct access to CPU power, which was becoming cheaper.
CPU Wins.
Internet Era: people got used to using the internet, now with graphics, and bandwidth improved enough to send the graphics at a reasonable speed.
Bandwidth Wins.
Today: we have been collecting a lot of data, storage is cheap, and we are used to having a lot of data on hand. Data is now becoming more multimedia based, thus taking a lot more space than a text file.
CPU Wins.
Possible Future: after the economy picks up, with improvements in PDAs and a wide spread of wireless internet, either via WiFi or cell, the PC will become more pointless in daily life. Servers will be providing the data for the PDA and intercommunicating with other PDAs.
Bandwidth Wins.
And so it continues.
It is not a real circle and these times overlap, but there is a change in focus of desktop computing to client-server.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
I'm involved in a project moving 150TiB from the West Coast to the East Coast. I can attest to the fact that it is cheaper and faster to ship tapes via FedEx.
Favorite quote from the article: "Not many of us know what to do with 1,000 20-terabyte drives--yet, that is what we have to design for in the next five to ten years."
Heh. I do, so get designing. The various law firms reviewing documents from cases like Enron (criminal, bankruptcy, and civil proceedings), Microsoft's antitrust suit, SCO v. IBM, etc. etc. need that space to store all the materials from their case work. Lots of paper from all those places gets turned into electronic images managed by very large custom databases.
Guess how many Group IV TIFFs and PDFs some of these become. Answer: millions. In five or ten years, cases such as these will likely consist of collections of data that large. Terabytes of data for cases such as these are not uncommon now. Enron could get this big by itself by then. It's well on its way to becoming one of the largest cases of all time. Check this out. Whoa.
If you never make mistakes, it's probably because you're not doing anything.
I run free anonymous FTP off my server because I can. Occasionally someone asks me if I worry about someone filling up my HD and crashing my server in the process.
Then I point out it takes around 8 hours to back up the 80GB drive over a 100Mbit LAN. I have a 640Kbit downstream connection. It would take a month to fill the entire drive.
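The fill-time estimate is easy to reproduce. This sketch assumes decimal units (80 GB as 80 x 10^9 bytes, 640 Kbit/s as 640,000 bit/s) and an ideal link with no protocol overhead or competing traffic; `fill_time_days` is a made-up helper.

```python
def fill_time_days(disk_bytes, link_bps):
    """Days needed to fill a disk over a link running flat out."""
    seconds = disk_bytes * 8 / link_bps
    return seconds / 86_400


if __name__ == "__main__":
    print(f"640 Kbit/s: {fill_time_days(80e9, 640e3):.1f} days")
    print(f"100 Mbit/s: {fill_time_days(80e9, 100e6) * 24:.1f} hours")
```

At the full line rate this comes to a bit under two weeks, so the month quoted above presumably allows for overhead and a link that is not saturated around the clock. Either way, nobody fills that drive by accident.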
I had someone connected to my server for 14 hours uploading a pirated game. I let him finish. Opened up the zip file and replaced everything in it with a single text file with the person's IP and log entries showing their attempt to pirate software.
I've often burned stuff to CD rather than upload it to my server over the net. Even for relatively small sites like my own, it's far more efficient. It's never an emergency situation where the files have to be there "this second" anyway.
It's not surprising that big companies don't waste their bandwidth that customers need and just transfer physical media instead where possible.
Ben
Work Safe Porn
The Internet has limited bandwidth but will ship small amounts of data almost instantly.
The post office has almost unlimited bandwidth but insane lag.
For example, this message is posted from my PDA to Slashdot in seconds. The same message would take at least three days over the postal network.
But if I tried to send the entire contents of my computer over the net it would take a few weeks (for maintenance reasons I erased the mp3s and porn from my system.. already backed up).
I could copy everything to CD-R and ship it.
There is a max load for postal, but you'll only see it if you're in retail or mass production and shipping whole batches of refrigerators or TVs.
The data bandwidth of the postal system will continue to increase as long as data storage improves.
Internet bandwidth has to contend with bureaucracy and corporate complacency; postal bandwidth does not.
I don't actually exist.
Hmm. That's set me thinking. What's the bandwidth of a large cargo ship (filled with the high-density mass-storage devices of your choice, of course) going across the Atlantic, compared with the trans-Pond pipes?
"Little does he know, but there is no 'I' in 'Idiot'!"
What this made me think was, you could install large banks of hard drives into the cargo holds of planes and the back of express long distance trains, and plug them into fast backbone connections whenever they're stationary. This would then let the internet route data that doesn't need low latency connections (such as FTPing terabyte files, where it doesn't matter if you receive the first packet now, because you're not going to be able to use the file until the last packet has arrived anyway) onto the storage devices, ready to be flown across an ocean or zipped up a trainline to some point nearer where it's going, where it'll continue on its way...
You'd probably need some TCP extension that allowed a host to mark a group of packets as 'part of a block' - so that all parts of the same block get routed the same way, and routers know how big the block is, and can calculate the fastest way to get the whole block to its destination. So, an FTP server, on receiving a request for a multi-terabyte file, would stream out packet after packet, all addressed to the client, with a block identifier telling routers that they belong to a consignment of 15TB, say. Now, a router starts receiving these packets, and thinks 'what's the best way to get 15TB to there?', and if the costs and speeds work in its favour, pumps them onto a hard drive in the hold of a plane that's taking off in fifteen minutes.
Now, meanwhile, you'd also want the server to send another packet - not marked as being part of the big block - to the client telling it that the file is being sent - otherwise, your client's going to time out its connection.
When the plane lands, the packets are streamed off the hard drive, and routing continues as normal.
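The routing decision described above could look something like this. Everything here is hypothetical: no such TCP extension exists, and the `Route` fields, the example routes, and their numbers are invented purely to illustrate picking the path that delivers a whole block soonest. Cost is left out of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    fixed_delay_s: float   # time before the first byte can arrive (propagation, flight time)
    rate_bps: float        # sustained rate for moving the block onto and along the route


def delivery_seconds(route, block_bytes):
    """Time until the last byte of the block arrives over this route."""
    return route.fixed_delay_s + block_bytes * 8 / route.rate_bps


def choose_route(routes, block_bytes):
    """Pick the route that gets the whole block to its destination soonest."""
    return min(routes, key=lambda r: delivery_seconds(r, block_bytes))


if __name__ == "__main__":
    routes = [
        Route("fibre", fixed_delay_s=0.1, rate_bps=100e6),       # made-up share of a long-haul link
        Route("plane", fixed_delay_s=10 * 3600, rate_bps=10e9),  # made-up flight plus fast disk I/O
    ]
    for size in (1e6, 15e12):
        print(f"{size:.0e} bytes -> {choose_route(routes, size).name}")
```

The router needs the block size up front to make this choice, which is why the packets would have to carry a block identifier and a total length.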
Well, I dunno - might be a way to allow the net to handle demand for moving large files without requiring a massive increase in fibre bandwidth...
that FTP'ing X^Y Gb of data really chokes our WAN/LAN/Internet pipes and that they should use Tapes or DVD's. When we show them that the $1000's of recurring costs for bandwidth can be used more efficiently for a DVD +/- R/RW and postage, they realize this actually makes their bottom line a whole lot better.
One issue that has come up is media/reader problems. Make sure your data partner can actually read your tapes/disks/cards.
I think, therefore I am - Rene Descartes; I yam what I yam, an' that's what I yam - Popeye