Turing Award Winner On The Future of Storage
weileong writes "Ars Technica highlights an interview at ACM Queue with Jim Gray, a winner of the ACM Turing Award (among other things), conducted by one of the pioneers of RAID (among other things). Many issues are touched upon, including: "programmers have to start thinking of the disk as a sequential device rather than a random access device." "So disks are not random access any more?" "That's one of the things that more or less everybody is gravitating toward. The idea of a log-structured file system is much more attractive. There are many other architectural changes that we'll have to consider in disks with huge capacity and limited bandwidth."
Actual interview has MUCH detail, definitely worth reading."
Sit down, turn off your cellphone, and prepare to be fascinated. Clear your schedule, because once you've started reading this interview, you won't be able to put it down until you've finished it.
Who would ever, in this time of the greatest interconnectivity in human history, go back to shipping bytes around via snail mail as a preferred means of data transfer? (Really, just what type of throughput does the USPS offer?) Jim Gray would do it, that's who. And we're not just talking about Zip disks, no sir; we're talking about shipping entire hard drives, or even complete computer systems, packed full of disks.
Gray, head of Microsoft's Bay Area Research Center, sits down with Queue and tells us what type of a voracious appetite for data could require such extreme measures. A recent winner of the ACM Turing Award, Gray is a giant in the world of database and transaction-processing computer systems. Before Microsoft, he worked at a few companies you might know: Digital, Tandem, IBM, and AT&T. He's also a member of the Queue Editorial Advisory Board.
Shooting questions at Gray on such topics as open-source databases and smart disks is David Patterson, who holds the Pardee Chair of Computer Science at the University of California at Berkeley. Patterson headed up the design and implementation of RISC I, which laid the foundations for Sun's SPARC architecture. Along with Randy Katz, Patterson also helped pioneer redundant arrays of independent disks--yes, RAID.
DAVE PATTERSON What is the state of storage today?
JIM GRAY We have an embarrassment of riches in that we're able to store more than we can access. Capacities continue to double each year, while access times are improving at 10 percent per year. So, we have a vastly larger storage pool, with a relatively narrow pipeline into it.
We're not really geared for this. Having lots of RAM helps. We can cache a lot in main memory and reduce secondary storage access. But the fundamental problem is that we are building a larger reservoir with more or less the same diameter pipe coming out of the reservoir. We have a much harder time accessing things inside the reservoir.
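To put rough numbers on the reservoir-and-pipe picture, here is a minimal sketch that compounds the two rates Gray quotes: capacity doubling each year and access improving 10 percent per year. The function name and the assumption that both rates hold steady are illustrative and do not come from the interview.

```python
def capacity_access_gap(years, capacity_growth=1.00, access_growth=0.10):
    """Return how many times faster capacity has grown than access.

    Assumes (hypothetically) that both annual growth rates stay constant:
    capacity_growth=1.00 means +100% per year (doubling),
    access_growth=0.10 means +10% per year.
    """
    capacity = (1 + capacity_growth) ** years
    access = (1 + access_growth) ** years
    return capacity / access


# Print the relative gap after a few horizons.
for years in (1, 5, 10):
    print(years, "years:", round(capacity_access_gap(years), 1), "x")
```

Under these assumed constant rates, the gap widens by a factor of about 1.8 every year, which compounds to several hundred after a decade.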
DP How big were storage systems when you got started?
JG Twenty-megabyte disks were considered giant. I believe that the first time I asked anybody, about 1970, disk storage rented for a dollar per megabyte a month. IBM leased rather than sold storage at the time. Each disk was the size of a washing machine and cost around $20,000.
Much of our energy in those days went into optimizing access. It's difficult for people today to appreciate that, especially when they hold one of these $100 disks in their hand that has 10,000 times more capacity and is 100 times cheaper than the disks of 30 years ago.
DP How did we end up with wretched excess of capacity versus access?
JG First, people in the laboratory have been improving density. From about 1960 to 1990, the magnetic material density improved at something like 35 percent per year--a little slower than Moore's Law. In fact, there was a lot of discussion that RAM megabyte per dollar would surpass disks because RAM was following Moore's Law and disks were evolving much more slowly.
But starting about 1989, disk densities began to double each year. Rather than going slower than Moore's Law, they grew faster. Moore's Law is something like 60 percent a year, and disk densities improved 100 percent per year.
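The growth rates quoted above are easier to compare as doubling times. The sketch below uses the standard compound-growth relation, doubling time = ln 2 / ln(1 + r); the function name is illustrative, and the rates are the approximate figures Gray gives rather than measured data.

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a constant annual growth rate.

    annual_growth is a fraction: 0.35 means 35 percent per year.
    """
    return math.log(2) / math.log(1 + annual_growth)


# Approximate rates as quoted in the interview.
rates = {
    "disk density, 1960-1990 (35%/yr)": 0.35,
    "Moore's Law as quoted (60%/yr)": 0.60,
    "disk density after 1989 (100%/yr)": 1.00,
}

for label, rate in rates.items():
    print(f"{label}: doubles every {doubling_time_years(rate):.2f} years")
```

At 35 percent per year, density doubles roughly every 2.3 years; at 60 percent, roughly every 1.5 years, which matches the commonly cited 18-month figure for Moore's Law; at 100 percent, every year.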
Today disk-capacity growth continues at this blistering rate, maybe a little slower. But disk access, which is to say, "Move the disk arm to the right cylinder and rotate the disk to the right block," has improved about tenfold. The rotation speed has gone up from 3,000 to 15,000 RPM, and the access times have gone from 50 milliseconds down to 5 milliseconds. That's a factor of 10. Bandwidth has improved about 40-fold, from 1 megabyte per second to 40 megabytes per second. Access times are improving about 7 to 10 percent per year. Meanwhile, densities have been improvi