Slashdot Mirror


Turing Award Winner On The Future of Storage

weileong writes "Ars Technica highlights an interview at ACM Queue with Jim Gray, a winner of the ACM Turing award (among other things), conducted by one of the pioneers of RAID (among other things). Many issues are touched upon, including: "programmers have to start thinking of the disk as a sequential device rather than a random access device." "So disks are not random access any more?" "That's one of the things that more or less everybody is gravitating toward. The idea of a log-structured file system is much more attractive. There are many other architectural changes that we'll have to consider in disks with huge capacity and limited bandwidth." The actual interview has MUCH more detail and is definitely worth reading."
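
To see why "huge capacity and limited bandwidth" pushes toward sequential access, here is a rough back-of-envelope sketch. The drive parameters (1 TB capacity, 100 MB/s sequential transfer, about 100 random reads per second of 8 KB each) are hypothetical round numbers chosen for illustration. They are not figures from the interview.

```python
# Back-of-envelope: how long does it take to read a whole disk?
# All drive parameters are hypothetical round numbers, not figures
# from the interview.

CAPACITY_BYTES = 1_000_000_000_000   # assumed 1 TB drive
SEQ_BANDWIDTH = 100_000_000          # assumed 100 MB/s sequential transfer
RANDOM_IOPS = 100                    # assumed ~100 random reads per second
PAGE_BYTES = 8192                    # assumed 8 KB read per random access


def sequential_scan_seconds(capacity, bandwidth):
    """Time to stream the whole disk front to back."""
    return capacity / bandwidth


def random_scan_seconds(capacity, page, iops):
    """Time to read the whole disk one random page at a time."""
    return (capacity / page) / iops


seq = sequential_scan_seconds(CAPACITY_BYTES, SEQ_BANDWIDTH)
rnd = random_scan_seconds(CAPACITY_BYTES, PAGE_BYTES, RANDOM_IOPS)
print(f"sequential: {seq / 3600:.1f} hours")  # sequential: 2.8 hours
print(f"random:     {rnd / 86400:.1f} days")  # random:     14.1 days
```

Under these assumptions, reading the same bytes takes hours sequentially and about two weeks randomly. That gap is what makes log-structured, append-mostly layouts attractive.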

13 of 227 comments

  1. dupe by Anonymous Coward · · Score: 5, Informative
  2. This sounds familiar.. by grasshoppa · · Score: 2, Informative

    ...does anybody else think this sounds familiar?

    I must have read an article earlier about this same thing, probably by this same guy. Can anybody confirm that?

    --
    Mod me down with all of your hatred and your journey towards the dark side will be complete!
  3. Re:Solid state is the way to go. by grub · · Score: 5, Informative


    I think we'd all be better off when solid state, non-mechanical disks become commonplace.

    A company named SolidData sells solid state "drives".

    --
    Trolling is a art,
  4. DUPE by Anonymous Coward · · Score: 1, Informative
  5. Very much a pioneer, even IF he works for MS by Anonymous Coward · · Score: 4, Informative

    Check out Jim Gray's info page on Microsoft Research. He's done research on many diverse and interesting technologies, such as distributed computing and sequential I/O performance. There are some nifty sites he has taken part in creating, such as a browsable photo of Earth and a map of the Universe.

  6. Tweaked by Flamesplash · · Score: 4, Informative

    My prof talked about this in my networking class. Apparently they tweaked the hell out of the data link layer to do this, so it was not a generic data transfer at all.

    --
    "Not knowing when the dawn will come, I open every door." - Emily Dickinson
  7. Re:Network speed by CausticWindow · · Score: 3, Informative

    Couldn't find the article with the Slashdot search, but Google produced it. Here it is.

    The real number was 8,609 Mbps, which translates roughly into a DVD transferred every five seconds. Btw., it was Switzerland, not the Netherlands.
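
    A quick sanity check of the "DVD every five seconds" figure. It assumes a single-layer DVD of 4.7 GB (decimal gigabytes) and ignores protocol overhead.

```python
# Sanity check: how long does one DVD take at 8,609 Mbps?
# Assumes a single-layer 4.7 GB (decimal) DVD and no protocol overhead.

LINK_BPS = 8_609e6   # 8,609 Mbit/s, the figure quoted above
DVD_BYTES = 4.7e9    # assumed single-layer DVD capacity

seconds_per_dvd = DVD_BYTES * 8 / LINK_BPS
print(f"{seconds_per_dvd:.1f} s per DVD")  # 4.4 s per DVD
```

    That is consistent with "roughly every five seconds."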

    Also, I don't understand the part where he mentions bandwidth costs of $1 per gigabyte. Maybe you have to pay that much on Internet2, but my DSL costs somewhere in the region of $0.05 per gigabyte, I figure. Maybe I'm just spoilt.

    --
    How small a thought it takes to fill a whole life
  8. LSFS by smd4985 · · Score: 3, Informative

    For more info on (very cool) Log-Structured File Systems, check out Mendel's original paper at:

    http://citeseer.nj.nec.com/rosenblum91design.html
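
    The core idea can be sketched in a few lines. This is a toy append-only key-value store, not Rosenblum's actual file system. Every write goes to the tail of a single log, so the "disk" only ever sees sequential writes. An in-memory index, loosely analogous to LFS's inode map, records where the newest copy of each key lives. The class name LogStore and its record format are hypothetical.

```python
import io
import struct


class LogStore:
    """Toy log-structured key-value store (hypothetical, not real LFS).

    Writes are always appended to the end of one log; an in-memory index
    maps each key to the offset of its most recent record.
    """

    HEADER = struct.Struct("<II")  # key length, value length

    def __init__(self):
        self.log = io.BytesIO()  # stands in for the disk
        self.index = {}          # key -> offset of newest record

    def put(self, key: bytes, value: bytes) -> None:
        offset = self.log.seek(0, io.SEEK_END)  # always append at the tail
        self.log.write(self.HEADER.pack(len(key), len(value)))
        self.log.write(key)
        self.log.write(value)
        self.index[key] = offset  # any older record for key is now garbage

    def get(self, key: bytes) -> bytes:
        self.log.seek(self.index[key])
        klen, vlen = self.HEADER.unpack(self.log.read(self.HEADER.size))
        self.log.seek(klen, io.SEEK_CUR)  # skip over the stored key
        return self.log.read(vlen)


store = LogStore()
store.put(b"disk", b"v1")
store.put(b"disk", b"v2")   # the update goes to the tail, not in place
print(store.get(b"disk"))   # b'v2'
```

    Overwrites never seek back into the log, so stale records pile up. A real LFS needs a cleaner that compacts live data to reclaim that space, and this sketch leaves that out.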

    --
    smd4985
  9. Re:The van metaphor by jpop32 · · Score: 1, Informative

    I've seen this a couple times before, but Google seems to come up with nothing useful for it.

    It's from:

    Andrew S. Tanenbaum. Computer Networks. Prentice Hall, third edition, 1996.

    A de facto bible of computer networks. Had it as a textbook in college. You're bound to run into it if you ever get to formally studying networks.

  10. The Ol' Roadmaster Scenario by codefool · · Score: 2, Informative

    What Gray is talking (mostly) about is what we used to call the "Roadmaster Scenario." When I worked for [a major electronics company], we had a data center in Dallas and a redundant site about 30 miles away in Lewisville. Every Sunday the entire IMS database was archived to mag tape and shipped to the other data center for a second level of redundancy. This begged the question, why not just copy them over the T1 lines (this was 1980) to the other site's tape drives directly? The answer, of course, was that it takes a helluva lot of bandwidth to outrun a Roadmaster full of mag tapes.
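
    A rough sketch of the Roadmaster arithmetic. The comment gives no tape count, tape capacity, or driving time, so all three values below are hypothetical: 100 reels of about 150 MB each and a 40-minute drive for the 30 miles. Loading and unloading time is ignored. Only the T1 rate of 1.544 Mbit/s is a standard figure.

```python
# Effective bandwidth of a car full of tapes vs. a T1 line.
# Tape count, tape capacity, and drive time are hypothetical;
# loading/unloading time is ignored.

T1_BPS = 1.544e6            # T1 line rate, 1.544 Mbit/s

TAPES = 100                 # assumed number of reels in the car
BYTES_PER_TAPE = 150e6      # assumed ~150 MB per reel
DRIVE_SECONDS = 40 * 60     # assumed 40-minute drive (30 miles)

payload_bits = TAPES * BYTES_PER_TAPE * 8
car_bps = payload_bits / DRIVE_SECONDS
t1_hours = payload_bits / T1_BPS / 3600

print(f"car: {car_bps / 1e6:.0f} Mbit/s, T1 would need {t1_hours:.1f} hours")
# car: 50 Mbit/s, T1 would need 21.6 hours
```

    Under these assumptions the car moves data over 30 times faster than the T1. Adding more tapes raises the car's effective bandwidth without changing the drive time.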

    --
    "Stop whining!" - Arnold, as Mr. Kimble
  11. Re:Solid state is the way to go. by ananiasanom · · Score: 3, Informative
    The point is not that solid-state will not get bigger and cheaper (it will), but that disk is getting bigger and cheaper faster.

    So sure, you could replace your current 80GB disk drive with 80GB of solid state, but where are you going to store your 50GB 3D movies in 1000x1000x1000 resolution? They're going to be on disk, and you'll have to deal with the increasing size:bandwidth and size:access-speed ratios. After all, I can buy a SmartMedia card with the capacity of my first hard drive for about what I used to pay for a box of floppies, but I still use a hard disk.
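
    The rising size:bandwidth ratio can be shown with a small sketch of "hours to read the whole drive." Only the 80 GB starting size comes from the comment. The 40 MB/s starting bandwidth and the yearly growth rates (capacity x1.6, bandwidth x1.2) are hypothetical values chosen to show the trend, not measured data.

```python
# Hours to stream an entire drive as capacity outgrows bandwidth.
# Starting bandwidth and both growth rates are hypothetical.

def full_read_hours(years, cap0=80e9, bw0=40e6, cap_growth=1.6, bw_growth=1.2):
    """Hours to read a whole drive after `years` of compounded growth."""
    capacity = cap0 * cap_growth ** years
    bandwidth = bw0 * bw_growth ** years
    return capacity / bandwidth / 3600


for year in (0, 5, 10):
    print(year, round(full_read_hours(year), 2))
```

    Whenever capacity compounds faster than bandwidth, the full-read time grows every year, whatever the exact rates are.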

    Secondly, as others have pointed out, just as the article describes future disk behaving more like tape, future solid-state memory may behave more like disk. Where is it now? Chips can pump out sequential data at close to 1 gigabit, but jumping about in memory is much slower (any experts have figures?).

  12. Re:Missing the logical boat by tomlord · · Score: 2, Informative

    He isn't grossly misrepresenting Codd's work.

    You said it yourself:

    While the algebra is somewhat procedural, the calculus is set-oriented, and they are fully equivalent.

    and, not coincidentally, the isomorphism extends further to machines that manipulate physical punch cards. You go on to say:

    The idea is exactly not looking at records and operators, but describing what you want -- just let the relational system set the procedures to get that in the most efficient way it can.

    Right. And what Gray has pointed out is that Codd's work on the math and how to implement it doesn't really require computers, as such.

    In an alternate timeline, there were no computers, just lots of expensive punch-card machines and racks and racks of data stored on punch cards.
    (Such was the economic value of all this data that the racks of cards were often stored with an almost military degree of jealous protection: the origin of the term "Data Base".)

    Each card machine could perform a simple operation like "duplicate this card stack" or "pull out the cards that have a Q in column 3". The machines could be organized into a sort of assembly line for a particular computation, with technicians looking at a script on a clipboard and carrying trays of cards between machines, configuring each machine with the right parameters, running the cards through, then going to the next step. It was an expensive, labor-intensive process, and the ad-hoc procedures used to write the scripts for the technicians were black magic and often error-prone.

    Time-study super-genius Alternate-Codd studied the machines and the procedures used to operate them. He realized that they could be described by set math. He realized that if you let the managers define their "Card Searches" in very high-level, very mathy terms, then there was a straightforward optimization problem to get from that "Search Specification" to a set of "compiled instructions" for the technicians. The goal was to produce a set of Compiled Instructions that would use the punch-card machines in an optimal way, saving time and money.
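
    The compile step in the parable can be sketched as a toy optimizer. A Card Search is a declarative set of "character X in column N" conditions. The compiler turns it into an ordered list of machine passes and runs the most selective filter first, so later passes handle fewer cards. The deck, the zero-based column numbering, and the cost measure (total cards fed through machines) are hypothetical. Predicate ordering by selectivity is one standard optimizer technique, not a description of Codd's actual work.

```python
from typing import List, Tuple

Card = str
Condition = Tuple[int, str]  # (zero-based column, required character)


def matches(card: Card, cond: Condition) -> bool:
    col, ch = cond
    return len(card) > col and card[col] == ch


def selectivity(deck: List[Card], cond: Condition) -> float:
    """Fraction of the deck that passes `cond`."""
    return sum(matches(c, cond) for c in deck) / len(deck)


def compile_search(deck: List[Card], conditions: List[Condition]) -> List[Condition]:
    """Order the machine passes: most selective (smallest fraction) first."""
    return sorted(conditions, key=lambda cond: selectivity(deck, cond))


def run(deck: List[Card], plan: List[Condition]) -> Tuple[List[Card], int]:
    """Feed the deck through each pass; return matches and cards handled."""
    handled = 0
    for cond in plan:
        handled += len(deck)
        deck = [c for c in deck if matches(c, cond)]
    return deck, handled


deck = ["AQ1", "BQ2", "CQ3", "AX1", "AQ9"]
search = [(1, "Q"), (0, "A")]   # "Q in column 1 AND A in column 0"
print(run(deck, compile_search(deck, search)))  # (['AQ1', 'AQ9'], 8)
print(run(deck, search))                        # (['AQ1', 'AQ9'], 9)
```

    Both orders give the same answer, which is what makes the optimization safe. In this sketch the selectivity estimate scans the whole deck. A real optimizer would read stored statistics instead.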

    He studied the optimization problem and developed some techniques for it. Companies used his results by hiring a "Compiler Pool" -- most often a group of women chosen from the secretarial pool for the accuracy of their work. When a new Card Search request came in, the search would be typed up and mimeographed, and handed to the head of the Compiler Pool. It typically took "the girls" about a day to compile a query but, every time, the scripts they wrote for the technicians produced the right answer, usually much faster than anyone thought possible.

    In one office, though, in Rochester, New York, there was a famous accident. The office used by the Compiler Pool had developed a problem with flies. One day, one was swatted and killed with the mimeograph master of a compiled query, leaving a mark that obscured some important numbers. Nobody noticed, the technicians dutifully followed the errant script, and by the next afternoon the company's entire collection of precious data was strewn, unsorted, in a huge pile on the machine room floor. The company was bankrupt only 9 months later.

    The company president demanded an explanation when the accident occurred, and much investigation followed, eventually revealing the fly and its consequences. This was, of course, the origin of the familiar phrase (known to every customer who's ever gotten a $500 bill for a month of telephonic service), "compiler bug".

  13. Here's an idea by shadow_slicer · · Score: 2, Informative

    Why don't you send out a mixed source/binary package:
    The binary part can be the core of your program and contain all your IP.
    The source part can be an interface layer to the rest of the system (aliases for library calls, or equivalent implementations for missing functions, etc...basically a wrapper layer between the system and the program).

    During the installation, the source part can be compiled and (statically/dynamically) linked to the binary part. The source package doesn't have to be GPL (since, if it were, linking it to your binary would force the binary to be GPL), but it could still use some other open source license.
    That way you can mitigate the disadvantages of a binary distribution without having to use a full source distribution.
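
    The wrapper layer idea can be shown with a rough Python analogue of the C scenario described above. A thin open shim presents one stable function to the closed core and supplies an equivalent fallback when the platform lacks the preferred call. The function names available_cpus and closed_core_worker_count are hypothetical. os.sched_getaffinity is a real call, but it exists only on some Unix platforms, such as Linux.

```python
import os


def available_cpus() -> int:
    """Open shim: CPUs this process may use, with a portable fallback."""
    try:
        # Preferred call; only present on some Unix platforms (e.g. Linux).
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback: total CPU count (os.cpu_count() may return None).
        return os.cpu_count() or 1


def closed_core_worker_count() -> int:
    """Stand-in for the proprietary core: it only ever calls the shim."""
    return max(1, available_cpus() - 1)
```

    When a platform changes, only the shim needs updating, and the closed core stays untouched.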

    Also, if many companies were doing this, it might be a good idea to open source these compatibility layers so that every company that makes something for Linux isn't duplicating the effort. (Though this is kind of what libraries are supposed to do....)

    Another alternative is to *trust* your customers:
    You could have a full source package, but under a proprietary license (not GPL). Just because the source is available doesn't mean that the customers have full rein over your IP, or are even more likely to pirate it: I have the full "source" for several books, but that doesn't cause me to violate the IP of those authors.

    I really doubt that PHBs will go for the full-source approach, though, as they tend to be paranoid about such things...which is why I suggested the first thing.......first....