Slashdot Mirror


Turing Award Winner On The Future of Storage

weileong writes "Ars Technica highlights an interview at ACM Queue with Jim Gray, a winner of the ACM Turing award (among other things) by one of the pioneers of RAID (among other things). Many issues touched upon, including: "programmers have to start thinking of the disk as a sequential device rather than a random access device." "So disks are not random access any more?" "That's one of the things that more or less everybody is gravitating toward. The idea of a log-structured file system is much more attractive. There are many other architectural changes that we'll have to consider in disks with huge capacity and limited bandwidth." Actual interview has MUCH detail, definitely worth reading."

49 of 227 comments

  1. dupe by Anonymous Coward · · Score: 5, Informative
  2. This sounds familiar.. by grasshoppa · · Score: 2, Informative

    ...does anybody else think this sounds familiar?

    I must have read an article earlier about this same thing, probably by this same guy. Can anybody confirm that?

    --
    Mod me down with all of your hatred and your journey towards the dark side will be complete!
  3. Solid state is the way to go. by caluml · · Score: 4, Interesting
    "programmers have to start thinking of the disk as a sequential device rather than a random access device."

    I think we'd all be better off when solid state, non-mechanical disks become commonplace.

    Is there any reason other than cost why we can't have 100Gb solid-state drives yet?

    1. Re:Solid state is the way to go. by stratjakt · · Score: 3, Funny

      Is cost not a good enough reason for you?

      HDD = a buck a gig, solid-state = 100 bucks a gig.

      Though supposedly magical MRAM will come along and revolutionize the world. OLED screens too. And oh yeah, Duke Nuk'Em Forever.

      --
      I don't need no instructions to know how to rock!!!!
    2. Re:Solid state is the way to go. by grub · · Score: 5, Informative


      I think we'd all be better off when solid state, non-mechanical disks become commonplace.

      A company named SolidData sells solid state "drives".

      --
      Trolling is a art,
    3. Re:Solid state is the way to go. by ananiasanom · · Score: 3, Informative
      The point is not that solid-state will not get bigger and cheaper (it will), but that disk is getting bigger and cheaper faster.

      So sure, you could replace your current 80Gb disk drive with 80Gb of solid state, but where are you going to store your 50Gb 3D movies in 1000x1000x1000 resolution? They're going to be on disk, and you'll have to deal with the increasing size:bandwidth and size:access-speed ratios. After all, I can buy a smartmedia card with the capacity of my first hard drive for about what I used to pay for a box of floppies, but I still use a hard disk.

      Secondly, as others have pointed out, just as the article describes future disk behaving more like tape, future solid-state memory may behave more like disk. Where is it now? Chips can pump out sequential data at close to 1 gigabit, but jumping about in memory is much slower (any expert got figures?).

  4. Next on Slashdot by Do+not+eat · · Score: 4, Funny

    This week: You can make a trade-off between latency and throughput!
    Next week: Cars that can haul less can be more fuel-efficient!
    The week after: Algorithms that use more memory, but are faster to execute!

  5. Huge disks by heironymouscoward · · Score: 5, Insightful

    If I look at the trends of the last decades, while disk sizes increase exponentially, the actual number of top-level objects I store on my systems increases only linearly, and quite slowly. True, I still store individual documents, but I also store AVIs, ISOs, entire photo albums that take gigabytes each.

    It's still random access: I can choose and access an object, even individual photos, without scanning through large amounts of unwanted data.

    --
    Ceci n'est pas une signature
    1. Re:Huge disks by Llurien · · Score: 2, Interesting

      Interesting point. I guess that's partly because a human collects stuff in a more or less linear fashion. Everything you collect, create or use takes time, and time is a resource that we don't get more of simply because our computers get faster. It is possible to handle one single 4 GB file such as a movie, but it would be impossible to do something meaningful with 4000 1MB files; it would simply take too much time. Of course, you could think of automated tasks operating on large sets of files, but again random access would serve no benefit here. Throughput is important in the case of a program handling a sequence of small files.

  6. Bandwidth... by Ratface · · Score: 5, Funny

    I love his comment about mailing disks to Europe and Asia..

    The biggest problem I have mailing disks is customs. If you mail a disk to Europe or Asia, you have to pay customs, which about doubles the shipping cost and introduces delays.

    Thereby adding a corollary to the old adage "Never underestimate the bandwidth of a vanload of tapes barrelling down the highway"...

    "Never underestimate the bottleneck caused by a far-Eastern customs inspector." .-D

    --

    A little planning goes a long way...
  7. Let me just read your mind... by WIAKywbfatw · · Score: 5, Funny

    ...does anybody else think this sounds familiar?

    I must have read an article earlier about this same thing, probably by this same guy. Can anybody confirm that?


    Thanks to my well-developed powers of telepathy, I can tell you that you have read a previous article on the topic by the same author. So I'm happy to confirm that for you.

    I can also tell you, thanks to my equally well-honed powers of clairvoyance, that this post will soon be modded up as funny.

    (Sheesh. And I thought that some recent "Ask Slashdot" questions were dumb.)

    --

    "Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
  8. Very much a pioneer, even IF he works for MS by Anonymous Coward · · Score: 4, Informative

    Check out Jim Gray's info page on Microsoft Research. He's done research on many diverse and interesting technologies such as distributed computing and sequential I/O performance. There are some nifty sites he has taken part in creating, such as a browsable photo of Earth, and a map of the Universe.

  9. Network speed by CausticWindow · · Score: 3, Interesting

    they are part of Internet 2, Virtual Business Networks (VBNs), and the Next Generation Internet (NGI). Even so, it takes them a long time to copy a gigabyte. Copy a terabyte? It takes them a very, very long time across the networks they have

    Is this really true? Wasn't there a recent Slashdot story where researchers transferred a gigabyte of data, in fourteen seconds or so, on Internet 2 from California to the Netherlands?

    I suppose that disk access times will be the limiting factor at both ends if you were to read and write the data from/to a disk.

    --
    How small a thought it takes to fill a whole life
    1. Re:Network speed by CausticWindow · · Score: 3, Informative

      Couldn't find the article with the Slashdot search, but Google produced it. Here it is.

      The real numbers were 8,609 Mbps, which translates roughly into a DVD transferred every five seconds. Btw., it was Switzerland, not the Netherlands.
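
      The "DVD every five seconds" figure can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes a single-layer 4.7 GB DVD and decimal units (1 GB = 10**9 bytes, 1 Mbps = 10**6 bits per second), none of which the post states, and it ignores protocol overhead.

```python
# Rough check of the "DVD every five seconds" figure.
# Assumptions (not from the thread): a single-layer DVD holds 4.7 GB,
# 1 GB = 10**9 bytes, 1 Mbps = 10**6 bits/s, no protocol overhead.

def transfer_seconds(size_bytes: float, rate_mbps: float) -> float:
    """Time needed to move size_bytes over a link running at rate_mbps."""
    return size_bytes * 8 / (rate_mbps * 1e6)

DVD_BYTES = 4.7e9   # assumed disc capacity
RATE_MBPS = 8609    # rate quoted in the post

dvd_time = transfer_seconds(DVD_BYTES, RATE_MBPS)
gigabyte_time = transfer_seconds(1e9, RATE_MBPS)
print(f"DVD: {dvd_time:.2f} s, 1 GB: {gigabyte_time:.2f} s")
```

      Under those assumptions a DVD takes a little under five seconds and a single gigabyte about one second, so the quoted rate is consistent with the rough claim.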

      Also, I don't understand the part where he mentions bandwidth costs of $1 per gigabyte. Maybe you have to pay that much on the Internet 2, but my DSL cost is somewhere in the region of $0.05 per gigabyte, I figure. Maybe I'm just spoilt.

      --
      How small a thought it takes to fill a whole life
  10. Ouch... by cybermace5 · · Score: 4, Insightful

    Frankly the interview was painful every time Dave Patterson said something. How many times does he have to ask questions about the concept of mailing a computer? "We mail computers because transferring over the Internet is too slow for these massive data transfers." "Are they computers?" "Yes." "Do you mail them?" "Yes." "It's like a movie." "Uhh ok." "Is it a whole computer that you mail?" "Yes, it is a computer full of hard drives." "Why don't you just use the Internet?" "Because it is too slow."

    --
    ...
  11. pr0n by leomekenkamp · · Score: 4, Funny

    We have a dozen doing TeraServer work; we have about eight in our lab for video archives, backups, and so on.

    That's a good excuse to use on my wife: "No honey, those are my ..., uhhm..., video archives."

    --
    Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
  12. ACM Turing Award Winner by m1kesm1th · · Score: 5, Funny

    Does that mean he managed to convince someone he was a computer?

    1. Re:ACM Turing Award Winner by jc42 · · Score: 3, Funny

      Does that mean he managed to convince someone he was a computer?

      My wife likes to tell people that her first job, back in the late 70's, was with a Civil Engineering firm in New York, where her job title was "Computer". She did the calculations (and error checking ;-) for their engineering drawings. She used machines to do this, of course, but those machines were called "calculators".

      They've since changed the job title.

      Funny how quickly such terminology can change.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
  13. Tweaked by Flamesplash · · Score: 4, Informative

    My prof talked about this in my networking class. Apparently they tweaked the hell out of the data link layer to do this, so it was not a generic data transfer at all.

    --
    "Not knowing when the dawn will come, I open every door." - Emily Dickinson
  14. The van metaphor by ArmenTanzarian · · Score: 2, Funny

    I've seen this a couple times before, but Google seems to come up with nothing useful for it. It doesn't help that every crappy musician who has made a tape sells it out of their crappy van or that so many scientists have the old Prussian "van der something" in their names. But perhaps it's crappy musicians and these van der scientists who really control the high-speed data transfer.

  15. 2 quotes... by leomekenkamp · · Score: 2, Interesting

    Two quotes from the article (emphasis mine):

    Gray, head of Microsoft's Bay Area Research Center, sits down with Queue and tells us (...)

    JG: If it is business as usual, then a petabyte store needs 1,000 storage admins. Our chore is to figure out how to waste storage space to save administration.

    MS bashers will have a field day on this one...

    --
    Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
  16. Troll in the article by panurge · · Score: 4, Insightful
    ...semi-seriously. Look at all the stuff about MySQL and Linux in the middle. It's as if a Microsoft Marketoid had suddenly taken over the interview. Or someone who didn't understand the difference between many thousands of developers working on Linux and the smaller number that work on MySQL.

    Apart from speculating as to whether this attempt at FUD was the real payload of the article, did it really say anything that most of us haven't already noticed? Whether Flash or fast SCSI, we could do with an intermediate layer of backing store, with faster random access than current IDE HDDs. And we are fast heading for removable IDE drives to be a better and cheaper tape replacement. And the Internet has limited bandwidth. I'm sorry, but you don't need a Turing prize to work any of that out.

    --
    Panurge has posted for the last time. Thanks for the positive moderations.
    1. Re:Troll in the article by panurge · · Score: 2, Interesting
      I'm going to confess that I have probably misunderstood the point. The precise bit of the article I was referring to was:

      The challenge is similar to the challenge we see in the OS space. My buddies are being killed by supporting all the Linux variants. It is hard to build a product on top of Linux because every other user compiles his own kernel and there are many different species. The main hope for Oracle, DB2, and SQLserver is that the open-source community will continue to fragment. Human nature being what it is, I think Oracle is safe.

      DP Is MySQL.com trying to be the Red Hat of MySQL?

      JG It could be that they will step forward and provide all of those things that IBM, Microsoft, and Oracle provided, and do it for a much lower price. I think the incumbent vendors will have to be innovative to make their products more attractive.

      One thing that works in the incumbents' favor is fear, uncertainty, and doubt (FUD). If you base your company on a database, you are risking a lot. You want to buy the best one. People are usually pretty cautious about where they want to put their data. They want to know that it's going to have a disaster recovery plan, replication, good code quality, and in particular, lots and lots and lots of testing.

      The thing that slows Oracle, IBM, and Microsoft down is the testing, and making sure they don't break anything--supporting the legacy. I don't know if the MySQL community has the same focus on that.

      At some point, somebody will say, "I'm running my company on MySQL." Indeed, I wish I could hear Scott McNealy [CEO of Sun Microsystems] tell that to Larry Ellison [CEO of Oracle].

      DP The whole corporation?

      JG Right. Larry Ellison announced that Oracle is now running entirely on Linux. But he didn't say, "Incidentally we're going to run all of Oracle on MySQL on Linux." If you just connected the dots, that would be the next sentence in the paragraph. But he didn't say that, so I believe that Larry actually thinks Oracle will have a lot more value than MySQL has. I do not understand why he thinks the Linux problems are fixable and the MySQL problems are not.

      I was concentrating on his claims that building a system on top of Linux is particularly hard, and his mentioning Microsoft in the same sentence as IBM and Oracle. Although I make extensive use of MySQL for small systems in our consultancy, I think it is a long way from being ready for the main enterprise RDBMS. In fact, I felt he was trying to tar Linux with the MySQL brush, if you see what I mean.

      I now think he probably did not mean it the way I read it. If anyone cares to mod my original post down, feel free. But I do think that, for a long article, there was actually not a lot of real content.

      --
      Panurge has posted for the last time. Thanks for the positive moderations.
    2. Re:Troll in the article by leandrod · · Score: 2, Insightful

      > Look at all the stuff about MySQL and Linux in the middle. It's as if a Microsoft Marketoid had suddenly taken over the interview. Or someone who didn't understand the difference between many thousands of developers working on Linux and the smaller number that work on MySQL.

      He's correct as far as he goes.

      MySQL and MS SQL Server actually have the same problem, and it is called SQL; both even go downhill from there.

      SQL is simply too complex to implement properly, and it only gets worse when you start with a non-standard implementation. While MySQL benefits from a better OS to run on, it has the more fundamental flaws that its developers don't really understand data in general and the relational model in particular, and it has started with something that wasn't really SQL at all and not a DBMS at all; MS began with something that was a real if weak DBMS, and almost SQL already, and since has hired some pretty good guys and improved impressively.

      Eventually MySQL will reach maturity, and with more ports, a more variated and complicated legacy and less understanding, it will have a rougher time developing the future and supporting the past. See that MySQL's current idea of future is SAPdb, which is stuck with Oracle v7 feature parity and less-than-SQL 92 Entry Level compliance.

      Obviously PostgreSQL is a better base to build on, it originally was even better than SQL (Ingres QUEL was based on Codd's own Alpha), since it got into the SQL cult it was never as unfaithful as MySQL is now or even SQL Server was, and PostgreSQL was always a real DBMS. Too bad the exposition it gets is so small Mr Gray can't even spell its name or see its superiority.

      --
      Leandro Guimarães Faria Corcete DUTRA
      DA, DBA, SysAdmin, Data Modeller
      GNU Project, Debian GNU/Lin
  17. LSFS by smd4985 · · Score: 3, Informative

    For more info on (very-cool) Log-Structured File Systems, check out Mendel's original paper at:

    http://citeseer.nj.nec.com/rosenblum91design.html

    --
    smd4985
  18. The hierarchical object file system by master_p · · Score: 3, Insightful

    One final thing that is even more speculative is what my co-workers at Microsoft are doing. They are replacing the file system with an object store, and using schematized storage to organize information. Gordon Bell calls that project MyLifeBits. It is speculative--a shot at implementing Vannevar Bush's memex [http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm]. If they pull it off, it will be a revolution in the way we use storage.

    I've talked about it before. This guy thinks what Microsoft is doing is revolutionary. Come on all you people, can't you see the problem with today's file systems? The problem is that the type information is lost!!! We need objects, and we need type information to be stored along with those objects!!! This is the only way lots of problems will go away.

  19. MRAM saves the day by Markus+Registrada · · Score: 3, Interesting
    All the tradeoffs will change radically when MRAM hits the streets. It's potentially denser than disk and DRAM, as fast as static RAM, nonvolatile, doesn't use power when it's not used, and can be made on regular silicon process machinery. Expect it first in cell phones next year, and then everywhere.

    This doesn't just affect file storage and virtual memory. It also changes the economics of cache and main memory, and makes deployment of 64-bit CPUs more urgent. It also makes system crashes much less tolerable, because turning the computer off and on doesn't involve long shutdown and boot procedures any more.

    1. Re:MRAM saves the day by HerbieStone · · Score: 2, Funny
      All the tradeoffs will change radically when MRAM hits the streets. It's potentially denser than disk and DRAM, as fast as static RAM, ...

      Yup. And Duke-Nukem Forever will eat Half-Life 2s panties.

  20. Fuzzy numbers, or can this be right? by abulafia · · Score: 3, Funny
    JG Twenty-megabyte disks were considered giant. I believe that the first time I asked anybody, about 1970, disk storage rented for a dollar per megabyte a month. IBM leased rather than sold storage at the time. Each disk was the size of a washing machine and cost around $20,000.

    So, one could rent a $20K device for $240/year? Those must have been the days...

    That can't be right.
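
    Taking the quoted figures at face value, the arithmetic behind the objection looks like this. A small sketch only: it assumes one 20 MB disk per $20,000 unit, which is how the comment reads the quote, and the variable names are made up for illustration.

```python
# Arithmetic behind the "rent a $20K device for $240/year" objection.
# Assumption: each $20,000 unit holds 20 MB, as the comment reads the quote.

MB_PER_DISK = 20            # "Twenty-megabyte disks"
RENT_PER_MB_MONTH = 1.00    # "a dollar per megabyte a month"
DISK_PRICE = 20_000         # "cost around $20,000"

annual_rent = MB_PER_DISK * RENT_PER_MB_MONTH * 12
years_to_recoup = DISK_PRICE / annual_rent

print(f"Annual rent per disk: ${annual_rent:.0f}")
print(f"Years of rent to cover the price: {years_to_recoup:.1f}")
```

    On those numbers the rent would take more than eighty years to cover the price of the unit, which is why the figures look inconsistent as quoted.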

    --
    I forget what 8 was for.
  21. Defending Jim Gray by chrisd · · Score: 3, Insightful
    I didn't really read that as FUD or even invalid criticism of MySQL. Maybe I'm biased because of my previous work with Queue and since I have met Jim, but if you get the impression that Jim doesn't like MySQL (which I did not) then I would actually assume it is because he felt that way, not because of Microsoft. Jim is one of those guys that will never be looking for a job; his early work on databases was pivotal to the development of transactions and his work on fault tolerant systems is legendary. He really is beyond reproach.

    Chrisd

    --
    Co-Editor, Open Sources
    Open Source Program Manager, Google, Inc.
  22. Three letters: F, U, and D by heironymouscoward · · Score: 4, Insightful

    Take this choice quote from the article:

    My buddies are being killed by supporting all the Linux variants. It is hard to build a product on top of Linux because every other user compiles his own kernel and there are many different species.

    Ain't it sweet? I count five lies:

    (1) people being killed by supporting (gasp) operating systems... gosh, horror and violence, not nice at all!

    (2) all the Linux "variants", are in fact pretty much one standard, LSB, with several skins

    (3) "hard to build a product on top of Linux", rather than, hmmm, Windows? Linux is incredibly easy to build for. I suspect the fact that it's very standard helps.

    (4) "every other user compiles his kernel"... maybe at Microsoft. I suspect less than 1 in 20 Linux users ever compiled a kernel.

    (5) compiling a kernel means you can't support it... WTF? The kernel is incredibly stable, since most changes are in external modules. And I can't remember a single case where a kernel change broke one of my apps.

    (6) (sorry, I was not counting well), "many different species"... well, AFAICS the only difference between the Linux distributions is that they have different packaging methods, different timelines as to their versions, and different UI tools for hardware detection, configuration, etc. Nothing at all that makes life hard.

    Look: I just installed Xandros, which is Debian with a nice face. On two different types of machine, and it installed without asking a single question about my hardware except whether the mouse was left or right-handed. Check my journal...

    Windows never worked this nicely. Where is the support issue?

    In the writing industry we call this "to condemn with faint praise".

    Yeah, Windows kinda works, I mean, it'll run Office without crashing too often, but it's just killing my buddies to have to maintain Win2K, WinXP, and even some older Win98 machines, not to mention we have a whole cupboard simply filled with driver CDs for every PC we have.

    --
    Ceci n'est pas une signature
    1. Re:Three letters: F, U, and D by Junks+Jerzey · · Score: 2, Insightful

      From the Devil's Dictionary:

      FUD: The sound made by someone attempting to wish away inconvenient facts.

      http://www.eod.com/devil/archive/fud.html

  23. 3 Terabytes on a credit card? by polyp2000 · · Score: 4, Interesting

    Anyone know what happened to that bloke at Keele who invented a way of cramming 3 Terabytes on a credit card? Apparently it would have cost about 35 pounds to manufacture. This was a couple of years ago, so why hasn't it happened yet?

    Surely something like this is the real future of storage ?

    Terabyte on a credit card

    --
    Electronic Music Made Using Linux http://soundcloud.com/polyp
    1. Re:3 Terabytes on a credit card? by jetkust · · Score: 2, Insightful

      This article claims some kind of software based 8:1 compression scheme on binary data. Am I reading this wrong or does this seem a bit like nonsense?
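
      One way to see why a blanket 8:1 claim sounds off is the standard counting argument. The sketch below assumes the claim means lossless compression that works on every possible input; if the scheme only targets particular kinds of data, the argument does not apply. The function names are made up for illustration.

```python
# Counting argument against "8:1 on any binary data".
# Assumption: the scheme is lossless and must work on every possible input.

def distinct_strings(bits: int) -> int:
    """Number of different bit strings of exactly this length."""
    return 2 ** bits

def max_fraction_compressible(input_bits: int, ratio: int) -> float:
    """Upper bound on the share of inputs that can get a unique shorter output."""
    output_bits = input_bits // ratio
    return distinct_strings(output_bits) / distinct_strings(input_bits)

fraction = max_fraction_compressible(64, 8)
print(f"At most {fraction:.3e} of all 64-bit inputs can shrink 8:1 losslessly")
```

      There are simply far fewer short outputs than long inputs, so almost every input would have to share an output with another one and could not be restored.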

  24. Sneaker net? by computerlady · · Score: 3, Funny

    "Sneaker net" was when you used your sneakers to transport data?

    Oh my. How old I feel when someone has to ask what "sneaker net" was. And someone has to answer...

    --
    computerlady - a brand new Slash-daughter - alone, but no longer invisible, in the /. world
  25. And an old one! by siskbc · · Score: 2, Funny

    Damn, timothy, when it says June on the article it just might be a dupe, ya know? But it's nice to know that the future of disk access hasn't changed since then.

    --

    -Looking for a job as a materials chemist or multivariat

  26. AMAZING!!! by X86Daddy · · Score: 4, Funny

    This is a *MAJOR* breakthrough! Most Turing Test contestants don't even win, but this one can eloquently discuss topics and give complex answers, rather than just turning back the question, Eliza-style.

    Can we download a copy of this "Jim Gray" yet?

    1. Re:AMAZING!!! by cybermace5 · · Score: 2, Funny

      Can we download a copy of this "Jim Gray" yet?

      No, too big to transfer over the Internet at this point. You'll have to use UPS.

      --
      ...
  27. New File System by Archangel+Michael · · Score: 2, Interesting

    What current file systems need is meta data in them. That is, the File system itself stores the MetaData about the file. Think about the Mac File system, with the Meta data contained in the file itself, as the "resource fork". Now imagine a systemized, extensible meta file system that organized files by what the Meta Data said about them.

    Imagine, media files stored in such a way that both random and sequential access was optimized, where the file structure was automagically defragmented and organized behind the scenes.

    Imagine a computer that watched what files were used at bootup, and organized them so that the hard drive streamed the bootup data sequentially, straight into memory.

    Imagine being able to start PRELOADING applications before you even finish the second of your double clicks on the datafile.

    Imagine Database files that were automagically indexed as part of the file system.

    Imagine Security and encryption being built into the filesystem beyond today's capabilities, where the security and encryption does not rely upon a master controller or centralized security policies, but rather has the ability to follow the file, seamlessly.

    I am sure that I haven't even begun to tap the possibilities.
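
    The idea of organizing files by what their metadata says can be sketched as a tiny in-memory index. This is a toy under stated assumptions, not how any real file system does it: the class name, the attribute names and the paths are all hypothetical, and nothing is persisted to disk.

```python
from collections import defaultdict

class MetadataIndex:
    """Toy in-memory index mapping (attribute, value) pairs to file paths.

    Hypothetical illustration only; attribute values must be hashable.
    """

    def __init__(self):
        self._by_attr = defaultdict(set)   # (attribute, value) -> set of paths
        self._meta = {}                    # path -> metadata dictionary

    def add(self, path, **attrs):
        """Record a file together with its metadata attributes."""
        self._meta[path] = dict(attrs)
        for item in attrs.items():
            self._by_attr[item].add(path)

    def find(self, **attrs):
        """Return the paths whose metadata matches every given attribute."""
        matches = [self._by_attr.get(item, set()) for item in attrs.items()]
        return set.intersection(*matches) if matches else set(self._meta)

index = MetadataIndex()
index.add("/photos/beach.jpg", kind="photo", year=2003)
index.add("/photos/party.jpg", kind="photo", year=2002)
index.add("/video/beach.avi", kind="video", year=2003)

photos_2003 = index.find(kind="photo", year=2003)
print(sorted(photos_2003))
```

    Queries go through the attributes rather than through a directory path, which is the part of the proposal this sketch tries to capture.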

    --
    Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    1. Re:New File System by Archangel+Michael · · Score: 2, Interesting

      That is just a start. That article mentions nothing about Meta Data, which is required to make advanced capabilities of the File System come alive.

      For Meta Data to work, there has to be some sort of STANDARDS based way of describing said data.

      For instance, a table. How would you describe a table? Is it Tab delimited text, Spreadsheet or a HTML based Table? Does it reference cells and/or other tables? Are those available? Is the data from missing tables available as a static value?

      Is the data within the table used in other work, such as a presentation or Brochure? The value is not in the system, but in the interlocking way we use it, and that needs to be described as Meta Data.

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
  28. IDE replaces DVD by G4from128k · · Score: 4, Interesting

    With an ever growing collection of digital photos, I've come to the same conclusion as Jim Gray. Hard disks are superior for backups.

    I currently have about 100 GB of images and it takes more than 20 4.7 GB DVD-R discs to create a full backup. Although DVD media is still slightly cheaper than new large capacity IDE drives, the added time and hassle factor of burning 20 discs far outweighs any minor cost savings. Moreover a 3.5" drive in a padded anti-static bag takes up less room in the safe deposit box than 20 DVDs (especially if you have the DVDs in protective jewel cases). And if HD-based-backup lets me avoid some future artists tax on burnable media, so much the better.
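
    The disc count follows directly from the two sizes given. A minimal sketch, assuming both figures use the same units and that files can be packed onto discs with no wasted space (real backups usually waste some):

```python
import math

# Disc count for the backup described above.
# Assumptions: both sizes use the same units, and files pack perfectly.

def discs_needed(collection_gb: float, disc_gb: float) -> int:
    """Smallest whole number of discs that can hold the collection."""
    return math.ceil(collection_gb / disc_gb)

dvd_count = discs_needed(100, 4.7)
print(f"DVD-R discs for a full backup: {dvd_count}")
```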

    A Firewire enclosure and a rotating collection of IDE drives is the way to go.

    --
    Two wrongs don't make a right, but three lefts do.
  29. Interesting Idea... by polyp2000 · · Score: 3, Interesting

    An interesting thought popped up when I read your post: there is a current trend towards cramming as much storage as possible into something the size of a 3in hard drive.

    I wonder why they don't make larger hard drives in the physical sense? A hard drive the size of a washing machine using today's technology would store a phenomenal amount of stuff, but what about something more reasonable, like a hard drive merely twice the physical size of today's? How much more storage could you get just by scaling up the platters? Anyone here good at math? Hard drives today must be up to 200-250 GB.
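
    As a first-order answer to the math question, recording area grows with the square of the platter diameter. The sketch below is pure geometry: it assumes the same areal density and the same number of platters, ignores the unused hub area, and says nothing about the mechanical side (spin speed, seek time, vibration), which it does not model.

```python
# First-order scaling of capacity with platter size.
# Assumptions: same bits per unit area, same platter count, hub area ignored.

def scaled_capacity_gb(capacity_gb: float, diameter_factor: float) -> float:
    """Capacity after scaling platter diameter, with area growing as the square."""
    return capacity_gb * diameter_factor ** 2

doubled = scaled_capacity_gb(250, 2)
print(f"250 GB drive with platters twice as wide: about {doubled:.0f} GB")
```

    So doubling the diameter gives roughly four times the capacity under these assumptions, while the time to read the whole drive would grow as well.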

    --
    Electronic Music Made Using Linux http://soundcloud.com/polyp
  30. Re:Wait by Anonymous Coward · · Score: 2, Interesting

    There are multiple levels of access within a file system. The sequential versus random decisions they are talking about is at a much lower level than you are thinking. Somewhat simplified:

    Now, when software opens a file, it gets a handle to the storage and seeks all over it to get the data it needs and finally write it back. This is particularly true of files that consist of many records. Some software mmaps (memory maps) the file, mapping it into the memory address space and making it appear as a large, slow section of RAM in order to make this easier.
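
    The memory-mapping approach described above can be sketched with Python's standard `mmap` module. The fixed 16-byte record layout, the file name and the helper functions are assumptions made up for this illustration.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed record length in bytes

def write_records(path, count):
    """Create a file of fixed-size, space-padded records."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(f"record{i:04d}".encode().ljust(RECORD_SIZE, b" "))

def update_record(path, index, payload):
    """Overwrite one record in place through a memory map; return the old bytes."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload does not fit in one record")
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; it then behaves like a byte array.
        with mmap.mmap(f.fileno(), 0) as mm:
            start = index * RECORD_SIZE
            old = bytes(mm[start:start + RECORD_SIZE])
            mm[start:start + RECORD_SIZE] = payload.ljust(RECORD_SIZE, b" ")
            mm.flush()  # push the change back to the file
    return old

path = os.path.join(tempfile.mkdtemp(), "records.dat")
write_records(path, 10)
old = update_record(path, 3, b"changed")
with open(path, "rb") as f:
    f.seek(3 * RECORD_SIZE)
    new = f.read(RECORD_SIZE)
print(old, new)
```

    The program jumps straight to record 3 by offset, which is exactly the random-access pattern the comment describes.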

    Relatively recently, you see many more programs which open a file, slurp the entire thing into memory, and close the file on disk. When they want to make changes, they open the file again and rewrite it from scratch. You see this more in text editors and word processors. Programming editors will often have some alternate behavior for very large files, although the threshold for "very large file" is always increasing.

    When you do this with record oriented files and/or incremental save/autosave, etc, you get into journalling. You write all of the user's changes sequentially to a log file rather than saving the actual file (and re-writing it) repeatedly. This is sometimes what you are seeing when a program has a 'recovery file'. Having only one recovery file or journal for any number of open files means you are consistently writing appends to a single location and avoiding disk seeks.
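
    A single shared journal of that kind can be sketched as an append-only file of JSON lines that is replayed on recovery. This is a minimal illustration, not any particular program's recovery format: the `Journal` class, its entry layout and the document names are all hypothetical.

```python
import json
import os
import tempfile

class Journal:
    """Append-only change log shared by any number of open documents.

    Hypothetical sketch: one JSON object per line, replayed in order.
    """

    def __init__(self, path):
        self.path = path

    def append(self, doc, key, value):
        """Record one change at the end of the log (no seeking, no rewriting)."""
        entry = {"doc": doc, "key": key, "value": value}
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")
            log.flush()
            os.fsync(log.fileno())  # ask the OS to put the entry on disk

    def replay(self):
        """Rebuild the latest state of every document by reading the log once."""
        state = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                state.setdefault(entry["doc"], {})[entry["key"]] = entry["value"]
        return state

journal = Journal(os.path.join(tempfile.mkdtemp(), "changes.log"))
journal.append("letter.txt", "title", "Draft")
journal.append("budget.csv", "total", 42)
journal.append("letter.txt", "title", "Final")
recovered = journal.replay()
print(recovered)
```

    Every write lands at the end of one file and recovery is a single sequential read, so neither path needs a seek within the log.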

    What the article is getting at is that this sort of behavior will get more and more common, even moving into the FS and OS level. Support for this kind of journalling may move its way into FS handling, for instance. Also, instead of opening individual files, the FS may block transfer a whole directory into RAM at once. We already see this with advanced file systems which store small files directly in the directory inode. We may see the inodes get larger and the definition of 'small file' become steadily larger. When you have GBs of RAM and TB of storage, why not have a 64 MB+ inode?

    From this point of view, random seeking within files slowly becomes irrelevant. Rather, the primary operations become streaming and append.
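
    A back-of-the-envelope cost model shows why streaming wins once seeks dominate. All the drive parameters below are illustrative assumptions, not measurements of any real disk: a 10 ms average seek, 50 MB/s sustained bandwidth, and one million 1,000-byte records.

```python
# Toy cost model: one seek per record versus one long sequential read.
# All parameters are illustrative assumptions, not measurements.

def sequential_seconds(total_bytes: float, bandwidth: float) -> float:
    """Time to stream everything in a single pass."""
    return total_bytes / bandwidth

def random_seconds(records: int, record_bytes: int, seek_s: float,
                   bandwidth: float) -> float:
    """Time when every record costs one seek plus its own transfer."""
    return records * (seek_s + record_bytes / bandwidth)

RECORDS = 1_000_000
RECORD_BYTES = 1_000
SEEK_S = 0.010        # assumed average seek plus rotational delay
BANDWIDTH = 50e6      # assumed sustained bytes per second

streamed = sequential_seconds(RECORDS * RECORD_BYTES, BANDWIDTH)
seeked = random_seconds(RECORDS, RECORD_BYTES, SEEK_S, BANDWIDTH)
print(f"sequential: {streamed:.0f} s, random: {seeked:.0f} s")
```

    With these made-up numbers the random pattern spends almost all of its time seeking, which is the effect the comment is pointing at.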

  31. He is right, but nothing to do with the kernel by brunes69 · · Score: 4, Insightful

    His basic idea is 100% correct, but the reason is all wrong. It *IS* much harder to develop an app for the myriad flavours of Linux, not because of the kernel, but because every distro has its own versions of libraries. I work for a company that makes Linux software, and we only support RedHat, and even certain versions of RedHat at that. While our product would probably compile against any number of distros, and even the BSDs, we just don't have the time and manpower required to build, test, debug, package, and maintain 15 different releases for every sub-release or patchlevel we have in the product. With Windows products, at least, (unless you are doing some lower-level stuff) if you build something you can be reasonably assured it will run on Windows 2000, or Windows XP, or Windows 2003. Not the same if you build something with RedHat 9 and try to run it on Debian or Suse, etc. And before you go on about "release a source package", not all companies release everything GPL, and want to keep their IP theirs, since they like to put some money on the table at night. It's definitely not FUD to say it is much more effort to develop and release cross platform binaries in Linux than Windows.

  32. The Ol' Roadmaster Scenario by codefool · · Score: 2, Informative

    What Gray is talking (mostly) about is what we used to call the "Roadmaster Scenario." When I worked for [a major electronics company], we had a data center in Dallas and a redundant site about 30 miles away in Lewisville. Every Sunday the entire IMS database was archived to mag tape and shipped to the other data center for a second level of redundancy. This begged the question, why not just copy them over the T1 lines (this was 1980) to the other site's tape drives directly? The answer, of course, was that it takes a helluva lot of bandwidth to outrun a Roadmaster full of mag tapes.
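
    The Roadmaster comparison can be put into rough numbers. Only the T1 line rate (1.544 Mbps) and the 30-mile distance come from known facts or the comment; the tape count, the per-tape capacity and the driving speed are hypothetical values chosen for illustration, and loading and unloading time is ignored.

```python
# Effective bandwidth of a car full of tapes versus a T1 line.
# Hypothetical inputs: tape count, tape capacity and driving speed.

T1_MBPS = 1.544            # standard T1 line rate
TAPES = 500                # hypothetical number of reels in the car
TAPE_BYTES = 150e6         # hypothetical capacity per reel
DISTANCE_MILES = 30        # distance given in the comment
SPEED_MPH = 30             # hypothetical average speed

def car_mbps(tapes: int, tape_bytes: float, miles: float, mph: float) -> float:
    """Bits delivered divided by driving time, in megabits per second."""
    trip_seconds = miles / mph * 3600
    return tapes * tape_bytes * 8 / trip_seconds / 1e6

roadmaster = car_mbps(TAPES, TAPE_BYTES, DISTANCE_MILES, SPEED_MPH)
print(f"car: {roadmaster:.1f} Mbps, T1: {T1_MBPS} Mbps")
```

    Even with modest guesses for the load, the car comes out around a hundred times faster than the line, at the price of an hour of latency.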

    --
    "Stop whining!" - Arnold, as Mr. Kimble
  33. Missing the logical boat by leandrod · · Score: 2, Interesting

    > To some extent you can think of Codd's relational algebra as an algebra of punched cards. Every card is a record. Every machine is an operator.

    Interesting how the guy literally wrote the book on transactions, yet grossly misrepresents Codd's work, which BTW wasn't simply the relational algebra, but even higher level: the relational model of database management, including the relational calculus.

    While the algebra is somewhat procedural, the calculus is set-oriented, and they are fully equivalent. The idea is exactly not looking at records and operators, but describe what you want -- just leave the relational system set the procedures to get that in the most efficient way it can.
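
    A rough sketch of the two styles, using an invented `employees` table and helper names (`select`, `project`) that are assumptions for illustration only. The first query chains operators step by step, algebra-style; the second just states which rows are wanted, calculus-style. Python comprehensions are only an analogy here: a real relational system would pick the execution plan itself.

```python
# Invented sample data: each dict stands in for one record.
employees = [
    {"name": "Ada", "dept": "R&D", "salary": 120},
    {"name": "Grace", "dept": "R&D", "salary": 130},
    {"name": "Linus", "dept": "Ops", "salary": 110},
]


def select(rows, predicate):
    """Algebra operator: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Algebra operator: keep only the named columns; the set drops duplicates."""
    return {tuple(row[column] for column in columns) for row in rows}


def names_in_dept_algebra(rows, dept):
    """Procedural flavour: first select, then project, in a fixed order."""
    return project(select(rows, lambda row: row["dept"] == dept), ["name"])


def names_in_dept_calculus(rows, dept):
    """Declarative flavour: describe the wanted tuples, not the steps."""
    return {(row["name"],) for row in rows if row["dept"] == dept}


if __name__ == "__main__":
    print(names_in_dept_algebra(employees, "R&D") ==
          names_in_dept_calculus(employees, "R&D"))
```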

    Incidentally this has a big impact on all Gray is discussing -- without a fairly simple and powerful data model, so much data is basically a waste. He's thinking too low level, including the object stuff he touts, but we will only find use for so much data the day we get proper relational implementations, and this excludes SQL in general and MySQL in particular.

    --
    Leandro Guimarães Faria Corcete DUTRA
    DA, DBA, SysAdmin, Data Modeller
    GNU Project, Debian GNU/Lin
    1. Re:Missing the logical boat by tomlord · · Score: 2, Informative

      He isn't grossly misrepresenting Codd's work.

      You said it yourself:

      While the algebra is somewhat procedural, the calculus is set-oriented, and they are fully equivalent.

      and, uncoincidentally, the isomorphism extends further to machines that manipulate physical punch cards. You go on to say:

      The idea is exactly not looking at records and operators, but describe what you want -- just leave the relational system set the procedures to get that in the most efficient way it can.

      Right. And what Gray has pointed out is that Codd's work on the math and how to implement it doesn't really require computers, as such.

      In an alternate timeline, there were no computers, just lots of expensive punch-card machines and racks and racks of data stored on punch-cards.
      (Such was the economic value of all this data that the racks of cards were often stored with an almost military degree of jealous protection: the origin of the term "Data Base".)

      Each card machine could perform a simple operation like "duplicate this card stack" or "pull out the cards that have a Q in column 3". The machines could be organized into a sort of assembly line for a particular computation, with technicians looking at a script on a clipboard and carrying trays of cards between machines, configuring each machine with the right parameters, running the cards through, then going to the next step. It was an expensive, labor-intensive process and the ad-hoc procedures used to write the scripts for the technicians were black-magic, often error prone.

      Time-study super-genius, Alternate-Codd, studied the machines and the procedures used to operate them. He realized that they could be described by set math. He realized that if you let the managers define their "Card Searches" in very high-level, very mathy terms -- then there was a straightforward optimization problem to get from that "Search Specification" to a set of "compiled instructions" for the technicians. The goal was to produce a set of Compiled Instructions that would use the punch card machines in an optimal way -- saving time and money.

      He studied the optimization problem and developed some techniques for it. Companies used his results by hiring a "Compiler Pool" -- most often a group of women chosen from the secretarial pool for the accuracy of their work. When a new Card Search request came in, the search would be typed up and mimeographed, and handed to the head of the Compiler Pool. It typically took "the girls" about a day to compile a query but, every time, the scripts they wrote for the technicians produced the right answer, usually much faster than anyone thought possible.

      In one office, though, in Rochester New York, there was a famous accident. The office used by the Compiler Pool had developed a problem with flies. One day, one was swatted and killed with the mimeograph master of a compiled query, leaving a mark that obscured some important numbers. Nobody noticed, the technicians dutifully followed the errant script, and by the next afternoon the company's entire collection of precious data was strewn, unsorted, in a huge pile on the machine room floor. The company was bankrupt only 9 months later.

      The company president demanded an explanation when the accident occurred and much investigation followed, eventually revealing the fly and its consequences. This was, of course, the origin of the familiar phrase (known to every customer who's ever gotten a $500 bill for a month of telephonic service), "compiler bug".

  34. Re:It's "A station wagon full of..." by AJWM · · Score: 2, Interesting

    Certainly 1980s, probably circa 1983 or 1984 at the latest. I came up with the phrase (which may well have been independently coined before me, at the time I was unaware of it) when we were setting up NETNORTH, the Canadian counterpart to BITNET (networks of typically college campus mainframes, not directly part of ARPANET). There was discussion about setting up the HQ at University of Guelph (where I worked at the time - west of Toronto) or Waterloo University.

    The highway in question (as in station wagon travelling on) was the Highway (7? it's been a long time) between Waterloo and Guelph (at least part of which I drove every day, since I lived in Waterloo). I don't recall the numbers now, but my calculation of the bandwidth of Hwy 7 was based on a couple of boxes of 2400' reels of 6250 BPI tape (standard IBM mainframe tape size) in a car (or station wagon) travelling at the posted 90 km/h speed limit.

    Back in those days, aside from dedicated leased-line networks like BITNET or commercial X.25 packet networks like Tymnet, a 2400 baud dialup modem was considered blazingly fast. (And long distance charges were not cheap, hence the popularity of multi-hop dialup networks using UUCP or like Fidonet.)
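
    For anyone who wants to redo the arithmetic, here is a back-of-the-envelope sketch. The reel length, tape density and speed limit come from the description above; the number of reels in the car and the length of the drive are placeholder assumptions, since the original figures are not recalled. Capacity is raw (inter-record gaps ignored), so a real reel would hold somewhat less.

```python
# Figures from the comment above.
REEL_LENGTH_FEET = 2400
DENSITY_BYTES_PER_INCH = 6250  # 9-track tape: one byte (plus parity) per frame
SPEED_KM_PER_H = 90

# Placeholder assumptions for illustration only.
REELS_IN_CAR = 40
DISTANCE_KM = 30
MODEM_BITS_PER_S = 2400


def reel_capacity_bytes():
    """Raw capacity of one reel, ignoring inter-record gaps."""
    return REEL_LENGTH_FEET * 12 * DENSITY_BYTES_PER_INCH


def car_bandwidth_bits_per_s(reels, distance_km, speed_km_per_h):
    """Bits delivered per second of driving time for a one-way trip."""
    trip_seconds = distance_km * 3600 / speed_km_per_h
    return reels * reel_capacity_bytes() * 8 / trip_seconds


if __name__ == "__main__":
    bandwidth = car_bandwidth_bits_per_s(REELS_IN_CAR, DISTANCE_KM, SPEED_KM_PER_H)
    print(f"one reel: {reel_capacity_bytes() / 1e6:.0f} MB raw")
    print(f"car:      {bandwidth / 1e6:.0f} Mbit/s")
    print(f"ratio to a 2400 bit/s modem: {bandwidth / MODEM_BITS_PER_S:,.0f}x")
```

    Under these assumptions one reel is 180 MB raw and the car works out to about 48 Mbit/s, with the latency of a 20-minute drive.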

    --
    -- Alastair
  35. Here's an idea by shadow_slicer · · Score: 2, Informative

    Why don't you send out a mixed source/binary package:
    The binary part can be the core of your program and contain all your IP.
    The source part can be an interface layer to the rest of the system (aliases for library calls, or equivalent implementations for missing functions, etc...basically a wrapper layer between the system and the program).

    During the installation the source part can be compiled and (statically/dynamically) linked to the binary part. The source package doesn't have to be GPL (since, if it were, linking it to your binary would force the binary to be GPL), but it could still use some other open source license.
    That way you can mitigate the disadvantages of a binary distribution without having to use a full source distribution.
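
    The proposal above is about compiled code, but the shape of such a wrapper layer can be sketched in a few lines of Python. The function name `usable_cpu_count` is made up for illustration; the point is only that the core calls the wrapper, and the wrapper decides between a platform call and an equivalent fallback when that call is missing:

```python
import os


def usable_cpu_count():
    """Return how many CPUs this process may use, whatever the platform offers."""
    sched_getaffinity = getattr(os, "sched_getaffinity", None)
    if sched_getaffinity is not None:
        # Present on Linux: the set of CPUs this process is allowed to run on.
        return len(sched_getaffinity(0))
    # Fallback where the call is missing: total CPU count.
    # os.cpu_count() may return None, so default to 1.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(usable_cpu_count())
```

    The core never touches `os.sched_getaffinity` directly, so only this small layer has to change from one platform to the next.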

    Also, if many companies were doing this, it might be a good idea to open source these compatibility layers so that every company that makes something for linux isn't duplicating the effort. (though this is kind of what libraries are supposed to do....)

    Another alternative is to *trust* your customers:
    You could have a full source package, but under a proprietary license (not GPL). Just because the source is available doesn't mean that the customers have free rein over your IP, or even are more likely to pirate it: I have the full "source" for several books, but that doesn't cause me to violate the IP of those authors.

    I really doubt that PHBs will go for the full-source approach though, as they tend to be paranoid about such things...which is why I suggested the first thing.......first....