Slashdot Mirror


Conquest FS: "The Disk Is Dead"

andfarm writes "A few days ago, I sat in at a presentation of a what seems to be a new file system concept: Conquest. Apparently they've developed a FS that stores all the metadata and a lot of the small files in battery-backed RAM. (No, not flash-RAM. That'd be stupid.) According to benchmarks, it's almost as fast as ramfs. Impressive." The page linked above is actually more of a summary page - there's some good .ps research reports in there.

21 of 306 comments (clear)

  1. Drawback by ifreakshow · · Score: 5, Interesting

    One quick draw back I see in this system is on a computer where you have more small files than available RAM space. How does the system decide which small files to keep on the regular disk and which ones to keep in RAM?

    1. Re:Drawback by ottffssent · · Score: 5, Interesting

      LRU eviction is somewhat costly, but highly effective. Pseudo-LRU can be much cheaper and nearly as effective. The replacement policy is not hard - it is a well-researched problem in cache design.

      What I find telling is that such a system has to be implemented at all. It seems clear to me that the operating system's filesystem, in conjunction with the VM, should implement this automatically. In Linux, this is true - large portions of the filesystem get cached if you have gobs of RAM lying around. Why certain more commonly-used OSes do the exact opposite is beyond me.

      From my perspective, the right way to handle this is obvious. RAM is there to be used. Just as we have multiprogramming to make more efficient use of CPU and disk resources, we should be making the best possible use of available RAM. Letting it sit idle on the odd chance the user will suddenly need hundreds of meg of RAM out of nowhere is rediculous. From the perspective of the CPU, RAM is dog slow, but from the perspective of the disk, it's blazing fast. ANYTHING that can be done to shift the burden from magnetic storage to RAM should be done. Magnetic storage excels in one area and one area only: cheap permanent storage of vast amounts of data. RAM should be used to cache oft-used data. Why is this not painfully obvious to anyone designing an operating system?

  2. Who are they kidding? by asdfasdfasdfasdf · · Score: 4, Insightful

    The idea of RAM as storage is great and all, but can we work towards the elimination of STORAGE as RAM before we get to RAM as storage?

    I mean, why *DO* we still have pagefiles?

    A MS Gripe: I seriously don't understand why I can't turn it off completely. With multiple GB of RAM dirt cheap, writing to a disk pagefile slows my system down-- It has to!

    1. Re:Who are they kidding? by DigitalGlass · · Score: 5, Informative

      what version of windows are you running? I have had no problem with turning off the pagefile in 2000 and xp, my machines have 1gb in them and they cranked when i disabled the pagefile.

      should be in control panel - system - advanced - performance --- look in there for something to set the page file to 0 or to disable it.

    2. Re:Who are they kidding? by pclminion · · Score: 4, Interesting
      I mean, why *DO* we still have pagefiles?

      Well, a couple of reasons. Most important, the "pagefile" is there to protect against a hard out-of-memory condition. Modern operating systems are in the habit of overcommitting memory, which means they grant allocation requests even if the available RAM can't fulfill them. The idea is that an app will never actually be using all those pages simultaneously. If things go wrong and all that extra memory is actually needed, the system starts kicking pages to disk to satisfy the cascade of page faults. This means the system will become slow and unresponsive, but it will keep running. But say you didn't have anywhere to swap to. The system can't map a page when a process faults on it, and the process gets killed. But which process gets killed? After all, is it the process's fault if the OS decided to overcommit system memory? The swap space serves as a buffer so a real administrator with human intelligence can come in and kill off the right processes to get the system back in shape.

      Swap is also important because not all data can just be reloaded from the filesystem on demand. Working data built in a process's memory is dynamic and can't just be "reloaded." If there's no swap, that means this memory must be locked in RAM, even if the process in question has been sleeping for days! We all know the benefits of disk caching on performance. Process data pages are higher priority than cache pages. Thus if old, inactive data pages are wasting space in RAM, those are pages that could have been used to provide a larger disk cache.

      You basically always want swap.

  3. The next boost will be by scorp1us · · Score: 5, Interesting

    Execute in Place (EIP)- currently, your system will copy the program to RAM. Here, you'd copy everything from volatile ram to Non-volatile ram - a rather wasteful operation don't you think?

    This is not just for exe's but for datafiles as well...

    --
    Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
  4. Old news. by Nathan+Ramella · · Score: 5, Informative
    These guys have already done it..

    http://www.superssd.com/products/tera-ramsan/

    Up to a terabyte even.

    -n

    --
    http://www.remix.net/
  5. Yeah wutever by Apparition-X · · Score: 5, Funny

    God is dead. (Neitzche) Tape is dead. (Innumerable pundits) Disk is dead. (Conquest FS) Yet somehow, they all seem to be alive and kicking as of right now. I wouldnt be throwing a wake just yet for any of 'em.

  6. Could be accomplished with a "preferential cache?" by mcgroarty · · Score: 4, Interesting

    I wonder if a kernel could realize many of the same performance benefits with current filesystems by identifying directory inodes and small file inodes and lowering the probability of those falling out when it's time to free pages.

  7. Re:well and good by robslimo · · Score: 5, Insightful

    I've predicted and eagerly anticipated the demise (by replacement) of spinning media (magnetic and optical) for 10 or more years now... I've predicted it will happen, not when.

    As this new filesystem implicitly admits, the price/MB is still so much dramatically lower for HDD's than solid state memory, it will still take quite a will for this replacement to happen.

    I disagree that some small killer app must come along to make this happen. Yes, solid state media is coming down in cost and increasing in density, but both need to change by 2 or 3 orders of magnitude before the HDD is dead. What we're waiting for here is the classis convergence of technology and its applications... the apps won't some until the technology can support it and the tech is driven by our demand for it. Expect another 10 years at least.

  8. Page files considered good by Merlin42 · · Score: 4, Informative

    Even with tons of RAM pagefiles are a GOOD thing and if used properly (Even MS uses them properly these days) they speed up the system on the whole.I have done a *little* OS development so I may not be an expert but I do have an idea what I 'm talking about.

    The swapfile is where the OS puts things it hasn't used in a while. On windows this would probably include things such as the portions of IE that are now part of the OS and you are forced to have loaded even if you are not using the box for web browsing. Having placed these items in the page file frees up room for things that are currently usefull such as IO buffers/cache (disk and/or net) that can dramatically increase speed by storing things such as recently used executables, meta-information .... wait this sounds familiar ;)

    That being said I think the technology discussed in this article is a bit too single minded. I think adding an extra level in the storage heirarchy between main ram and non-volitile HD is probably a good thing. My idea is to add a HUGE pile of PC100 or similar ram into a system and have this RAM accessed in a NUMA style which is becoming very popular. The nintendo GameCube uses a form of this aproach, there are two types of RAM with a smaller-faster section and a larger-slower section.

    The problem with my idea is that the price difference b/w cheap-slow RAM and fast-expensive RAM is not enough to make it worth the extra complexity currently. But, I would guess that if someone took the effort to design/build cheap slow RAM they could find a niche market for a system accelerator device ... but then again I could just be not well enough informed (a little knowledge is dangerous ;) and rambling like an idiot.

  9. Re:well and good by CrosbieSmith · · Score: 5, Insightful
    It will happen when the price difference between solid-state devices and magnetic storage gets narrower. That's not happening.

    This was also pointed on Saturday's Slashdot Story

    A mere $US5,000 would be something of a price sensation by the standards of current large capacity SSDs, whose prices aren't dropping nearly as quickly as are those of magnetic media.
  10. Umm.. by Anonymous Coward · · Score: 5, Informative

    My raid controller basically does this already..

    It's an old IBM 3H 64 bit PCI model with 32MB of ram and battery backup.. newer 4H models support more ram.. but how is this any different?

    The most used and smallest files stay in the cache.. the rest are called when needed.. and if god forbid the power fails, and the ups fails.. the card has a battery backup to write out the final changes once the drives come back online.

  11. Full paper in HTML by monk · · Score: 5, Informative

    For those who are not PS worthy.
    The paper
    Looks like a great server side file system. This is finally a step away from this whole "file" madness. All storage and IO should be memory mapped, and all execution should be in place. Anything else is just silly.

    --
    [-- Trust the Monkey --]
  12. ...goes great with 64-bit && cheap RAM by ceswiedler · · Score: 4, Insightful

    Though I don't think it's a useful general-purpose concept to have a RAM-only FS, I'm hoping that fast RAM will catch up to magnetic disks in size. A standard FS/VM will end up caching everything if the RAM is available. I seem to recall that ext3 on Linux, if given the RAM for cache, is faster than many ramfs/tmpfs implementations. Plan9 completely removes the concept of a permanent filesystem versus temporary memory. Everything is mapped in memory, and everything is saved to disk (eventually). It's a neat concept, and it happens to go very well with 64-bit pointers and cheap RAM.

    I'm hoping that hardware people will realize that we need huge amounts of fast memory...whether or not we think we need it. We're stuck in a "why would I need more RAM than the applications I run need?" kind of mindset. I think that the sudden freedom 64-bit pointers will provide to software developers will result in a paradigm shift in how memory (both permanent and temporary) is used. Though like all paradigm shifts, it's difficult to predict ahead of time exactly what the change will be like...

  13. Re:well and good by robslimo · · Score: 5, Insightful

    True, and that narrowing will have occurred by the time the cost/density ratio of SSM has improved by 2 or 3 orders of magnitude.

    A couple of reasons I see the death of the HDD to be not-to-imminent:

    (1) Those damned HDD makers keep pulling new physics out of their as^H^H hats and keep pushing the storage densities to rediculous new levels.

    (2) the solid state memory of the future ainta gonna be Flash as we know it now (with slow and limit write cycles) and it also will not be battery-backed RAM (unless we go write it all back to disk for 'permanent' storage at some point). I bet on some variation on today's Flash without its limitations, but the tech has got some ground to make before this all happens.

    My other long-term prediction has been that CRTs (vacuum tube, for pete's sake!) will be replaced with LCD or similar tech and we're getting really close.

  14. Same as "what if the hard drive crashes?" by Ars-Fartsica · · Score: 4, Interesting
    Battery failure in this case is the as the case of hard drive failure in the nonvolatile model. You either have backups on another media or you are screwed.

    Note that hard drive failures are still common and likely to be much more common than a battery failure, as it would be trivial to implement a scheme through which batter recharding would be automatic while the computer was plugged in. The battery would only be directly employed when the system was unplugged or the power was out. Even in that case it would be also trivial to implement a continuous/live backup system to a nonvolatile media like a hard disk, which by that point would be ridiculously cheap.

  15. Re:well and good by iabervon · · Score: 4, Interesting

    One important thing to realize about storage is that people's storage needs for some types of files grow over time, but storage needs for other things do not grow significantly. For example, if you separate attachments and filter spam, you can now buy an SD card which will store all of the email you will get in the next few years; when that runs out, you'll be able to buy a card which will store the rest of the email you will ever receive. There are similar effects for all of the text you'll ever write.

    Furthermore, there are a number of important directories on any system whose total size won't double in the next ten years, because they add one more file of about the same size for each program you install, and they already have ten years of stuff.

    In the cases where you do have exponential growth of storage use, the structure of the stored data is extremely simple; you have directories with huge files which are read sequentially and have a flat structure.

    I see a real opportunity for a system when you have one gig of solid-state storage for your structured data and HDDs (note that you can now add a new HDD without any trouble, because it's only data storage, not a filesystem) for the bulk data.

  16. Re:well and good by DoraLives · · Score: 5, Interesting
    I see a real opportunity for a system when you have one gig of solid-state storage for your structured data

    It will be OS-on-a-chip (and a good OS at that), it will go for about twenty bucks a pop down at WalMart or CompUSA and Bill Gates will die of an apoplectic fit when it hits the streets. Hackers will figure out ways to diddle it, but corporations and average users will upgrade by merely dropping another sawbuck on the counter and plugging the damned thing in when they get back to their machine(s). Computers will come with these things preinstalled, so there'll be no bitching about not having an OS with any given machine. High-end weirdness will, as ever, continue to drive a niche market, but everybody else will regard it about the same as they regard their pair of pliers; just another tool. Ho hum.

    --
    Is it fascism yet?
  17. Re:well and good by Daniel+Phillips · · Score: 4, Insightful

    I've predicted and eagerly anticipated the demise (by replacement) of spinning media (magnetic and optical) for 10 or more years now... I've predicted it will happen, not when.

    You may have to keep predicting for some time yet. So far, nobody has managed to come up with a solid-state approach that gets anywhere close to the cost of spinning media, and though solid state gets cheaper over time, spinning media does too.

    For the most part, posters to this thread missed the point of this effort. The authors observed that some relatively small portion of filesystem data - the metadata - accounts for a disproportionate amount of the IO traffic. So put just that part in battery-backed ram, and get better performance. Hopefully, the increased performance will outweigh the cost of the extra RAM.

    The fly in the ointment is that, in the case where there's a small amount of metadata compared to file data, the cost of transferring the metadata isn't that much. But when there's a lot of metadata, it won't all fit in NVRAM. Oops, it's not as big a gain as you'd first think.

    It's surprising how well Ext2 does compared to RAMFS and ConquestFS in the author's benchmarks.

    --
    Have you got your LWN subscription yet?
  18. I'm the one who gave the talk by one-egg · · Score: 5, Informative
    Well, since I'm the person who gave the talk referenced in the original post, I suppose I ought to clear up a few misconceptions for folks. I'm not going to address every objection that's been raised, because most of them have been well addressed in our papers. I'll just highlight the most common misunderstandings.

    First, the full title of the talk was "The Disk is Dead! Long Live the Disk!" We make no claim that disk manufacturers are going to go out of business tomorrow; history suggests that the technology will survive for at least a decade, and probably more than two. Talk titles are intended to generate attendance, not to summarize important research results in 8 words.

    Second, the most common objection to the work boils down to "just use the cache". This point has been raised repeatedly on Slashdot over the past few years. However, if you read our papers or attend one of my colloquium talks (UCSC, May 22nd -- plug), you'll learn that LRU caching is inferior for a number of reasons. We were surprised by that result, but it's true. Putting a fake disk behind an IDE or SCSI interface is even worse, since that cripples bandwidth and flexibility.

    Third, for people worried about battery failures, the only question of interest is the MTBF of the system as a whole. All systems fail, which is why we keep backups and double-check them. If your disk failed every 3 days, you couldn't get work done, but there was a time when we dealt with a failure every few months. Conquest's MTBF hasn't yet been analyzed rigorously, but I believe it to be more than 10,000 hours, which is good enough to make it usable.

    Finally, I have chosen not to put my talk slides on the Web, at least not for the moment. But you're welcome to mail me with questions: geoff@cs.hmc.edu. It might take me a few days to answer, so be patient.