Slashdot Mirror


A Semi-Radical Approach To Avoiding fsck

Dru writes: "This is an article about a hardware technology that is largely unknown in the new Unix community. In theory, with this inexpensive hardware, your BSD or Linux box could start doing guaranteed reboots in under 2 minutes (no fsck required) and super fast database writes. It could leapfrog all of the journaling filesystem projects as well. Yes, I wrote the article. The article is long, detailed, and mentions FreeBSD often. However, I do believe it is relevant to any other PC Unix. If enough people learn about it, maybe they will start demanding it from their favorite hardware vendor." With RAM and hard drive prices both continuing to decline, I wonder how the speed / use curve for individual PCs' storage (from L1 cache to backups) will evolve. With a similar bent, Arek urges you to "take a look at our company's Solid State Disk Drives." How'dja like 8 or so gigs of DRAM next time you edit a video or burn a CD?

12 of 116 comments

  1. don't bet on it. by Wakko+Warner · · Score: 3
    UPS batteries only last a few years. What happens when yours fails a few months or (if you get a defective one) a few years before its expected lifetime is up? Never, ever count on any of your computer equipment, and always have as much protection as you can afford. This is one means toward that end.

    - A.P.

    --
    * CmdrTaco is an idiot.

    --
    "Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
  2. Too much work by ericfitz · · Score: 4

    A write-caching disk controller combined with a journaling file system would give you the same benefit. You're just reinventing the wheel.

    The only really new thing here seems to be the fact that the "TRAM" is file-system aware, which is just another way of saying that you are investing in hardware which will just tie you to tired old EFS.

    Windows NT has had a journaled file system forever, and the journaling doesn't cause the major performance impact that everyone seems to think it does. Maybe someday Linus will get in the mood and allow a journaling FS into Linux.

    On a side note, what does the OS do in case of some sort of TRAM failure?

  3. RAM == volatile by pjrc · · Score: 5
    RAM is volatile.

    Sure, a battery backup sounds like it solves this, but consider that DRAM stores its charge on tiny capacitors, and requires a controller to be performing "refresh" access cycles regularly (usually every 15.26 µs). This means that not only must the battery be good, but the controller accessing the DRAM must continue providing the refresh cycles without interruption. That may sound simple, but not all DRAMs are created alike.... SDRAM DIMMs have a feature called Serial Presence Detect (SPD): a small non-volatile 2-wire serial EEPROM that holds identifying data about the size and timing parameters of the memory. A typical DRAM controller would be initialized at boot time... a card like this would require a special DRAM controller that only initialized its timings when the DRAM/battery is first installed. Perhaps the controller would be designed to use relatively slow and conservative timings, always, so it'd never be able to reinitialize to other settings (that could be wrong) and/or stop providing the critical refresh at any point.

    The point is that to retain memory, DRAM requires not only power but a properly operating controller to supply the refresh cycles. Magnetic media maintains its memory without either of these conditions. Compared to magnetic media, DRAM is very volatile. "Mission Critical" data, whatever that may be, would be existing at tiny charges on the very tiny capacitors, which could dissipate in only about 4-8 ms, if the DRAM controller doesn't perform perfectly.... inside a computer (designed as a reliable server) which has just crashed for some unknown reason!

    1. Re:RAM == volatile by dbarclay10 · · Score: 5

      You raise an implementation issue.

      The point is that to retain memory, DRAM requires not only power but a properly operating controller to supply the refresh cycles.

      Laptops run off batteries, and their memory seems okay.

      My Pilot runs okay off AAA-type batteries. Memory has been running quite well, thank you :)

      This fellow wasn't talking about slapping some RAM sticks to a breadboard and running current through the wires. Of course you would need a memory controller. Duh. :) The problems you raised were solved many years ago. If they hadn't been, nobody would be using volatile memory (like SDRAM) at all - it'd be too unreliable.

      I almost think you're just looking to spread some FUD.

      "Mission Critical" data, whatever that may be, would be existing at tiny charges on the very tiny capacitors,

      You say this like it's a bad thing! It's relied on every day. Hell, mission-critical data on its way to be written to disk is nothing but A CLUMP OF ELECTRONS MOVING ALONG A WIRE, in a lot of cases, those wires being many times smaller than a human hair.

      Don't make a mountain out of a molehill. This isn't a bad idea, and just because they'll have to put a DRAM controller on the card doesn't make it one.

      Dave

      Barclay family motto:
      Aut agere aut mori.
      (Either action or death.)

    2. Re:RAM == volatile by pjrc · · Score: 3
      You say this like it's a bad thing! It's relied on every day.

      DRAM (tiny charge requiring sustained refresh operations) is relied upon during normal operation of the computer. The proposal here is to also rely upon DRAM during and after the events that lead to a crash.

      To respond specifically to your examples above, your battery-supported laptop and Palm Pilot memory is reliable, but what happens if they crash? Is your laptop memory intact after something goes wrong? The microcontroller in a Palm has no MMU, so if something goes wrong, it'll be able to easily trash the memory.

      Regarding data being sent as "a clump of electrons moving along a wire" (propagation of a change in voltage potential would be more accurate)... that just simply isn't the way it's done. Communication takes place using protocols which verify that the data has been properly received. Newer ATA transfer modes use a CRC, and even with the older modes, status bits are provided to verify that the data was properly written. It would be horribly unreliable to send a "clump of electrons" and hope that the data is received and stored properly.

      Now, regarding the comment:

      I almost think you're just looking to spread some FUD.

      FUD (Fear, Uncertainty, and Doubt) is a marketing tactic, generally used by an established vendor when their well-known product is inferior and more expensive, and the best way to convince a customer to buy the established product is to scare the customer away from the competitor.

      Why would I do that? I don't have any vested interest in the current practices. I'm not participating in the development of any journaled filesystems. I do a bit of freelance hardware design and small quantity sales, so if I thought this was a really good idea, I might go after making such a card and kernel patch.

      But I believe the idea is fundamentally flawed.

      During the unpredictable events that will lead to a crash, and the unpredictable behavior immediately following a crash, DRAM is going to be a much less reliable way to be holding data. It doesn't matter how well DRAM works during normal operation. DRAM has proven quite reliable, as long as the computer and memory controller operate properly.

      Even with a specially designed memory controller (as a standard one won't do), it is quite risky to rely upon DRAM during a crash. Call it FUD if you like, but DRAM just isn't a reliable place to have data when a machine crashes. You say FUD now, but if anyone were to actually make such a card, the term I'd use would be Snake Oil.

  4. Simply a cache for a journaling filesystem by elprez · · Score: 3

    First, it is absolutely critical that the OS creates some log or structure of operations on the TRAM for filesystem operations. Basically, if the OS can mark the beginning and end of an operation and place it in this memory, you can now get a journaled meta data filesystem without a complete re-architecture of a filesystem.

    Basically, if the OS can determine the beginning and end of an operation (transaction) and it logs this information, then we have a journaling file system. Any persistent storage will suffice for the journal - 'TRAM' or hard disk or clay bricks. The only difference is the access time.

    In general there is no magical way for the OS to know what data marks the beginning or end of a transaction. The OS could try to handle meta-data in this fashion. It can log the meta-data changes it would make in atomic transactions and replay uncommitted transactions on a reboot. However, the file system still needs to be aware of this journaling.

    Consider a power failure during a commit to the file system. The file system is in a partially modified state and the transaction has not been retired from the TRAM journal (since it did not complete). When the system boots again, the TRAM journal is replayed and the same operation begins again, except this time on an inconsistent file system. The file system needs to recognize that a partially committed transaction needs to be rolled back.
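
The crash-during-commit scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Txn`, `Journal`, `apply_txn` and `replay` are hypothetical names, a dict stands in for the disk, and the journal logs complete block images (redo logging). That last choice is one common way to make replay safe on a partially modified file system, since rewriting a whole block is harmless to repeat; it is not necessarily what the article or any particular file system does.

```python
# Sketch only: Txn, Journal, apply_txn and replay are hypothetical names,
# and a dict stands in for the on-disk file system.
from dataclasses import dataclass, field


@dataclass
class Txn:
    txn_id: int
    writes: dict              # block number -> complete new block contents
    committed: bool = False   # commit record is safely in the journal
    retired: bool = False     # every block has reached the disk


@dataclass
class Journal:
    txns: list = field(default_factory=list)


def apply_txn(disk, txn, crash_after=None):
    """Copy a transaction's blocks to disk, optionally failing part-way."""
    for i, (block, data) in enumerate(sorted(txn.writes.items())):
        if crash_after is not None and i >= crash_after:
            raise RuntimeError("simulated power failure")
        disk[block] = data
    txn.retired = True


def replay(disk, journal):
    """Recovery pass: redo committed transactions, drop uncommitted ones."""
    for txn in journal.txns:
        if txn.committed and not txn.retired:
            apply_txn(disk, txn)   # safe to repeat: whole blocks are overwritten
    journal.txns.clear()


disk = {0: "old0", 1: "old1", 2: "old2"}
journal = Journal()
txn = Txn(1, {0: "new0", 1: "new1", 2: "new2"}, committed=True)
journal.txns.append(txn)

try:
    apply_txn(disk, txn, crash_after=1)   # only block 0 reaches the disk
except RuntimeError:
    pass
partial = dict(disk)                      # the inconsistent, half-written state

replay(disk, journal)                     # reboot: the journal is replayed
print(partial)
print(disk)
```

If the journal recorded operations ("append 10 bytes") rather than block images, repeating a half-applied transaction would not be safe, which is the case the comment warns about.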

    The above is based on my (very incomplete) understanding of journaling file systems. However, a TRAM card amounts to a cache for a file system journal, so in no sense is it going to replace or leap-frog journaling file systems.

  5. Platypus Prices from CDW by l1nuxn3rd · · Score: 4

    Here is a link to the Solid State Hard Drive Pricing Page from CDW.
    http://www.cdw.com/shop/search/results.asp?grp=HSO
    Platypus products are listed as well as some from Quantum and Sandisk.
    You are talking $1,969.40 US currency for the Platypus QikDRIVE8 512MB, the smallest model I saw.
    CDW is the Authorized reseller I found for the US.

  6. Already available for most RAID controllers.... by X · · Score: 4

    Most RAID controllers will give you a battery backed-up write-back RAM cache. Depending on how you configure it, it will say that a write is committed as soon as it's in RAM. This accomplishes the same net effect without requiring all this modification of the OS.

    Of course, lots of people don't like to configure their RAID controllers this way, because there is no redundancy for data in RAM, not to mention that the risk of failure is still higher than with a hard disk.

    I hate to say it, but that article seems like it was written by someone who has not been out in the real world.

    --
    sigs are a waste of space
  7. Come to the point! by zmooc · · Score: 4
    To the author:

    Why on earth do you want to tell us things like "Unix was designed to be simple. This means, if they found that they could do certain things as libraries in user space, then it didn't belong in the kernel."? It has absolutely nothing to do with TRAM. Actually that's true for nearly everything you say in your article; you use a lot of irrelevant examples and try to mention everything you seem to know about Unix and then explain the solution in 2 lines?! Why don't you mention the really interesting things, like that such cards most probably fail just as often as UPSes, why this should be on a PCI card and not on the disk (OK, that's because you want to access the memory directly, but please explain this...), or what the consequences are concerning access time?

    Although the idea is good, I think you could have done a much better article; come to the point!

    --
    0x or or snor perron?!
  8. Whats so radical... by A+Masquerade · · Score: 5

    This has been talked of for quite a time, and is hardly radical. What's more, it is not an alternative to journal-based filesystems; logically it's an adjunct to them.

    First you have your filesystem that buffers transactions in a journal that is streamed to disk. Then, for performance, by avoiding all those extra seeks, you put the filesystem journal on another device - say a small fast dedicated disk. Then you make that device a NVRAM device rather than something based on spinning rust.

    What's more, if you are interested in something like mail systems, where you get a lot of transactions that *must* be committed to stable storage (although a lot of MTAs don't do that in spite of the wording of RFC821), and you use a filesystem like ext3 with a data journalling mode, then putting the journal onto NVRAM makes a huge difference - by the time it comes to the point where data would be committed to the disk from the journal, most of the data (i.e. e-mail messages) is now unwanted (since the messages have been delivered to final local destination or for onward transmission) and so you don't even need to do the disk ops...

    All of this is pretty much available now in ext3 other than the tools to get the journal onto a NVRAM disk - and that's just detail.

    So, nice idea, needs more flesh, a little more infrastructure needed round it.

    [Those who came to the London UKUUG Linux Conference might well have heard these discussions before going on in various corridors :-) ]

  9. Re:Interesting idea but.. by jmp100 · · Score: 4
    You'd have to implement a completely separate bus or you'd risk getting severely bogged down. You'd have to make a dedicated bus that went from the CPU to a dedicated slot and then to the hard-drive controller. Doing this on a PCI bus like that which exists today would not be particularly efficient. Certain IDE and SCSI drives transfer at 100 MB/s and up; having disk I/O passing over the PCI bus TWICE (CPU -> TRAM -> HD) isn't the way to go, since your Ethernet, video, and other PCI devices are also competing for time on the bus.
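
A rough calculation shows why the double crossing matters. The sketch below assumes a conventional 32-bit, 33 MHz PCI bus (about 133 MB/s peak, shared by all devices) and takes the drive rate to be 100 MB/s, the ATA/100 burst rate; that drive figure is an assumption about what the comment has in mind, and sustained rates would be lower. `bus_utilization` is a hypothetical helper.

```python
# Back-of-the-envelope sketch. The bus parameters are those of 32-bit,
# 33 MHz PCI; the 100 MB/s drive rate is an assumption (ATA/100 burst rate).
PCI_WIDTH_BYTES = 4
PCI_CLOCK_MHZ = 33.33
PCI_PEAK_MB_S = PCI_WIDTH_BYTES * PCI_CLOCK_MHZ   # about 133 MB/s, shared


def bus_utilization(disk_mb_s, crossings):
    """Fraction of peak PCI bandwidth used by disk traffic alone."""
    return disk_mb_s * crossings / PCI_PEAK_MB_S


direct = bus_utilization(100, crossings=1)     # CPU -> HD
via_tram = bus_utilization(100, crossings=2)   # CPU -> TRAM -> HD
print(f"peak PCI bandwidth: {PCI_PEAK_MB_S:.0f} MB/s")
print(f"direct: {direct:.0%} of the bus, via TRAM: {via_tram:.0%} of the bus")
```

Under these assumptions a single crossing already takes about three quarters of the bus at burst rate, and two crossings would need more bandwidth than the bus can provide, before Ethernet or video traffic is counted.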

    Of course, implementing a separate bus will take millions in research (after all, it has to be done right), but once everything is decided on, it's probably only $20 or so in extra hardware. In theory, all you'd need is another PCI bridge chip or similar. Ever seen the inside of a NetApp? The motherboard has a CPU, space for RAM, a PCI bridge, and some slots. Nothing else. Extremely simple.

  10. Exactly what NetApp does by ansible · · Score: 3

    This is what the Network Appliance boxen do to speed NFS writes.

    All NFS write transactions are committed to NVRAM first, so that they can be acknowledged. Then the writes to disk are sorted and blasted out. Very efficient, very fast.

    It is this NVRAM (as well as using a modified RAID-4 on top of the WAFL filesystem) that makes a NetApp much faster (yet still safer) than most other NFS servers. I've often thought about creating just such an NVRAM board for a PC, so that I could do the same thing with my Linux fileservers.

    Note that the NetApp implementation caches NFS requests, not filesystem-level data. Say I'm changing 1 byte in a block. If I buffer filesystem data, I have to cache the whole block. If I'm buffering the NFS request, it'll be much smaller.
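
The log-then-acknowledge-then-sort pattern, and the size difference between logging requests and buffering blocks, can be sketched as follows. This is not NetApp's implementation: `NvramLog`, its methods and the 4096-byte `BLOCK_SIZE` are assumptions made for illustration, and a Python list stands in for the battery-backed memory.

```python
# Sketch only: NvramLog and BLOCK_SIZE are hypothetical, and a list
# stands in for battery-backed memory.
BLOCK_SIZE = 4096


class NvramLog:
    def __init__(self):
        self.entries = []                 # (byte offset, payload) in arrival order

    def write(self, offset, data):
        """Log the request and acknowledge it before any disk I/O happens."""
        self.entries.append((offset, bytes(data)))
        return "ack"

    def logged_bytes(self):
        """Payload held in the log (per-request headers are ignored here)."""
        return sum(len(data) for _, data in self.entries)

    def flush(self, disk):
        """Merge requests in arrival order, then write blocks in sorted order."""
        dirty = {}                        # block number -> new block contents
        for offset, data in self.entries:
            for i, byte in enumerate(data):
                pos = offset + i
                blk = pos // BLOCK_SIZE
                if blk not in dirty:
                    start = blk * BLOCK_SIZE
                    dirty[blk] = bytearray(disk[start:start + BLOCK_SIZE])
                dirty[blk][pos % BLOCK_SIZE] = byte
        order = sorted(dirty)             # one sequential sweep across the disk
        for blk in order:
            start = blk * BLOCK_SIZE
            disk[start:start + BLOCK_SIZE] = dirty[blk]
        self.entries.clear()
        return order


disk = bytearray(4 * BLOCK_SIZE)
log = NvramLog()
ack = log.write(3 * BLOCK_SIZE + 7, b"Z")   # a 1-byte change in block 3
log.write(10, b"hello")                     # a 5-byte change in block 0
logged = log.logged_bytes()                 # payload bytes held in the log
order = log.flush(disk)                     # blocks written, in ascending order
print(logged, "bytes logged;", len(order) * BLOCK_SIZE, "bytes if whole blocks were buffered")
```

Requests are merged in arrival order before the sorted write-out so that a later write to the same bytes still wins, whatever order the blocks reach the disk in.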

    Buffering (in NVRAM) the log data might work well for something like ext3.